Whew. In December I blogged that I was leaving Yahoo! for a new job, and had every intention of following up with news about the new job and the new company just as soon as I started. But the next time I look up I find that it’s … June(!). I guess new jobs have a way of doing that to you.
So what is the new company? Powerset, which is applying natural-language technology (originally developed at [formerly Xerox] PARC) to web search. And what does that mean? It means that, rather than indexing the words on webpages, the first thing we do is parse the sentences into the kinds of syntax trees that you might see in a grammar or linguistics class, complete with noun phrases, verb phrases, prepositional phrases and so on. And that’s just the first thing – once we’ve figured out the syntactic structure, we’re extracting every bit of semantic meaning we can to match against any query you might want to enter. And if we can do all this correctly, we should be able to separate semantic wheat from chaff ever so much more efficiently than regular search engines, and find things for you in ways that have never been possible before.
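To make the words-versus-structure distinction concrete, here is a toy sketch in Python. It is not Powerset's code or anything like it: the "parses" are hand-written (subject, verb, object) triples standing in for what a real parser would produce, and every name in it (`DOCS`, `build_word_index`, `build_relation_index`, and so on) is made up for illustration.

```python
from collections import defaultdict

# Hypothetical hand-written "parses": each document is one sentence plus a
# (subject, verb, object) triple that a real syntactic parser would have to
# produce. These are assumptions for illustration only.
DOCS = {
    "doc1": ("the dog bit the man", ("dog", "bit", "man")),
    "doc2": ("the man bit the dog", ("man", "bit", "dog")),
}


def build_word_index(docs):
    """Conventional inverted index: word -> set of documents containing it."""
    index = defaultdict(set)
    for doc_id, (text, _triple) in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def build_relation_index(docs):
    """Structure-aware index: (subject, verb, object) -> set of documents."""
    index = defaultdict(set)
    for doc_id, (_text, triple) in docs.items():
        index[triple].add(doc_id)
    return index


def word_search(index, words):
    """Return documents containing every query word, ignoring word order."""
    sets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()


def relation_search(index, subject, verb, obj):
    """Return documents whose parsed relation matches the query exactly."""
    return set(index.get((subject, verb, obj), set()))


word_index = build_word_index(DOCS)
relation_index = build_relation_index(DOCS)

# Word matching cannot tell who bit whom; relation matching can.
by_words = word_search(word_index, ["dog", "bit", "man"])
by_relation = relation_search(relation_index, "dog", "bit", "man")
```

In this toy, the word index returns both sentences for "dog bit man", while the relation index returns only the one where the dog does the biting. The hard part, of course, is everything the sketch assumes away: getting those triples out of real sentences at web scale.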
If all this sounds computationally expensive to try to apply to the entire web, well … it is. There’s a big double bet here, on a couple of decades-long trends: that NLP technology keeps getting ever faster and more mature, and that Moore’s Law continues making computation ever cheaper. The bet is that the historic moment when NLP meets Moore’s Law in the middle at web-scale search is … 2007. Or maybe 2008. ;)
So what am I doing? Nothing to do with webspam, right now – we aspire to be a spam target, of course, but that time has not yet come. I’m directing an engineering group that does several things, including the metrics and relevance-testing program that tells us how we’re doing beyond the cool demo examples. Among other things, this means that if anyone is in a mood to start shooting the messengers, then I might be the first casualty. But luckily it seems that even at the highest levels people understand that the only way you can possibly figure out how you’re doing (and how to improve) is to be ruthlessly blind and cruelly random in testing and sampling. So far so good. I haven’t even been wearing the Kevlar to work lately.
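For the curious, "blind and random" can be sketched in a few lines of Python. This is a generic illustration under my own assumptions, not our actual test harness: `sample_queries` and `blind_pair` are hypothetical names, and the query log and result lists are whatever you happen to have.

```python
import random


def sample_queries(query_log, k, seed):
    """Draw k queries uniformly at random, without replacement.

    A fixed seed makes the sample reproducible; the point is that nobody
    gets to hand-pick the queries that make the system look good.
    """
    rng = random.Random(seed)
    return rng.sample(query_log, k)


def blind_pair(results_a, results_b, rng):
    """Present two systems' results in random left/right order.

    Returns (shown, key): `shown` is what the judge sees, and `key` records
    which system landed in which slot. The key stays hidden until the
    judgments are in.
    """
    if rng.random() < 0.5:
        return (results_a, results_b), ("A", "B")
    return (results_b, results_a), ("B", "A")


# Hypothetical usage with made-up queries and result lists.
log = ["query %d" % i for i in range(1000)]
picked = sample_queries(log, k=20, seed=7)
shown, key = blind_pair(["a1", "a2"], ["b1", "b2"], random.Random(7))
```

The judge only ever sees `shown`; the scores get mapped back to systems through `key` afterwards, so neither cherry-picked queries nor knowledge of which side is "ours" can tilt the numbers.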
Now, Powerset has been all over the press, including multiple New York Times articles. The founders are not shy people, and they’re happy to explain to as many reporters as possible exactly why Powerset must remain in stealth mode. :) And there has been a lot of blog chatter (and even incisive blog discussion) about Powerset’s ambitions and prospects. I’ll cover some of the controversy in a later post. For now, though, let me say that I’m amused to see the following simultaneous critiques of Powerset: 1) what Powerset is trying to do is so hard that it couldn’t be done in a million years, and 2) Powerset is lame because it hasn’t launched already. Uh … either one of these could be true, but surely they’re not both true at once? Pick at most one, please.