Most of what I have to say about the size wars I said a year ago. And otherwise I think that Jeremy nailed it pretty well. (Interestingly, although my post from a year ago was called “Size doesn’t matter”, and Jeremy’s is called “Of course size matters!”, there isn’t substantial disagreement between them.)
I was amused by the stiff reaction from Google, including this response given to John Battelle: “Our scientists are not seeing the increase claimed in the Yahoo! index. The data we have doesn’t support the 19.2 (billion page) claim and we’re confused by that.” Well, if not just _scientists_ but _Google scientists_ are confused, then it must be wrong, right? 🙂 Actually, index size claims are inherently difficult to make transparent, even to scientists, since neither engine supports issuing a query and getting 10 billion URLs displayed in your browser window. But this is nothing new: I’ve never seen 8 billion URLs from Google (except for the count on the front page). This does not actually confuse me. (It’s especially ironic that this complaint about transparency was issued by a Google spokesperson who preferred to remain anonymous.)
[Update: of course the previous paragraph was a bit flip: there are ways to try to estimate index size based only on the results you can actually inspect from queries. But in general it’s a very hard problem, even if you believe you know how the search engine in question works (see this nice summary of the issues by SearchEngineWatch). Any query returns a very small subset of the results, and for any particular query it is close to unknowable whether a given URL is not showing up because it: a) is not present in the crawl/index, b) was filtered for quality reasons, or c) is present, but is simply not ranking high enough for you to see it. The NCSA study fell victim to the classic “let’s search near the lamppost, where it’s brighter” style of mistake. Sure, there are better ways to try to do this kind of estimation (although they all have their methodological issues), and I suspect that G wanted to talk about one of their own. But it’s a hard balancing act to demand transparency from a competitor while revealing no information yourself, especially since they themselves are apparently not willing to publicly disclose _even_ the testing method that is making them confused.]
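To make the "estimate from what you can inspect" idea concrete, here is a minimal sketch of one textbook approach: an overlap (capture-recapture style) estimate of the relative size of two indexes. To be clear, this is my own illustration and not the method used by Google, Yahoo, or the NCSA study. Everything in it is an assumption: the toy "web" of integer URLs, the function name `relative_size_estimate`, and above all the idea that you can draw a uniform sample from each index and reliably check whether the other engine has a given URL. In real life those last two steps are exactly where the a/b/c ambiguity above bites.

```python
import random


def relative_size_estimate(sample_a, sample_b, in_a, in_b):
    """Estimate |A| / |B| from URL samples drawn from each index.

    sample_a, sample_b: URLs sampled (ideally uniformly) from A and B.
    in_a, in_b: hypothetical membership checks, url -> bool.
    """
    # Fraction of A's sample also found in B approximates |A and B| / |A|.
    a_in_b = sum(1 for url in sample_a if in_b(url)) / len(sample_a)
    # Fraction of B's sample also found in A approximates |A and B| / |B|.
    b_in_a = sum(1 for url in sample_b if in_a(url)) / len(sample_b)
    if a_in_b == 0:
        raise ValueError("no overlap observed; the ratio is undefined")
    # (|A and B| / |B|) / (|A and B| / |A|) = |A| / |B|
    return b_in_a / a_in_b


# Toy simulation (all numbers made up): a "web" of 100,000 URLs,
# engine A indexes 60,000 of them and engine B indexes 30,000.
rng = random.Random(0)
WEB_SIZE = 100_000
urls_a = rng.sample(range(WEB_SIZE), 60_000)
urls_b = rng.sample(range(WEB_SIZE), 30_000)
index_a, index_b = set(urls_a), set(urls_b)

# Pretend we can sample 2,000 URLs uniformly from each index.
sample_a = rng.sample(urls_a, 2_000)
sample_b = rng.sample(urls_b, 2_000)

ratio = relative_size_estimate(
    sample_a,
    sample_b,
    in_a=lambda url: url in index_a,
    in_b=lambda url: url in index_b,
)
print(f"estimated |A| / |B| = {ratio:.2f} (true ratio is 2.0)")
```

Note that this only yields a ratio, never an absolute page count, and it comes out close to the truth here only because the toy samples really are uniform and the membership checks never lie. Sample by issuing queries instead, and you are back to searching near the lamppost.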
I am also amused by the reaction from the pundits who were especially irritated by the latest volley, which put us (Y!) ahead in size. Even if you agree that there are more important dimensions of search engine quality than size (as I do), I think it’s unrealistic to expect a competitor to entirely withdraw from any competitive dimension that customers care about (even if you might also like to educate customers on the nuances). Why shouldn’t Y! strive to win on all fronts? Does Y! have some special obligation here to draw back, that the G entity is not subject to?
But I agree — let’s stop the arms race, let’s stop the madness. Let’s stop it now. (Yes, _right_ _now_ … while we’re ahead 🙂 ).