Algorithm for informational contribution of web documents

How much information does any single document add to the web? To measure it,

1) Crawl and store all the pages on the web.

2) Compress the entire web and see how much space you need to store it, compressed.

3) For each page on the web, compress the entire web minus that page. Compare total storage to #2. Retain that ratio for each page….
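
For the curious, the three steps above can be sketched at toy scale. This is only an illustration under loud assumptions: a small Python dict stands in for "all the pages on the web", `zlib` stands in for whatever compressor you would really use, and the names `compressed_size`, `contribution_ratios`, and the sample pages are made up for this example.

```python
import zlib


def compressed_size(pages):
    """Bytes needed to store the given pages, compressed together as one blob."""
    blob = b"\x00".join(p.encode("utf-8") for p in pages)
    return len(zlib.compress(blob, 9))


def contribution_ratios(corpus):
    """Map each page name to size(web minus that page) / size(whole web)."""
    names = sorted(corpus)
    full = compressed_size([corpus[n] for n in names])  # step 2
    ratios = {}
    for name in names:  # step 3: leave one page out at a time
        rest = [corpus[n] for n in names if n != name]
        ratios[name] = compressed_size(rest) / full
    return ratios


# Hypothetical stand-in for step 1 (the "crawl"): two identical pages and one different page.
text = ("The quick brown fox jumps over the lazy dog while seven wizards "
        "quietly mix five dozen jugs of liquor near the frozen harbour at midnight.")
corpus = {
    "original": text,
    "copy": text,
    "unique": ("Compression ratios depend on how much structure a document shares "
               "with everything else already stored, which makes novelty cheap to estimate."),
}

ratios = contribution_ratios(corpus)
for name, ratio in sorted(ratios.items()):
    print(name, round(ratio, 3))
```

A lower ratio means the rest of the web shrinks more when the page is removed, so the page was contributing more. Note that `zlib` only looks back 32 KB, so it would miss redundancy between pages that sit far apart in the blob; anything beyond a toy corpus would need a different compressor, and the sketch still does one full compression per page.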

Now, just when we’re getting rolling, people start whinging in that pedantic and annoying way about feasibility and efficiency and running time…. Sigh. Haters! 🙂

This entry was posted on Sunday, April 30th, 2006 at 3:50 pm and is filed under Engineering mgmt. You can follow any responses to this entry through the RSS 2.0 feed.
Responses are currently closed, but you can trackback from your own site.

6 Responses to Algorithm for informational contribution of web documents

Is this like weighing yourself without a puppy, and then weighing yourself WITH a puppy — as a method of figuring out how much the puppy weighs?

Yes, exactly like weighing a puppy, except that if you’re a search engine you might be using the test to decide whether to drown the puppy.

mmm. I think I just found journalistic proof that Yahoo plan to squish puppies as a way of figuring out how many pages Google have in their index….

Curses! This is why I’m not in PR! How could I have forgotten about rule #11 (from page 2 of our “Guide to PR at Yahoo!”): Never talk about killing puppies!

So, is object-oriented-ness valued more? I can get more “information” in my page by linking to others to do the heavy lifting… or does information have to be new?

Would this give every copied document a zero rating? That is, every copy, including the original, would be rated zero, even though the original might be significant.