Algorithm for informational contribution of web documents

How much information does any single document add to the web? To measure it:

1) Crawl and store all the pages on the web.

2) Compress the entire web and see how much space you need to store it, compressed.

3) For each page on the web, compress the entire web minus that page. Compare total storage to #2. Retain that ratio for each page….
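
For the curious, here is a toy sketch of steps 2 and 3 in Python. It assumes a lot: a small in-memory list of byte strings stands in for the crawl in step 1, `zlib` stands in for whatever compressor you would really use, and the names `compressed_size` and `contributions` are made up for illustration. Treat it as a way to poke at the idea, not as a real implementation.

```python
import zlib


def compressed_size(pages):
    """Bytes needed to store all pages, concatenated and compressed together."""
    return len(zlib.compress(b"".join(pages), 9))


def contributions(pages):
    """For each page, return (size of the corpus without it) / (size of the full corpus).

    A ratio near 1.0 means the page adds almost nothing that the rest
    of the corpus does not already contain; a lower ratio means the
    page adds more.
    """
    full = compressed_size(pages)  # step 2: compress everything once
    ratios = []
    for i in range(len(pages)):
        # step 3: compress the corpus minus page i, compare to the full size
        without_page = pages[:i] + pages[i + 1:]
        ratios.append(compressed_size(without_page) / full)
    return ratios


if __name__ == "__main__":
    # A made-up three-page "web": two identical pages and one different one.
    toy_web = [
        b"All about puppies. " * 50,
        b"All about puppies. " * 50,
        b"Compression ratios as a measure of novelty in web documents.",
    ]
    for page, ratio in zip(toy_web, contributions(toy_web)):
        print(f"{ratio:.3f}  {page[:30]!r}")
```

Two caveats about the sketch itself: it recompresses the whole corpus once per page, which is exactly the running-time problem, and `zlib` only looks back about 32 KB, so on anything bigger than a toy corpus it will miss redundancy between pages that sit far apart.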

Now, just when we’re getting rolling, people start whinging in that pedantic and annoying way about feasibility and efficiency and running time…. Sigh. Haters! 🙂


6 Responses to Algorithm for informational contribution of web documents

  1. Troutgirl says:

    Is this like weighing yourself without a puppy, and then weighing yourself WITH a puppy — as a method of figuring out how much the puppy weighs?

  2. Tim says:

    Yes, exactly like weighing a puppy, except that if you’re a search engine you might be using the test to decide whether to drown the puppy.

  3. Gurtie says:

    mmm. I think I just found journalistic proof that Yahoo plan to squish puppies as a way of figuring out how many pages Google have in their index….

  4. Tim says:

    Curses! This is why I’m not in PR! How could I have forgotten about rule #11 (from page 2 of our “Guide to PR at Yahoo!”): Never talk about killing puppies!

  5. lisa says:

    So, is object-oriented-ness valued more? I can get more “information” in my page by linking to others to do the heavy lifting… or does information have to be new?

  6. Paul Sinnett says:

    Would this give every copied document a zero rating? That is, every copy, including the original, would be rated zero, even though the original might be significant.
