
Rocketbelts

Here’s a cool article on rocketbelts in Slate. Clearly the next big thing after we all get our flying cars.

The big problem (and, no doubt, the main barrier to serious consumer adoption) seems to be maintaining stability. (Check the quote about a bad flight: “I felt like I was a balloon someone blew up and let go.”) Staying pointed in the right direction (which seems usually to be: upright) requires fine-grained manual control of the thrusting rockets.

Now, doesn’t this balance-and-orientation thing seem like something that should really be controlled by the machine, à la Segway, rather than by hand? Dean Kamen, are you listening? Mailmen, foot-patrol cops, and warehouse restockers are limited to a miserable top speed of 12 miles/hr right now, not to mention the embarrassing limitation to the 2D surface world. I don’t know about you, but that’s not the future *I* want to be living in.
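What machine-controlled balance means, in miniature: a feedback loop that measures tilt and tilt rate many times a second and pushes back against them. The sketch below is a toy inverted-pendulum model with a PD (proportional-derivative) controller. The model, the assumed 1 m pivot height, and the gains are all illustrative assumptions, not a description of Segway's or any real rocketbelt's control system.

```python
import math

G_OVER_L = 9.81  # assumed gravity / pivot height (1 m), in 1/s^2

def simulate(kp, kd, theta0=0.2, dt=0.01, steps=500):
    """Toy inverted pendulum: theta is tilt in radians from upright.
    Returns (final tilt, largest absolute tilt seen) after steps * dt seconds."""
    theta, omega = theta0, 0.0
    max_tilt = abs(theta)
    for _ in range(steps):
        control = -kp * theta - kd * omega             # PD correction: push back on tilt and tilt rate
        alpha = G_OVER_L * math.sin(theta) + control   # gravity tips it over, control rights it
        omega += alpha * dt                            # semi-implicit Euler integration
        theta += omega * dt
        max_tilt = max(max_tilt, abs(theta))
    return theta, max_tilt

final, _ = simulate(kp=30.0, kd=10.0)   # with feedback: settles back toward upright
_, worst = simulate(kp=0.0, kd=0.0)     # hands off: tips over (tilt passes 90 degrees)
```

The proportional gain has to beat gravity's tipping tendency (here, `kp` greater than `G_OVER_L`), and the derivative term damps the wobble; a human pilot is effectively trying to be both terms at once.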

Hotel wifi – there oughta be a law!

I’m your typical only-slightly-left-of-center liberal, in that while I believe we need govt regulation and consumer-protection laws, I also believe market forces will auto-correct a lot of lame consumer offerings. Except when I personally feel jerked around – then it’s martial law time.

Case in point: hotel wifi. What kind of internet access do you have? we ask brightly. Wireless internet in all the rooms! they say brightly. Then you arrive, and begin to dimly realize that the real situation is some small underpowered wireless router hidden behind the boilers in the sub-basement of your 10-story hotel. And as long as they have one of those, somewhere, that’s powered up some of the time, they can say that, can’t they?

Clearly it’s past time for a Federal Bureau of Wireless Quality Inspection, complete with unsmiling G-man trenchcoat types impersonating laptop-toting guests, and their black-jumpsuited tech experts in wardriving vans monitoring signal strength to the microbar. Just think of the restaurateur’s fear of the Health Inspector (in the U.S.) or the Guide Michelin reviewer (in France) and you’ll see where I think we need to go with this.

The odds of living through tomorrow: 100% (from past experience)

Having already revealed myself to be a grumpus about language, let’s do one on the quantitative side of the aisle so that no one thinks that I’m not broad-minded in my grumpiness.

This one comes to me via BoingBoing. It’s a Wired article purporting to show you that the dangers of terrorism are way overblown, compared to run-of-the-mill death machines like the one parked in your driveway. The centerpiece is this chart (which I reproduce without permission):

SEVERE
  Driving off the road: 254,419
  Falling: 146,542
  Accidental poisoning: 140,327
HIGH
  Dying from work: 59,730
  Walking down the street: 52,000
  Accidentally drowning: 38,302
ELEVATED
  Killed by the flu: 19,415
  Dying from a hernia: 16,742
GUARDED
  Accidental firing of a gun: 8,536
  Electrocution: 5,171
LOW
  Being shot by law enforcement: 3,949
  Terrorism: 3,147
  Carbon monoxide in products: 1,554

This is based on counts of deaths from various causes in the U.S. during the 11-year period 1995-2006. This is fine as far as it goes, and I really _want_ to like the piece, because I too oppose terrorist fear-mongering by the current presidential administration, and crave some sort of sane discussion of the risks of terrorism compared to other risks. But if the whole point is to be more numerate-than-thou, then you don’t want to be as quantitatively challenged as this article is.

The problem is that the piece keeps referring to the “odds” of death-by-terrorism as though the death counts from 1995-2006 let you assess those odds accurately for 2007, and it winds up with cute summary sentences like “In fact, your appendix is more likely to kill you than al-Qaida is.”

So the Wired guys are closet frequentists. The frequentist approach can be the best way to estimate odds, when you have a good reason to think samples are random, and when you don’t have any other kind of knowledge, or explanatory model. But when you do have some extra knowledge (like, say, the knowledge that there probably do exist people who would like to kill even more people than died in 9/11), it’s silly to ignore that; and when we know that a sample isn’t very large, random, or representative, it’s silly to pretend that it is. (OK, I admit it – I’m a closet Bayesian.)

Some other frequentist (and otherwise knowledge-free) predictions:

o My own chances of dying tomorrow, based on my previous experience: 0.000000%
o Chances of U.S. civilian deaths in 2007 from nuclear reactor malfunction, based on counting such deaths in the U.S.: 0.000000%
o Your own chances of dying from human-transmitted bird flu in the U.S.: 0.000000%

All of these estimates are probably, um, on the low side, and we know this because we know things about the underlying causes that aren’t reflected in the numbers (yet).
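To make the frequentist/Bayesian contrast concrete: the relative-frequency estimate (events divided by trials) returns exactly zero for anything that hasn't happened yet. Even the most knowledge-free Bayesian fix, Laplace's rule of succession (the posterior mean under a uniform Beta(1, 1) prior on the unknown rate), never does. A sketch in Python, assuming a hypothetical 14,600 observed days (about 40 years) with no deaths; real background knowledge, such as actuarial tables, would justify a far more informative prior than this uniform one.

```python
from fractions import Fraction

def frequentist_estimate(events, trials):
    """Relative-frequency (maximum-likelihood) estimate: events / trials."""
    return Fraction(events, trials)

def laplace_estimate(events, trials):
    """Posterior mean under a uniform Beta(1, 1) prior (Laplace's rule of succession)."""
    return Fraction(events + 1, trials + 2)

days_survived = 14_600  # hypothetical: roughly 40 years of days, zero deaths observed

freq = frequentist_estimate(0, days_survived)   # exactly 0: "I will never die"
bayes = laplace_estimate(0, days_survived)      # small, but not zero
print(float(freq), float(bayes))
```

Laplace originally posed this as the sunrise problem (what are the odds the sun rises tomorrow?), which is the same question as this post's title turned around.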

Even leaving causal theories out of it, the other thing the Wired article leaves out of the count data is the variance. Just for fun, complete this series by predicting U.S. domestic terrorism deaths for 2003, based on 1999-2002:

o 1999: 0
o 2000: 0
o 2001: 2973
o 2002: 0
o 2003: ???

I think Wired’s answer would be something like 2973 / 4 = 743, plus or minus, um, an unspecified number.
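The "unspecified number" can be filled in from the same four data points. A minimal Python sketch using the standard `statistics` module; the sample standard deviation is just one reasonable choice of spread.

```python
import statistics

# U.S. domestic terrorism deaths by year, as listed above.
deaths = {1999: 0, 2000: 0, 2001: 2973, 2002: 0}
counts = list(deaths.values())

mean = statistics.mean(counts)      # average deaths per year
spread = statistics.stdev(counts)   # sample standard deviation (n - 1 denominator)

print(f"2003 forecast: {mean:.2f} plus or minus {spread:.1f}")
```

The mean comes out to 743.25 and the sample standard deviation to 1486.5, twice the mean, so a one-standard-deviation band runs from negative deaths to well over two thousand. That's a sign that a single average says almost nothing about a series dominated by one rare, huge event.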

To Wired’s main point: yeah, people are well known to be lousy risk estimators, and routinely underestimate the risks of the familiar. This is partly completely irrational, and partly … I’m not so sure.

Most people know intellectually that heart disease and traffic accidents are huge mass killers, and most people have developed policies (about seat belts, french fries) to try to mitigate that. And, once having settled into a policy, they typically don’t give it a lot more thought. Now, they may still at this point be implicitly underestimating the risk by a few orders of magnitude…. but at least the risk from a frequentist point of view is unlikely to suddenly change on them. The number of people who die from heart disease next year will probably be the same as this year, plus or minus 10%. If you decided that your approach was good for 2006, then it’s probably OK for 2007 too.

What really seems to freak people out is when the risk is new, and they know they can’t bound it at _all_ (which, yes, makes them ripe for political manipulation). What’s the _largest_ number of people terrorism (or bird flu, or supervolcanoes) could conceivably kill next year in the U.S.? No one can say. What should you do differently in the presence of this strange new threat? It’s not clear. Don’t you have enough to think about already without this? Yes, including staying in your lane on the freeway. And if having to worry about this new threat bothers folks more than well-known automotive dangers, is that irrational? I dunno.

Anyway, given a choice between Dick Cheney’s all-panic-all-the-time 1% doctrine and the Wired piece’s if-it-hasn’t-happened-yet-then-it-can’t complacency, I’ll take Wired all the way, but only as a counterbalancing kind of stupidity.

On a horse right now. ttyl.

A couple of weeks ago I took a four-day weekend to visit my long-time friend Jeff in Denver. Back in the day, this was a drinking-buddy friendship, with much time spent fruitfully in the taverns and pool halls of Chicago. Lately, though, we spend more time fine-dining (Jeff’s a serious foodie), doing stuff outdoors, and playing stupid games (a constant).

Most of the trip was spent around Breckenridge, a ski resort in its low-key off-season, checking out the sights, doing some cautious high-altitude hiking (with me acclimated to sea level, Jeff acclimated to one mile up), and competing with ferocious intensity at mini-golf and alpine slide races. (The alpine slide is a luge-like affair on a track, which seems to be the latest clever idea about how to monetize a chair-lift in the summer. By the way, it would be very hurtful to point out to me that all these amusements were really designed for kids, so please don’t do it.)

On that Saturday, we figured that we had time for one more outdoorsy activity before heading back to Denver to hang with Jeff’s family (where I enjoyed dancing with the kids and telling Jeff’s wife T all my troubles). Biking, white-water rafting, hot-air ballooning – all considered and rejected. And this is how it came to be that when JP sent me a text message that afternoon, she got back: “On a horse right now. ttyl.” She must have found this puzzling, and must have looked for a metaphorical interpretation, because the literal interpretation would be that I was … on a horse, which was not consistent with any previous experience. And yeah, I found it amusingly yuppie and Silicon Valley to bother texting someone on my Treo while trying to ride for the first time. :) Here’s a picture, which Jeff’s wife described as “a little Brokebacky, but cute” :) (I’m the one on the right.)

SEO book review: ABC of SEO (George)

I never know what to make of books organized in this pseudo-glossary style, where the only organization is alphabetical-by-topic-title. It seems like an abdication of the author’s responsibility to impose meaningful structure. It also makes the reviewer’s task slightly artificial – no doubt the encouraged mode of reading is dipping and sampling (perhaps with odd free moments in the smallest room of your house), but of course as a reviewer I felt obligated to read it cover-to-cover. So how will my experience match up with yours?

With that said, The ABC of SEO is surprisingly meaty, and rewards cover-to-cover reading too. It’s really a set of small essays, each of which is organized nicely on its own, and is surprisingly dense with technical info. It becomes increasingly clear that the glossary use-case is a fiction (would you really turn to this book for definitions of terms like “Competition”?), but it supports browsing through the table of contents for the topics you care about. Before the alphabetically-organized portion, George gives a good and balanced overview of the search-engine ecosystem, and the roles that engines, publishers, SEOs, and advertisers play in it.

The book really shines in the longer in-depth technical entries, where George explains details of crawlers, webservers, log files, and so on. Entries I thought were particularly strong or compelling (in alphabetical order, naturally :) include AltaVista, Anchor Text, Banning, Black Hat SEO, Content Targeted Advertising, In-Bound Links, Keywords, Misspellings, Robots and Spiders, and (especially) Traffic Analysis.

I like a point that George makes several times in the text: no contract for services has been signed between SEOs (or webmasters) and the engines. Webmasters have no obligation to abide by rules set up by the search engines; engines have no obligation to include or rank a given site, and are entirely within their rights to exclude sites if they feel it improves user experience. I like this because it clarifies a debate that often gets a little hysterical and/or moralistic in both directions.

The book has a copyright of 2005, which (in this fast-moving domain) dates it slightly – it’s clear from the text, for example, that MSN was just launching its own in-house web search engine at the time of writing (with some entries written before, some after). Also, in general, it has to be said that the book focuses much more on Google than on the other major engines, including a lot of focus on PageRank itself. Although there are entries for both Y! Search and MSN, you’ll find nothing particularly detailed there on aspects of those engines not shared by Google.

George also cautions against some practices that might confuse SE crawlers, without naming specific engines that might be confused. He’s right in general that _some_ engines have had problems with these constructs, especially in the past. But it’s worth clarifying that Yahoo!’s crawler in particular has no problem with the following:

o Framesets. Whether or not framing is good web design, the crawler can snarf up the framed components into one bundle, without problem.

o Dynamic URLs. In general, it’s good to minimize the number of arguments after the ‘?’ in the URL, particularly when the arguments don’t affect the content and cause the same content to have many URLs (as with session IDs). But the crawler does not have any inherent problem with such URLs.

o Invalid HTML. No engine that I know of will discard your pages just because the HTML is badly formed (e.g. has start tags without corresponding end tags).
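On the dynamic-URL point: when session IDs give the same content many URLs, one common remedy (for a site owner generating links, or for anyone deduplicating a URL list) is to canonicalize URLs before comparing them. A minimal sketch with Python's standard `urllib.parse`; the set of session-parameter names is a hypothetical example, and this is not a description of how Yahoo!'s crawler actually handles such URLs.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical names of query arguments that carry session state rather than content.
SESSION_PARAMS = {"sessionid", "sid", "phpsessid"}

def canonicalize(url):
    """Drop session-style query arguments and sort the remaining ones,
    so URLs that differ only in those respects compare equal."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in SESSION_PARAMS]
    kept.sort()  # assumes argument order doesn't change the content
    # The fragment is dropped: it is never sent to the server anyway.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

a = canonicalize("http://www.example.com/item.php?sid=abc123&id=42")
b = canonicalize("http://www.example.com/item.php?id=42&PHPSESSID=xyz")
print(a == b)
```

Sorting the surviving arguments is a judgment call: it merges URLs that differ only in argument order, which is right for most sites but wrong for any site where order matters.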

Overall, The ABC of SEO gets an enthusiastic thumbs-up for detail, accuracy, and sensible advice.

So how bad is my luck going to be _now_?

Due to a fateful domestic-adoption decision several months back, a black cat now crosses my path about 20 times a day (and maybe a hundred times a day on the weekend). I guess if it wasn’t for bad luck, I wouldn’t have any luck at all (and if it wasn’t for disappointments I wouldn’t have any appointments…).

SPoFs and SPoCs

In systems design, a Single Point of Failure (SPoF) is a component that brings everything down if it fails – for example, a single master server machine that is your only way to communicate with a cluster of child servers. Having a SPoF in your design is supposed to be a Bad Thing.

In organizational jargon, a Single Point of Contact (SPoC) is a person who represents a group to another group with respect to some project or issue. Having a designated SPoC is supposed to be a Good Thing, as it cuts down on confusion about who to talk to.

But … isn’t a SPoC also a SPoF? Hmm.
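One simplified way to make the SPoF definition precise: model components as nodes in an undirected graph, with an edge wherever two components talk directly. The single points of failure (in the connectivity sense) are then the articulation points, or cut vertices: nodes whose removal disconnects the graph. A sketch of the standard DFS low-link method in Python; the node names are hypothetical, and real failure analysis also has to consider shared power, network gear, and so on.

```python
from collections import defaultdict

def articulation_points(edges):
    """Return the set of nodes whose removal disconnects the (undirected) graph."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)

    disc = {}    # DFS discovery time of each node
    low = {}     # earliest discovery time reachable from the node's DFS subtree
    points = set()
    timer = 0

    def dfs(node, parent):
        nonlocal timer
        disc[node] = low[node] = timer
        timer += 1
        children = 0
        for nxt in graph[node]:
            if nxt == parent:
                continue
            if nxt in disc:                      # back edge to an ancestor
                low[node] = min(low[node], disc[nxt])
            else:                                # tree edge: explore the child
                children += 1
                dfs(nxt, node)
                low[node] = min(low[node], low[nxt])
                # The child's subtree can't reach above `node` without it.
                if parent is not None and low[nxt] >= disc[node]:
                    points.add(node)
        if parent is None and children > 1:      # DFS root with 2+ subtrees
            points.add(node)

    for node in list(graph):
        if node not in disc:
            dfs(node, None)
    return points

single_master = [("client", "master"), ("master", "child1"),
                 ("master", "child2"), ("master", "child3")]
with_backup = [("client", "master"), ("client", "backup"),
               ("master", "child1"), ("backup", "child1"),
               ("master", "child2"), ("backup", "child2")]
print(articulation_points(single_master), articulation_points(with_backup))
```

Model two teams whose only link is a designated SPoC, and the SPoC shows up as a cut vertex too, which is the question above in graph form.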

Defuse/diffuse, rein/reign

I usually try to suppress my persnickety Mr. Language Person tendencies, but here I go….

“Defuse” and “diffuse” are different words. They’ve got a lot in common, because they each have precise physical/real-world meanings, though they’re both more often used metaphorically.

“Defuse” originally referred to removing the fuse from a bomb, rendering it harmless. Metaphorically, it refers to taking an explosive situation and rendering it harmless. “Diffuse” originally referred to the way liquids or gases mix and spread out through each other, or the way substances can pass through a separating membrane. Metaphorically, it refers to taking something (possibly dangerously?) concentrated and spreading it out, making it less concentrated.

What’s the right thing to do when the room is full of concentrated and potentially explosive tension? Well, you could defuse the tension (so it won’t explode) or you could diffuse the tension (like a bad smell, as a way to “clear the air”). So yeah – the meanings overlap a bit. And since they do, the meanings of “defuse” and “diffuse” are slowly diffusing across the thin semantic membrane that separates them, and soon there won’t be any difference at all. That won’t stop me from griping though.

Rein/reign: the former has to do with horses, the latter has to do with kings. Watch Slate get it wrong in this sentence: “Astronomers reigned in Pluto because scientists were discovering other heavenly bodies that could qualify as planets, too …”. These astronomers hold power over our notions of Pluto, but they’re not actually building castles there yet.

Now that I’ve opened myself up, bring on the spelling and grammar flames. :)

Aggregation

It’s always nice when the borderline between valuable content and webspam content is clear-cut – in that case, the goal of the search engine is straightforwardly to keep the spam out of search results. Unfortunately, some quality issues are a continuum, from the best of the web to the worst.

One of these issues is what to do about “aggregators” – sites and pages that live only to display arrangements of links and bits of content drawn from other sources. The range is continuous, from very tuned and sophisticated content-clustering and layout engines to the worst kinds of scraper spam.

For high-quality aggregation, try Google News, or ScienceBlogs (which JP turned me on to recently). Google News shows a clustering of news stories that is famously untouched by human hands. I don’t know for sure that ScienceBlogs is entirely an aggregation, without any new content, but it looks that way.

Search engines usually want to show content that provides unique value to users – collecting up a bunch of content found elsewhere seems to violate that. On the other hand, are these high-quality sites? Absolutely. And can pure compilation or aggregation add value? Well, I-am-not-a-lawyer, but apparently copyright law gives at least some thin and guarded support for compilation copyrights. And if humans can get credit for assemblage, I think we should extend the courtesy to algos too.

So some high-quality aggregation sites do belong in a websearch index. With that said, you might not always want them to come out on top in a search. If a query matches some snippet of a story on ScienceBlogs, then probably the original story itself would be a better match – but let’s rely on differential relevance algorithms to sort that out.

At the other end of the spectrum, check out this portion of a doc I found while doing an ego search on a websearch engine:

If I am interested in all things Converse (and I am, I am!) then I should really be interested in this doc …. but I’m not interested. There’s no discernible cleverness in the grouping, and no detectable relevance sorting being applied to the search results. This doesn’t belong in a websearch index, as it’s hard to imagine any querier being satisfied.

Other variants of this kind of spam technique cross the line between sampling/aggregation and outright content theft. Imagine that you get home one night to find a stranger leaving your house with a sack containing your TV, cell phone, and jewelry. You might misunderstand, until we explain that he’s actually an _aggregator_ – he’s just _aggregating_ your belongings. Yeah, that’s it.

As an in-between case, ask yourself this: if you’re doing a websearch (on Google, Yahoo!, MSN, …) do you want any of the results to be … search-result pages themselves (from Google, Yahoo!, MSN)? That is, if you search for “snorklewacker” on MSN web search, and you click on result #4, do you want to find yourself looking at a websearch results page for “snorklewacker” on Yahoo! Search, which in turn has (as result #3) the Google search results page for “snorklewacker”? (It’s easy to construct links that trigger specific searches and to embed them in pages that will be crawled, so this could happen if search engines didn’t take steps to prevent it, for example by excluding their own results pages via robots.txt.)

For one thing, this seems like a dangerous recursion that threatens to blow the stack of the very Internet itself. (It’s for reasons like this that I generally wear protective goggles when searching at home – safety first.) But mainly, it just doesn’t seem to be getting anywhere. Search engines are aggregators themselves – on-demand aggregators that show you a new page of results based on the terms you typed. What’s the point of fobbing you off on another term-based aggregator?

Blog aggregators, search engines, tag pages – all of these are fine things as starting points, but the potential for tail-chasing is pretty high if they all point to each other. I say the bar for inclusion ought to be pretty high (though as always it’s just MHO, and not to be confused with my employer’s O at all).
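The robots.txt self-exclusion mentioned above is simple to express. A sketch that feeds a minimal robots.txt to Python's standard `urllib.robotparser` and asks whether a well-behaved crawler may fetch a results page; the host name, the `/search` path, and the crawler name are hypothetical, not any real engine's layout.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a search engine that keeps its results pages out of other indexes.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def crawlable(url, agent="ExampleBot"):
    """True if a crawler honoring this robots.txt may fetch the URL."""
    return parser.can_fetch(agent, url)

print(crawlable("https://search.example.com/search?p=snorklewacker"))  # results page: excluded
print(crawlable("https://search.example.com/about"))                   # ordinary page: allowed
```

Note that `Disallow` rules are path prefixes, so `/search` also blocks any other path that merely starts with those characters; that only works as a recursion-breaker if crawlers choose to honor it.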

SEO book reviews

I’m planning to start a short book review series on this blog, focused on SEO (Search Engine Optimization) books. The idea would be to review the books from a search engine perspective – what makes sense, what seems crazy/dangerous, etc. I’ll also focus on Y!-specific info and advice to the extent I can.

I’m going to start with The ABC of SEO, by David George, not _only_ because it’s short, really. :) If there are any other SEO titles that people have liked and/or would like to see reviewed, please leave a note in the comments.
