Silicon Valley and “Science”

August 18, 2013

I’ve been working in Silicon Valley since 2000, at first as an engineer and then as a …. well, that’s what we’re here to talk about.

Over the last decade I’ve seen increasing use of the title “Scientist” for a certain kind of employee who works on large consumer Internet services (like Google, Facebook, Twitter, Yahoo, Amazon, etc.). The profile of these people is that they typically (but not always) have a Ph.D. in Computer Science or a related discipline (Statistics, Physics, Math), they are mathematically facile, they are well-versed in machine learning and other statistical techniques, and they are comfortable with the analysis of really large data sets. (I’m talking here about the “applied scientists” who work on product, not the small number of people in research labs at these companies who really are judged by a “publish or perish” standard).

They are also distinguished from “engineers” by what they don’t (usually) do. They may or may not write production-ready code that is expected to be polished, and bug-free, and performant, and ready to run on many machines at once without causing problems. The output of “scientists” can be anything from proposed algorithms (to be implemented for real by “engineers”), to prototypes, to analyses of data, to metrics code, and (especially) to carefully-crafted machine-learned models that are applied at run-time by production code (and which engineers are not trained or qualified to build).

The “Scientist” title has always bothered me a little, both because it feels like an inaccurate use of a perfectly fine word, and because it feels like grade inflation. 

It feels inaccurate because I think of scientists as people who figure out new facts and principles about how the natural world works, and I think of engineers as people who apply that scientific learning to create useful artifacts. By those definitions, anyone working on, say, making Google Search better is clearly an engineer, not a scientist. As my friend Sameer put it: “They’re not looking for no Higgs Boson”.

It feels like grade inflation because I can see how it happened: you had half a generation of very smart people graduating with Ph.D.s, many of whom had always planned to be scientists and then found themselves in a world with only a few low-paid postdoc positions but great job openings in a booming Internet sector. The only hitch? “But I’m a scientist!” Hiring manager: “No problem, we need scientists too!” (editing the job title as he speaks).

(The somewhat more recent term “Data Scientist” bothers me a little less – more specific, more modified – but it’s only very recently indeed that you see that as a literal job title. Previously you might have someone informally described as a “data scientist” but the literal title would be “Scientist” still.)

If I had my druthers I would call all of us Internet tech types “engineers” and then divide further into “software engineers” and “data engineers”.  (This is where the “real engineers” (civil, mechanical, aeronautical) pop up and sniff “Software engineers aren’t even engineers! They don’t even have to be certified!”  OK, OK – I’m sure there was grade inflation in that shift too, but I’m not willing to go with “code monkey” and “data monkey”.)

Now, have I convinced you that a (Data) Scientist isn’t really a scientist?  Good, because now I am going to change my mind and argue in the other direction.

The reason is that most data scientists employ a very particular kind of Method in their work. Can you guess what kind of Method? (Sure you can.)

A lot of data scientists end up improving product by going through the following cycle (and here I am not giving away any trade secrets particular to my company – I have seen this done in enough different bigcorps that I’m sure it’s ubiquitous):

1. Develop a theory about what kind of change might improve the service.

2. Test the theory as best you can with offline data.

3. If offline tests look good, build a version of the service that includes the change.

4. Expose the new version to a slice (or bucket, or shard) of the service’s users. This is usually done by mapping user IDs to that slice persistently, so that a given user will see the new version with every request, over the space of a week or a month.

5. Keep a control slice with a disjoint set of users that see the old version of the service.

6. Now, compare the two slices using whatever metrics you have defined for “improvement” of the service. It might be number of clicks, or time on the site, or transactions performed, or money spent – it depends on the particular application and business.

7. Ask: is there improvement?  And is the improvement statistically significant? If so, take it that the “theory” in step #1 is provisionally validated.
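To make steps 4 through 7 concrete, here is a minimal Python sketch. The choices in it are my own assumptions, not any particular company’s practice: users are bucketed into 100 slices by hashing a hypothetical experiment name together with the user ID, and the “improvement” metric is a yes/no outcome per user (say, whether they clicked), compared with a standard two-proportion z-test. The names `assign_slice`, `variant_for`, and `two_proportion_z_test` are illustrative. A continuous metric like time on site would call for a different test.

```python
import hashlib
import math

NUM_SLICES = 100  # assumption: users are divided into 100 slices


def assign_slice(user_id: str, experiment: str) -> int:
    """Map a user to a slice deterministically, so the same user sees the
    same version on every request for the life of the experiment (step 4)."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SLICES


def variant_for(user_id, experiment, treatment_slices, control_slices):
    """Return 'treatment', 'control', or None (user not in the experiment).
    The two slice sets must be disjoint (step 5)."""
    if treatment_slices & control_slices:
        raise ValueError("treatment and control slices must be disjoint")
    s = assign_slice(user_id, experiment)
    if s in treatment_slices:
        return "treatment"
    if s in control_slices:
        return "control"
    return None


def two_proportion_z_test(successes_ctrl, n_ctrl, successes_trt, n_trt):
    """Compare a yes/no metric (e.g. 'user clicked') between slices (steps 6-7).
    Returns (z, two-sided p-value); positive z means treatment did better."""
    p_ctrl = successes_ctrl / n_ctrl
    p_trt = successes_trt / n_trt
    pooled = (successes_ctrl + successes_trt) / (n_ctrl + n_trt)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_ctrl + 1 / n_trt))
    z = (p_trt - p_ctrl) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    treatment = set(range(0, 10))   # slices 0-9 see the new version
    control = set(range(10, 20))    # slices 10-19 see the old version
    print(variant_for("user-42", "new-ranking", treatment, control))

    # Hypothetical counts: users who clicked / users exposed, per group.
    z, p = two_proportion_z_test(1000, 20000, 1100, 20000)
    print(f"z = {z:.2f}, p = {p:.3f}")
    if z > 0 and p < 0.05:
        print("improvement is statistically significant at the 5% level")
```

Hashing, rather than storing a random assignment per user, is what makes the slice persistent for free: the same user ID always lands in the same slice, request after request, with no lookup table. Mixing the experiment name into the hash keeps different experiments from reusing the same split of users.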

Now let’s call this cycle by its right name: this is a “double-blind controlled experiment”, not that dissimilar from the experiments that pharmaceutical companies use to study new medicines. (Do people who conduct such drug studies call themselves scientists? I’m guessing that they do.)

It may not be obvious at first that it is blind in both directions – after all, the scientist could in principle know all the affected user IDs – but in practice nothing can be done to treat the control population differently from the experimental population aside from the change they are trying to study.

So are these folks engineers or scientists? They mostly have Ph.D.s, and they use the scientific method (double-blind controlled study) in their work. But what they are studying are artifacts (not the natural world) and they are all about making the artifacts better, not about enhancing the store of human knowledge. To improve the artifacts they use all the fruits of scientific research, one of which is … the Method itself.

So I say “Engineer”, but if you like the S-word better then that’s OK with me – you’ll get the same number of free lunches either way, right?


Venga venga venga!! (Management and the Tour de France) [repost from 2005]

February 22, 2013

An amusing sidelight of the Tour de France is the one coach who seems to spend the entire race in a car right next to his prize rider, with the window open, yelling this over and over again: “Venga venga venga!!! Venga venga venga venga!!!” I think it’s a coach of one of the Basque teams, and the words are my transliteration, but I remember one of the commentators saying that this meant something like “Go go go go!! Faster faster faster!!”

In reality this coach is probably really competent, and no doubt he does all kinds of complicated work around strategy and training during the rest of the year. But it’s more amusing to think of this as being _all_ he does. And if so, what kind of value does he believe he adds? As he comes home weary and hoarse at the end of a long day, is he secretly congratulating himself for the very fact that his rider crossed the finish line? Well, if the rider in question is in fact very lazy, and would pull off the road and go doze in a field given half a chance, then maybe the coach is helping. But this is a world-class athlete we’re talking about here — it seems kind of unlikely that someone has to be shouting at him every moment for him to want to ride fast.

You see where I’m going with this, I’m sure – but I do actually think that if there’s one way that managers (both eng managers like me and product managers) can persuade themselves that they’re adding value when they’re not, it’s by spending their day shouting “Venga!” If the people they’re talking to are already motivated and working hard, then they may be accomplishing exactly nothing.

Now, of course, I do think that most of us do add value – but it’s usually through other things, like letting people know about things they didn’t already know, making priority decisions, and whatnot. But a surprising number of utterances can be usefully translated into venga-speak. For instance, try these:

“Remember, we have to get that X thing done by Monday, so that Y thing can start”
Translation: “Venga, venga venga venga venga venga, venga venga venga!!” (unless there’s a chance that the hearer didn’t know that Monday was the day, or that Y depended on X).

On the other hand, saying something like:
“Remember, we have to get X done by Monday, so Y can start. That’s more important than Z.”
cannot be translated into pure venga-language, even with an arbitrary number of vengas, because it conveys some non-venga information.

In fact, _any_ statement that’s purely about stressing that some things are important, without giving any clue about what might not be important, is, um, monotonically vengizable (that’s a technical term). Even something like this:

“People are complaining a lot about X. We’ve got to get that taken care of. And Y, Y too. Y is a big problem”.
Translation: “Venga venga venga venga venga!!” (unless the hearer can reasonably be expected to realize that this means that A, B, C, and Z can wait by comparison, and in fact that they have been authorized to wait).

The problem here is not just the lack of information conveyed – it’s the damage to the speaker’s own notions of causality. People have a deep-seated need to feel that they are affecting the world, and they especially like to feel that they can affect it with their words. You say “C must happen!”, and some number of days later, C happens. Most people, even those with a scientific bent, will react to this pure correlation with a little bit of self-satisfaction (and an increased feeling of efficacy), without thinking hard about the possibility that C might have happened on exactly the same day even if they had never been born.

This superstitious feeling of contribution can be a problem even when you _are_ getting some things to happen faster. If you press for the things you want, then the fact that pressure probably pushed some other things out of the way fades from view. I think that this is particularly a problem for “evangelists”, who for one thing have been trained to think that they are by definition on the side of the angels, and who have also been encouraged to think that opposition is oppositional because it’s hidebound or stupid. “Nothing was happening over there until I lit a fire, etc.” is a typical thing for an evangelist with this disease to say. (I’ll translate that sentence just as soon as someone tells me what the past-tense form of “venga!” is).

Privileging classical music

February 22, 2013

[This post started its life as a comment on an answer to a question on Quora. Here’s the original answer, but those of you who are not Quora users may have no joy. So here is some context:

The original question: Why is it more difficult to memorize or sing a classical piece than a pop song? 

Excerpts from the answer: “1) Classical music is much more musically complex. [..] Pop music, because it has to be catchy (and, let’s face it, is composed and performed by people who largely are minimally trained as musicians), is extremely simple musically by comparison. [..] 3) Classical music (opera as well as instrumental) requires a much higher level of musical competency than pop music. Any tone deaf joker can sing “Single Ladies” by Beyonce with miserable technique and sound reasonable at the local karaoke bar. [..] There is no faking it if you are a classical musician. You have to know what you are doing. Classical music truly separates the men from the boys.”]

My comment: Wow, I really dislike this style of bare assertion-of-privilege of classical music over other forms.

For one thing, as with many defenses of “high-church” art against popular art, it starts off with a willingness to compare the best of classical music against the cherry-picked worst (or at best the average) of pop music. This is a rigged game. Do not compare Bach to Beyoncé – compare him to the most artful, obsessed, and talented popular musicians of the last century. I won’t say exactly who should represent the pop world here, since it depends very much on your own taste in subgenres – but it could be Fats Waller, Gershwin, Billie Holiday, Miles Davis, the Beatles, Lou Reed, Talking Heads, Brian Eno, Radiohead, Elvis Costello, Rufus Wainwright, just to take some examples that resonate with me. (Your examples may differ.)

Secondly, let’s deconstruct this: “Classical music is much more musically complex”. Well, actually classical music is, unsurprisingly, complex in the dimensions that are valued in classical music, and mind-numbingly simple in the dimensions that are not valued in classical music. If you are oriented towards classical music, of course, then you pay attention to the dimensions that you have been schooled to value, and so it straightforwardly seems to be more complex.

Classical music is more complex than competing forms in at least two ways: long-term harmonic structure, and medium-term melodic structure. Yes, the melodies in classical music can be complex, with long phrases.

Now let’s talk about the ways in which classical music is simple (and by “classical music” I mean what most people mean: the “common practice period” of works composed in Europe between 1600 and 1900):

  • Harmony (dissonance vs. consonance). Classical music is, to most modern ears, mind-numbingly consonant. Almost everything is written in either the minor or major mode; most of the intervals are thirds or fourths or fifths. Every so often you might get a seventh chord (lawdy). If you are used to more complex, dissonant and risky harmonies (say, from jazz or rock traditions) or more diverse tones, tunings, and modes (say, from non-Western musics), then the harmonic vocabulary of the common-practice period feels complex like tapioca pudding is complex.
  • Rhythm. Most classical music maintains the same time signature throughout the entire piece, and that signature is one of: 4/4, 3/4, or something you get by dividing or multiplying numerator or denominator by 2. (Can we all take a moment to yawn?) There is almost no polyrhythmic action (of multiple signatures played against each other), and no groove aesthetic. Anyone who has had much exposure to either African music or its descendant genres in the West can’t help but experience classical rhythms as earnest and plodding.
  • Timbre. This is the one really unfair shot that I am going to give out to classical music, because they were doing the best they could with the technology of the time. Orchestral music of the common practice period is an explosion of timbres, with its horns, reeds, strings, percussion and keyboard instruments. But it lacked four things that we are all used to today in pop music: 1) electric instruments and amplification (including the rich sonic possibilities offered by feedback), 2) electronic synthesis, 3) the techniques of the studio (multi-tracking, overdubbing, close miking of the human voice) and 4) sampling. The result is that we can now do almost anything we can imagine in the timbral domain, and most likely we are still in the early days of exploration.

The above is a sketch of how classical music appears to ears that have been trained in other traditions: it seems complicated in a couple of dimensions, and very boring in the rest. To classically-trained ears, pop music sounds boring in some dimensions (particularly melody and long-term harmonic structure), and “noisy” or “chaotic” in the others – those being the dimensions in which pop music is too complex for the classically-trained ear to pick up on.

Who needs algebra? Mathematicians!

August 3, 2012

Who Needs Algebra? (NY Times, possibly behind paywall)

I have a feeling of deja vu here, since 5 or 10 years ago I wrote a blog-response to an editorial by Roger Schank that called (roughly) for the elimination of certain math classes from high schools. Schank’s argument was roughly the same as this one – you don’t need math for most jobs, or most life situations, so whoneedzit?

I’ll tell you who needs mathematics: mathematicians. (I am not kidding.) Also, of course, people in closely allied fields: physicists, engineers, actuaries, statisticians etc.  But for rhetorical effect, let’s stick with mathematics.

If someone is going to become an algebraist, they will first have to “struggle with algebra” (in the sense of solving-for-x and factoring polynomials), and then they’ll have to “struggle with algebra” (in the sense of re-proving certain well-known elementary theorems about groups and fields) and then if they are lucky they will qualify to “struggle with algebra” (in the sense of helping humankind to figure out under which conditions the free universal algebra of an uncountable signature over a metrizable space is paracompact (yeah, I looked that one up) – in other words, do original research in mathematics).

The main point here is that mathematical disciplines are a strenuous and cumulative track and, to a first approximation, once you’re off the track you’re off the track for good. (I can tell you when I got off the track: later than many but still quite early, at age 22, after a particularly cruel course in point-set topology.)

I suspect that there are no Olympic medal-winners in gymnastics who did no gymnastics between ages 10 and 16.  Similarly, I bet that there’s no one on the mathematics faculty of a good university (let alone Nobel and Fields medal winners) who did no mathematics between ages 10 and 16. If as a country we decided that there would be no gymnastic classes, starting right now, I think we would give up our Olympic dreams in gymnastics in 2020. We should expect a similar result in math and science if we eliminate “onerous” math classes country-wide.

Now the anti-math-class argument potentially divides into two alternatives:

1) No one needs math (not even mathematicians, physicists, statisticians, because we don’t need them).

2) Most people don’t need math, because, y’know, most people aren’t going to be mathematicians, scientists, or engineers.

I won’t even bother to reply to #1. As for #2 – that is true, but can you pre-identify those people before they start “struggling with algebra”? And do we want to eliminate these courses from all high schools?  Including, say, Groton, Exeter, and the Bronx High School of Science? Or just most high schools?

My fear here is that there is a lazy, unexamined, nasty assumption that certain neighborhoods, groups, cities, and school systems are unlikely to produce the Nobel winners of the future, and that we might as well cut our losses now. Even if this were palatable from the point of view of social equity, I think that the only way to make this kind of prediction confidently is to make it self-fulfilling.

Mathematical/scientific talent is an odd plant – it can blossom in places that never could have been predicted, but drought can kill it early. Let’s not turn the water off just yet.

Religious litmus tests

June 27, 2012

“[N]o religious Test shall ever be required as a Qualification to any Office or public Trust under the United States.” – Article 6 of the U.S. Constitution.

This constitutional restriction isn’t about voting, of course – instead, it’s about who is qualified for office in the first place. Presumably the framers realized that Baptists might refuse to vote for Methodists and vice versa, and that there was not a lot to be done about that, really.

Still, as a voter in a pluralistic society, I would like to feel that I’m modeling the underlying intent of Article 6. I would like to live in a country where anyone could have a shot at the highest office: Baptists, Methodists, fundamentalist Christians, Jews, Muslims, and (least likely of all, if you believe some recent polls) atheists like me. It is interesting for me to realize that I cannot entirely sign up for that.

I am an atheist with a worldview much like the famously pugnacious ones: Dawkins, Dennett, and the late Christopher Hitchens. I’m ambivalent, though, about their evangelistic approach, mainly because I think that religiously pluralistic societies work just a bit more smoothly if folks don’t go around picking fights about religion. Of course I feel that these guys should be able to write whatever they want, and that a religiously pluralistic society (or world) will inevitably have a lot of conversation that some fraction of people find to be blasphemous in the extreme – including, say, Danish cartoons mocking well-known prophets. So there will be arguments, and jokes, and eff you if you can’t take a joke, and no, others’ blasphemous jokes don’t justify your violence. Other things equal, though, I’d rather not even go there, most of the time.

Except … in the end, you can’t keep religious beliefs entirely separate from everything else, no matter how much you try. So, although I don’t want to impose my own litmus test, here are two questions that I had hoped someone would ask in the 2012 GOP presidential debates, back when the field included Bachmann, Perry, and Huntsman:

Question 1: “Candidates – do you believe that the following is literally true?  A few thousand years ago, a guy built a boat, and put breeding pairs of land animals on that boat. All land animals in the world today are descendants of the animals that were on that boat.”

Question 2:  “Really?”

If this question amounts to picking a fight, then I think it is a necessary fight in choosing a possible President. Because if you take a “Yes” answer *seriously*, then it has all sorts of implications, some obvious, some seemingly trivial. Here are a few of them:

  • Sea levels have risen before, and everything worked out that time.
  • The only life forms that matter are the big ones that we can easily see.
  • Species extinction doesn’t matter as long as you have a single breeding pair.

This stuff actually matters: when a president makes certain decisions, he/she will have to either have reasonable beliefs about population biology (and climate change, and fossil fuels (note the “fossil”)) or will have to trust someone who does. (And yes, I’ve switched now to using the phrase “reasonable belief” to refer to things I believe. I think that this is the crucial point at which you can see my religious tolerance dissolving and I am revealed as a litmus-tester. Unfortunately, I don’t see any way around it.)

If we’d been able to ask that question of the GOP presidential field in 2012, I think that the answerers would have divided into four camps (with the last three being hard to tell apart from the outside):

  1. No. (i.e., no, I don’t believe in the literal truth of the Noah’s Ark story.) As far as I know, only Huntsman would have answered this way.
  2. Yes (literally). This candidate simply believes the story, and would also believe direct implications of the story.
  3. Yes (cynically).  This candidate doesn’t believe in Noah’s Ark one bit, but wants to carry the state of Kansas.
  4. Yes (in a compartmentalized way).

The last of these is the most complicated and interesting. People are really good at believing seemingly inconsistent things, with different parts of their mind, or in different contexts and social settings. So maybe in the context of a Sunday morning sermon, our candidate believes in Noah’s Ark 100%; in the context of talking to his Science Advisor about the environmental impacts of an oil spill, he has shifted gears and Noah’s Ark isn’t as much on his mind. I am sympathetic to this kind of compartmentalization – I know that I do it a lot, and I know also that I probably do it a lot more than I think I do.

All of this leaves a quandary for atheistic voters like me, however. Imagine that we submit to the inevitability of a failure of our own litmus test, and we conclude that the next President of the United States will be someone who says “Yes” to the question “Do you believe in the literal truth of the Noah’s Ark story”?

The question then is:  which kind of “Yes” man do you want in office?  A true-believer, a cynic, or a compartmentalizer?  I am not sure which I would choose first, but I think the true believer would be the worst of the three.

Elvis Costello at the Warfield (4/15/12)

April 23, 2012

For some reason I went to this show with a dual consciousness: as long-time rabid fan, but  also trying to channel the experience of a newbie. As I’ll explain, the first consciousness enjoyed itself;  the second, not so much.

By rabid fan, I mean the usual embarrassing kind of thing – me, with most of the huge catalog of songs sunk into my brain, humming many of them at idle moments, introducing EC into conversation completely inappropriately, the way geeks of a certain age start reciting Monty Python skits. A non-fan friend once asked me to compile a best-of to see if she could come to like him (or at least that’s how I remember being asked). I came back with 3 discs, 50 songs, and a crazy glint in my eye. Just the absolute essentials, you see. (In my defense, this is probably just 10% or so of his output.)

But getting to like an Elvis Costello song or album has always been a slow, odd  process, even for a fan like me. For a long time whenever a new album would come out I would immediately get it and then pronounce it a disappointment. It would seem bland, overly complicated, featureless, whiny, not up to his previous standard. Only at about the tenth hearing would some of the complicated melodies begin to latch on to parts of my brain like lampreys, and things would only get worse from there.  EC’s songwriting is intensely melodic, but almost never catchy, in the sense of likability on first hearing (other than about five hits that charted). And if a crazy EC fan doesn’t like his songs on first hearing, why should anyone else?

Then there’s the voice. I think it’s great – both the grainy expressiveness, and the funny quality where the start of every vocal note somewhat unpredictably “catches” (or doesn’t) like an automobile engine that might or might not want to turn over.  For some people, though, it’s just husky and rough and “bad”.

But on to the concert itself: my friend Jeff and I show up, get drinks, survey the crowd. Everyone is middle-aged, of course, and I am amused to see the remaining indicators of the 80’s-era hipsters they once were: the vestigial earrings (radical in their day), the porkpie hats. But as we sit down, Jeff accidentally brushes the legs of the guy behind us with his coat, which throws the guy into a rage. It’s hard to reconcile his mid-50s middle-class bespectacled appearance with his rant: “Sorry guy, now you fucked up. Just fucking turn around and stop talking, now”. As he stomps off triumphantly to find Security to have us thrown out for the offense (having narrowly decided not to just beat us to death), Jeff tries to reason with his wife, who hews to the capital-punishment party line. Security is unmoved, however, and we are allowed to stay. Can I recommend a listen to “I’m Not Angry”? (Not to be a city-snob, but sometimes I wonder if some folks who come from the suburbs to concerts in the city simply cannot deal with the sudden increase in population density, and freak out for that reason alone.)

Finally, a welcome distraction: the show. For this tour, Elvis has revived the spinning wheel (along with of course the dancer’s cage and the Society Lounge)!  On stage there’s a huge Wheel-of-Fortune style wheel with 50 or so songs on it that selected audience members are invited to spin – whatever comes up is what the band will play right now.  It’s both a display of performance bravado (“we’ve got at least 50 songs that we’ll play at the drop of a hat”) and a fun randomizer. Beyond Belief got picked (making Jeff happy) and when it landed on Episode of Blonde (that weird meld of spoken-rant verse and intensely-sweet hook-chorus), and then he came up to our balcony to sing it, well yeah, I got my fan’s money’s worth for sure.

As I said at the top, though, I couldn’t just enjoy myself as a fan, and kept slipping into seeing the show as I thought a non-fan might see it, and saw any number of barriers:

  • Complexity, non-catchiness – if you’re not going to be immediately captivated on hearing a recording, will a live show be different?
  • The voice – if it’s rough when captured exactly as desired by recording engineers, how will it seem in concert? And Elvis is sometimes an exquisitely precise singer, but under pressure of live performance, well, yes – he will rush up to the mike a little bit late, and maybe kinda bellow into it, and what are new arrivals to make of that?
  • Finally, it has to be said: acoustics of live shows mostly just suck – boomy, muddy, indistinct. I wondered at this show whether touring shows like this should redistribute effort a little bit – take 3% of the budget currently spent on talent, stage props, logistics and devote it to someone sitting at random points in the venues figuring out whether the lead singer can in fact be heard with clarity.

I am convinced that there is a funny dynamic with fans of studio-intensive music played in concert, that shows up at least as much with hip-hop acts as with aging post-punk singer-songwriters: you hear a faint allusion to a studio-produced track you love, and you are all woo-hoo and oh-yeah, because you love the studio track and here you are hearing it *live* so that must be even better and so you lungfully represent your love for that studio track, and are very happy. But (especially with acts that really exploit the studio in a good way) how often is this just a reminder of the track you love and not an improvement on it?  Would a new listener have anything like that reaction?

So if you have never heard EC before, can I recommend his live show?  Nope – it’s likely to be about a first hearing of a really complicated song (that you might like on the tenth hearing) that is more comprehensible on record than in concert, and that is obscured by vocal oddities and acoustic problems and will be just hard to figure out, leaving as possible appeals just the sonic fun of the concert and the sort of partially-likable ironic-vaudeville showmanship that EC enjoys.

If you’re a fan, though, I recommend this incarnation of the EC tour really strongly – spin that wheel and you’re likely to  be happy, especially if you already know the song that comes up on top.

Stealth site for Jybe

June 11, 2011

A stealth-mode startup should have a stealth-mode site, and we have one now for Jybe. Target audiences are 1) potential users and alpha-testers, 2) potential employees, and 3) potential funders (although we are not looking for outside funding quite yet).

I love the logo and visual design from the guys at ZENxd.