01:08 pm, 14 Aug 06


I haven't posted much about the papers I've been reading recently. This is in part because my commute to work changed to be less conducive to reading: in SF, I spent an hour on a bus, while in Tokyo it's 20-30 minutes and spread across different reading-unfriendly places. And it's also because I've been gradually becoming more comfortable with reading about the sort of thing I actually work on, and I'm reluctant to post about the IR papers I've been reading.

Anyway, I read that they've updated the list of papers published by people at Google. This may have been announced before, but I didn't see it.

There's a great variety of subjects, from Googley sorts of problems ("A Web-based Kernel Function for Measuring the Similarity of Short Text Snippets" -- I've read this one) to algorithms ("An O(log n) Approximation Ratio for the Asymmetric Traveling Salesman Path Problem") to biology ("Oral Mucosal Microvascular Network Abnormalities in De Novo Mutation Achondroplasia") to systems (MapReduce, etc.) to, um, "other" ("Head Normal Form Bisimulation for Pairs and the Lambda Mu-Calculus").

Here are some snippets from "The Price of Performance", which discusses TCO from Google's perspective. To me, "total cost of ownership" is a phrase charged with commercial-vendor (Microsoft, Apple) connotations, usually deployed to argue against free software. But here it's used in its literal sense:
Often the major component of TCO for commercial deployments is software... per-CPU costs of just operating systems and database engines can range from $4,000 to $20,000. Google's choice to produce its own software infrastructure in-house and to work with the open source community changes that cost distribution by greatly reducing software costs (software deployment costs still exist, but are amortized over large CPU deployments).
So what does matter? You've seen it before, but it still is scary:
A typical low-end x86-based server today can cost about $3,000 and consume an average of 200 watts ... power delivery inefficiencies and cooling overheads will easily double that energy budget. If we assume a base energy cost of nine cents per kilowatt hour and a four-year server lifecycle, the energy costs of that system today would already be more than 40 percent of the hardware costs.
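The arithmetic in that quote checks out; here's a quick sketch using exactly the figures given (a $3,000 server, 200 W average draw doubled by delivery and cooling overheads, nine cents per kWh, a four-year lifecycle):

```python
# Check the energy-vs-hardware cost claim using the article's figures.
server_cost = 3000.0   # dollars: typical low-end x86 server
avg_power_w = 200.0    # watts: average draw
overhead = 2.0         # power delivery + cooling roughly double the budget
kwh_price = 0.09       # dollars per kilowatt-hour
years = 4              # server lifecycle

hours = years * 365 * 24
energy_kwh = avg_power_w * overhead / 1000.0 * hours
energy_cost = energy_kwh * kwh_price

print(f"energy over {years} years: {energy_kwh:,.0f} kWh")
print(f"energy cost: ${energy_cost:,.0f} "
      f"({energy_cost / server_cost:.0%} of hardware cost)")
# -> energy cost: $1,261 (42% of hardware cost)
```

So "more than 40 percent" is right on the nose, before any increase in electricity prices.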

And it gets worse. If performance per watt is to remain constant over the next few years, power costs could easily overtake hardware costs, possibly by a large margin. [...figure demonstrating this...] For the most aggressive scenario (50 percent annual growth rates), power costs by the end of the decade would dwarf server prices (note that this doesn’t account for the likely increases in energy costs over the next few years). In this extreme situation, in which keeping machines powered up costs significantly more than the machines themselves, one could envision bizarre business models in which the power company will provide you with free hardware if you sign a long-term power contract.
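To see how the compounding plays out, here's an illustrative sketch of that aggressive scenario. The 50 percent annual growth rate is from the quote; the starting power and price figures just reuse the earlier example, and the assumption that hardware price stays flat while per-server power grows is mine:

```python
# Rough sketch of the extreme scenario: per-server power draw growing
# 50% per year while the hardware price stays flat. Starting figures
# reuse the earlier example (400 W including overheads, $0.09/kWh,
# $3,000 server) and are illustrative, not from the paper's figure.
server_cost = 3000.0
base_power_w = 400.0   # 200 W average, doubled for delivery/cooling
kwh_price = 0.09
growth = 1.5           # 50% annual growth in power consumption

for year in range(7):
    power_w = base_power_w * growth ** year
    annual_cost = power_w / 1000.0 * 8760 * kwh_price
    print(f"year {year}: {power_w:6.0f} W -> ${annual_cost:5.0f}/yr "
          f"({annual_cost / server_cost:.2f}x server price)")
```

By year six the *annual* power bill alone exceeds the purchase price of the machine, so over a four-year lifecycle power would indeed dwarf the hardware cost, which is the regime where the free-hardware-with-a-power-contract business model starts to sound plausible.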
The rest of the article is quite interesting. Luis, the author, was apparently previously a chip design researcher at DEC; he argues, for example, that multicore processors could rely less on speculative execution by having more threads with nonspeculative instructions available to execute.

Whenever I read about these computing clusters I always think of Trantor, which as I recall had to push its excess heat out to space. (But maybe that's just the 80° weather here speaking...)