Saturday, November 11, 2006

Research -- Google

There's something deeper to learn about Google from Gmail than the initial reaction to the product features. Ignore for a moment the observations about Google leapfrogging their competitors with more user value and a new feature or two. Or Google diversifying away from search into other applications; they've been doing that for a while. Or the privacy red herring.
No, the story is about seemingly incremental features that are actually massively expensive for others to match, and the platform that Google is building which makes it cheaper and easier for them to develop and run web-scale applications than anyone else.

An overlooked feature that made Google really cool in the beginning was their snippets. These show a few sample sentences from each web page matching the search.


Consider the insane cost to implement this simple feature. Google has to keep a copy of every web page on the Internet on their servers in order to show the piece of the web page where the search terms hit. Everything is served from RAM, only booted from disk. And they have multiple separate search clusters at their co-locations. This means that Google is currently storing multiple copies of the entire web in RAM. That would really require a lot of memory for the purpose.


Google has taken the last 10 years of systems software research out of university labs, and built their own proprietary, production quality system. It's a distributed computing platform that can manage web-scale datasets on 100,000 node server clusters. It includes a petabyte, distributed, fault tolerant filesystem, distributed RPC code, probably network shared memory and process migration. And a datacenter management system which lets a handful of ops engineers effectively run 100,000 servers. Any of these projects could be the sole focus of a startup.


The most obvious concern for Gmail is storage. They can't lose emails and more importantly, would never be expected to be down. The backup system used isn't the conventional one. RAID isn't any good. If a RAID disk fails, it requires human intervention to replace the disk, and in case more disks fail, it can lead to data loss. Moreover, it requires hot swap trays which are an expensive piece of equippment. RAID has a problem with high end availability of data at the server level.


Google has 100,000 servers. If a disk fails, it is left to be reclaimed/repalced later. Hardware failures need to be instantly routed around by software.


Google has built their own distributed, fault-tolerant, petabyte filesystem, the Google Filesystem. This is ideal for the job. Say GFS replicates user email in three places; if a disk or a server dies, GFS can automatically make a new copy from one of the remaining two. Compress the email for a 3:1 storage win, then store user's email in three locations, and their raw storage need is approximately equivalent to the user's mail size.


The Gmail servers wouldn't be top-heavy with lots of disk. They need the CPU for indexing and page view serving anyway. No fancy RAID card or hot-swap trays, just 1-2 disks per 1U server.
It's straightforward to spreadsheet out the economics of the service, taking into account average storage per user, cost of the servers, and monetization per user per year. Google apparently puts the operational cost of storage at $2 per gigabyte. It is assume the yearly monetized value of a webmail user to be in the $1-10 range.


Google has 100,000 servers.


Any sane ops person would rather go with a fancy $5000 server than a bare $500 motherboard plus disks sitting exposed on a tray. But that's a 10X difference to the cost of a CPU cycle. And this frees up the algorithm designers to invent better stuff.


Without cheap CPU cycles, the coders won't even consider algorithms that the Google guys are deploying. They're just too expensive to run.


Google doesn't deploy bare motherboards on exposed trays anymore; they're on at least the fourth iteration of their cheap hardware platform. Google now has an institutional competence building and maintaining servers that cost a lot less than the servers everyone else is using. And they do it with fewer people.


They must have a little internal factory to deploy servers, and the level of automation needed to run that many boxes. Either network boot or a production line to pre-install disk images. Servers that self-configure on boot to determine their network config and load the latest rev of the software they'll be running. Normal datacenter ops practices don't scale to what Google has.


Google is a company that has built a single very large, custom computer. It's running their own cluster operating system. They make their big computer even bigger and faster each month, while lowering the cost of CPU cycles. It's looking more like a general purpose platform than a cluster optimized for a single application.


While competitors are targeting the individual applications Google has deployed, Google is building a massive, general purpose computing platform for web-scale programming.
This computer is running the world's top search engine, a social networking service, a shopping price comparison engine, a new email service, and a local search/yellow pages engine. What will they do next with the world's biggest computer and most advanced operating system?