Posted by tomwhite
on August 24, 2006 at 2:01 PM PDT
Amazon's new Elastic Compute Cloud should be a perfect fit for running Hadoop jobs.
In March I wrote about affordable web-scale computing:
I would love an API that exposes Google's MapReduce, a simple programming model for crunching large datasets. You can write and run MapReduce programs today, using Hadoop, but it's only really useful if you have enough machines at your disposal. The pay-as-you-go model of S3 (and Sun Grid) would be very attractive to developers who want to run ad hoc computations, or can't afford the upfront investment in hardware.
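The model itself is tiny: a map function turns each input record into intermediate (key, value) pairs, and a reduce function collapses all the values that share a key. Here's a minimal single-machine sketch in Python rather than Hadoop's actual Java API - the function names are my own, and the framework's real value (distribution, fault tolerance) is exactly what this toy version leaves out:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(records, mapper):
    """Apply the user's map function to every input record,
    collecting all intermediate (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs

def reduce_phase(pairs, reducer):
    """Group intermediate pairs by key (the 'shuffle' step),
    then apply the user's reduce function to each group."""
    pairs.sort(key=itemgetter(0))
    return {key: reducer(key, [v for _, v in group])
            for key, group in groupby(pairs, key=itemgetter(0))}

# The canonical word-count example:
def wc_map(line):
    return [(word, 1) for word in line.split()]

def wc_reduce(word, counts):
    return sum(counts)

docs = ["the quick brown fox", "the lazy dog", "the fox"]
counts = reduce_phase(map_phase(docs, wc_map), wc_reduce)
print(counts)  # {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

The point of the model is that `wc_map` and `wc_reduce` are all the programmer writes; the framework owns the shuffle and can run the map and reduce calls in parallel across many machines.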
Well, now it's possible with the beta launch of Amazon EC2 (picked up from the O'Reilly Radar). EC2 (apart from coincidentally being the postcode of my company's new London office) stands for Elastic Compute Cloud and allows you to commission compute resources on an on-demand basis using simple web-service-based tools. The unit of deployment is an Amazon Machine Image (AMI) - a Linux image - which you can configure to have any software you like on it. You can then run any number of instances of that image, paying for the instance hours you use (and the data you transfer).
This goes beyond what I wished for in March, as it allows you to run anything on the image! Going back to Hadoop and MapReduce, I can imagine a generic Hadoop AMI that you configure your job on before commissioning a number of EC2 server instances to run it. Press go, wait for your job to complete, then decommission the server instances.
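That workflow might look something like the following from the command line. This is purely a sketch: the AMI id, keypair, host and jar names are all placeholders, and it assumes a pre-built Hadoop image of the kind described above (which nobody has published yet) plus Amazon's EC2 command-line tools:

```shell
# Commission ten instances of a (hypothetical) generic Hadoop AMI
ec2-run-instances ami-XXXXXXXX -n 10 -k my-keypair

# Once the instances are up, submit the job to the master node
# (placeholder hostname and job jar)
ssh root@ec2-master-host 'bin/hadoop jar my-job.jar'

# When the job completes, decommission the instances
# (placeholder instance ids)
ec2-terminate-instances i-XXXXXXXX
```

You'd pay only for the instance hours between the first and last command - exactly the pay-as-you-go model I was hoping for.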
Definitely one to watch.