Today I re-read the MapReduce paper and realized how relevant it is to the daily work we do at Amazon, especially for the content analysis work slated for the future of POD.
I decided to take the plunge. A good starting point is here. To summarize MapReduce in short (which is always dangerous, because a lot of the engineering tricks built around it will inevitably be omitted from the summary):
1) Map distributes the tasks and the storage across a group of hosts, so that every operation is local to one host and has no concurrency issues.
2) Reduce collects the local results, grouped by key, in a synchronized way.
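To make the two steps concrete, here is a minimal single-process sketch in Python, using the word-count job that the paper itself uses as its example. This is only an illustration of the idea, not how a real framework is implemented: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this post, and the "hosts" are just items in a list.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit (word, 1) pairs for one chunk of text.

    Works only on its own input and touches no shared state,
    which is why each chunk could be handled by a different host.
    """
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Group every emitted value by its key.

    Stands in for the step a real framework performs between
    map and reduce; here it is a plain in-memory dictionary.
    """
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine all values collected for each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    # On a cluster, each map_phase call would run on a separate host.
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(shuffle(mapped))


counts = word_count(["the quick brown fox", "the lazy dog", "The fox"])
print(counts)
```

Everything that makes the real system interesting (partitioning, fault tolerance, moving the computation to where the data lives) is exactly what this sketch leaves out.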
One thing that really stirred my interest in distributed computing is Herb's paper.
Indeed, the software landscape has moved away from the single-chip free lunch. The sooner one realizes it, the better prepared one will be for this inevitable future.
Tuesday, September 4, 2007