MapReduce Paper (2004)


“MapReduce: Simplified Data Processing on Large Clusters” (OSDI 2004) — Jeffrey Dean and Sanjay Ghemawat

Introduced the MapReduce programming model for processing large datasets in parallel across clusters of commodity machines. The map phase takes input key/value pairs and emits intermediate key/value pairs; the reduce phase merges all intermediate values associated with the same intermediate key. The runtime abstracts away parallelization, data distribution, and fault tolerance, so the programmer writes only the two functions. The paper inspired the open-source Hadoop MapReduce implementation.
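A rough illustration of the model, using the word-count example from the paper: this is a minimal single-process Python sketch, not the paper's actual interface (which is C++). The names map_fn, reduce_fn, and run_mapreduce are placeholders, and the shuffle/group-by-key step that the real system performs across machines is simulated in memory here.

```python
from collections import defaultdict
from typing import Iterator

def map_fn(key: str, value: str) -> Iterator[tuple[str, int]]:
    # key: document name (unused), value: document contents.
    # Emit an intermediate (word, 1) pair for every word seen.
    for word in value.split():
        yield (word, 1)

def reduce_fn(key: str, values: list[int]) -> int:
    # key: a word, values: every count emitted for that word.
    return sum(values)

def run_mapreduce(inputs: dict[str, str]) -> dict[str, int]:
    # Single-process simulation of map -> shuffle/group -> reduce;
    # the real system partitions this work across workers and
    # re-executes tasks on failure.
    intermediate: dict[str, list[int]] = defaultdict(list)
    for key, value in inputs.items():
        for out_key, out_value in map_fn(key, value):
            intermediate[out_key].append(out_value)
    return {k: reduce_fn(k, vs) for k, vs in sorted(intermediate.items())}

if __name__ == "__main__":
    docs = {"doc1": "the quick brown fox", "doc2": "the lazy dog"}
    print(run_mapreduce(docs))
    # {'brown': 1, 'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```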


#google #papers #mapreduce #2004