Models

Hadoop

Doug Cutting decided to write a free, open-source implementation of Google's MapReduce
A bit of a dinosaur these days

Two parts to Hadoop:

Hadoop Distributed File System (HDFS)
1. Allows you to store data on a cluster of computers without worrying about what data is on which node
2. Instead, you refer to locations in HDFS just as you would for files in a normal directory system
The actual MapReduce (MR) framework
1. Reads in data from HDFS, processes it in parallel, and writes its output to HDFS
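
The read, process-in-parallel, write-out flow can be sketched roughly as a word count in plain Python. This is only an illustration of the MapReduce idea, not Hadoop's actual API: the function names (map_words, shuffle, reduce_counts, word_count) are made up for this example, the in-memory splits stand in for blocks of a file stored on HDFS, and a thread pool stands in for the cluster's worker nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_words(split):
    """Map step: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for line in split for word in line.split()]


def shuffle(mapped_splits):
    """Shuffle step: group all emitted values by key across every split."""
    groups = defaultdict(list)
    for pairs in mapped_splits:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce step: collapse each key's list of values into one total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(splits):
    # Assumption: each split plays the role of one block of an HDFS file,
    # and each pool worker plays the role of one node running a map task.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_words, splits))
    return reduce_counts(shuffle(mapped))


# Hypothetical input: two splits, each holding one line of text.
splits = [
    ["hadoop stores data in hdfs"],
    ["spark and hadoop both read hdfs"],
]
counts = word_count(splits)
```

In a real Hadoop job the framework handles the splitting, the shuffle, and the writing of results back to HDFS; the user mostly supplies the map and reduce logic.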

Spark

MapReduce