Models
Hadoop
- Doug Cutting decided to write a free, open-source implementation of Google's MapReduce
- A bit of a dinosaur these days
Two parts to Hadoop:
- Hadoop Distributed File System (HDFS)
- Allows you to store data on a cluster of computers without worrying about what data is on which node
- Instead, you refer to locations in HDFS by path, just as you would for files in a normal directory system
- The actual MR framework
- Reads in data from HDFS, processes it in parallel, and writes its output back to HDFS
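
To make the MR idea concrete, here is a minimal single-machine sketch of the map, shuffle, and reduce steps in plain Python, using word count as the example. It does not use Hadoop or HDFS at all: the input is just a list of strings held in memory, and the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, not part of any Hadoop API. In a real job, the map and reduce calls would run in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the list of values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence (hypothetical driver, in-memory only)."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    groups = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())
```

For example, word_count(["the cat sat", "the dog sat"]) counts "the" and "sat" twice each and "cat" and "dog" once each. Each map call only sees one line and each reduce call only sees one key, which is why the framework is free to spread them across a cluster.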