Models

Hadoop

  • Doug Cutting decided to write a free, open-source implementation of Google's MapReduce
  • A bit of a dinosaur these days

Two parts to Hadoop:

  1. Hadoop Distributed File System (HDFS)
    1. Allows you to store data on a cluster of computers without worrying about what data is on which node
    2. Instead, you refer to locations in HDFS just as you would to files in a normal directory system
  2. The actual MapReduce (MR) framework
    1. Reads in data from HDFS, processes it in parallel, and writes its output back to HDFS
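
To make the read, process-in-parallel, write flow concrete, here is a minimal sketch of the map, shuffle, and reduce steps as a word count. It is a single-process, in-memory simulation in plain Python, not the Hadoop API: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and a list of strings stands in for lines read from HDFS.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all values by key, between the map and reduce steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # In a real cluster each mapper would run on a different node;
    # here they simply run one after another (assumption: illustration only).
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input standing in for lines of a file stored in HDFS.
    sample = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(sample))
```

Each mapper only needs its own line and each reducer only needs the values for its own key, which is why the framework can spread both steps across many nodes.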

Spark

MapReduce