Hadoop 1.x Architecture and Drawbacks
February 13, 2017 by Niranjan Tallapalli
Hadoop is built on ideas from two whitepapers published by Google, which correspond to its two core components: HDFS and MapReduce.
HDFS: Hadoop Distributed File System
It differs from a normal file system in that data copied onto HDFS is split into 'n' blocks, and each block is replicated onto different nodes in the cluster. To achieve this, HDFS uses a master-slave architecture:
HDFS Master => Name Node: takes the client request and is responsible for orchestrating the data copy across the cluster
HDFS Slave => Data Node: actually stores the blocks of data and coordinates with its master
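The split-and-replicate idea above can be sketched in a few lines of Python. This is an illustrative simulation, not Hadoop's actual API: the block size, node names, and round-robin placement policy are simplifying assumptions (real HDFS 1.x defaults to 64 MB blocks and uses rack-aware placement).

```python
# Toy sketch of HDFS-style block splitting and replication.
# All names and policies here are illustrative, not Hadoop's real API.

BLOCK_SIZE = 4       # toy block size; real HDFS 1.x default is 64 MB
REPLICATION = 3      # HDFS default replication factor

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Split a file's bytes into fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(blocks, data_nodes, replication=REPLICATION):
    """Name-node-style placement: assign each block to `replication`
    distinct data nodes, round-robin for simplicity."""
    placement = {}
    for b in range(len(blocks)):
        placement[b] = [data_nodes[(b + r) % len(data_nodes)]
                        for r in range(replication)]
    return placement

blocks = split_into_blocks(b"hello hadoop!")
plan = place_replicas(blocks, ["dn1", "dn2", "dn3", "dn4"])
```

With a 13-byte file and a 4-byte block size, the file is cut into four blocks, and each block lands on three distinct data nodes, so the loss of any single node never loses data.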
MapReduce: This is the processing engine, and it is also implemented in a master-slave architecture:
MR Master => Job Tracker: takes incoming jobs, identifies the available resources across the cluster, divides each job into tasks, and submits them to the cluster
MR Slave => Task Tracker: actually runs the tasks and coordinates with its master
[Architecture diagram]
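The JobTracker's divide-and-assign step can be sketched as follows. This is a hypothetical simulation (the function and tracker names are made up, and the "most free slots" policy is a simplification of Hadoop 1.x's slot-based scheduling): one map task is created per input block, and each task goes to a task tracker that still has a free slot.

```python
# Toy sketch of JobTracker-style scheduling: one map task per input
# block, assigned to task trackers with free slots. Illustrative only.

def schedule(input_blocks, trackers):
    """trackers: {tracker_name: free_slots}. Returns {tracker: [tasks]}."""
    assignments = {t: [] for t in trackers}
    tasks = [f"map-task-{i}" for i in range(len(input_blocks))]
    for task in tasks:
        # greedily pick the tracker with the most free slots remaining
        best = max(trackers, key=lambda t: trackers[t])
        if trackers[best] == 0:
            raise RuntimeError("no free slots left in the cluster")
        assignments[best].append(task)
        trackers[best] -= 1
    return assignments

plan = schedule(["blk0", "blk1", "blk2"], {"tt1": 2, "tt2": 2})
```

Note that in this model the master only tracks "slots", not what runs in them, which hints at the tight coupling discussed in the drawbacks below: the scheduler and the MapReduce execution model live in the same component.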
Drawbacks
The JobTracker is designed in such a way that it is tightly coupled with two important responsibilities: "Resource Management" and "MapReduce Task Execution". Because of this, the cluster cannot be used for distributed computing technologies other than Hadoop MapReduce, such as Spark/Kafka/Storm/...
The Name Node can maintain metadata for at most around 4000-5000 data nodes. This limits cluster scalability to 4k-5k nodes.
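A back-of-the-envelope calculation shows why NameNode metadata becomes the bottleneck. The assumptions here are hedged: the ~150 bytes of heap per file/block object is a commonly cited rule of thumb rather than an exact figure, and the block count per node is purely illustrative.

```python
# Rough estimate of NameNode heap needed to hold all block metadata
# in memory. 150 bytes/object is a rule of thumb, not an exact figure.

BYTES_PER_OBJECT = 150  # assumed heap cost per block object on the NameNode

def namenode_heap_gb(num_blocks, bytes_per_object=BYTES_PER_OBJECT):
    """Estimate NameNode heap (GB) needed to track `num_blocks` blocks."""
    return num_blocks * bytes_per_object / 1024**3

# Illustrative cluster: 5,000 data nodes, ~200,000 blocks per node
est = namenode_heap_gb(5_000 * 200_000)
```

Under these assumptions a 5,000-node cluster already needs on the order of 140 GB of NameNode heap for block metadata alone, and since Hadoop 1.x has a single NameNode with no federation, the cluster cannot grow past that memory ceiling.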
To address these drawbacks, Hadoop 2.x was released.