Hadoop YARN components

YARN

YARN stands for Yet Another Resource Negotiator. It is responsible for Hadoop cluster resource management and scheduling. Various applications can run on YARN, e.g. MapReduce and Spark. YARN is also referred to as MapReduce 2 (MR2) or NextGen MapReduce, but the name MR2 is a bit misleading because YARN is not tied to MapReduce.

YARN vs. Old MapReduce

The initial versions of Hadoop, i.e. Hadoop 1.x, were tightly coupled with MapReduce, and this design is therefore also known as MapReduce 1 or MR1. Hadoop 1.x has JobTracker and TaskTracker daemons. The JobTracker is responsible both for handling resources and for monitoring and managing task progress. It also deals with failed tasks and task bookkeeping.

The JobTracker-based approach has several drawbacks. It is a scalability bottleneck: clusters could not grow much beyond about 4,000 nodes. It offers limited flexibility in sharing and allocating cluster resources. It also follows a slot-based approach (e.g. 10 slots per machine, no matter how small or big the tasks are). For these reasons Hadoop 2.x was redesigned to introduce YARN.

From Hadoop 2.x onwards, the JobTracker and TaskTracker daemons are no longer used for resource management; YARN handles it instead. YARN provides its core services via two types of long-running daemon:

  • A ResourceManager (one per cluster) to manage the use of resources across the cluster.
  • NodeManagers running on all the nodes in the cluster to launch and monitor containers.
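The division of labour between the two daemons can be sketched as a toy model. This is only an illustration, not the YARN API: the class names mirror the daemons, but the fields, the methods `can_fit`, `launch` and `allocate`, and the first-fit placement rule are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Toy stand-in for the per-node daemon that launches and tracks containers."""
    host: str
    memory_mb: int
    vcores: int
    containers: list = field(default_factory=list)

    def can_fit(self, memory_mb, vcores):
        # True if this node still has enough free memory and cores.
        return memory_mb <= self.memory_mb and vcores <= self.vcores

    def launch(self, app_id, memory_mb, vcores):
        # Reserve the resources on this node and record the running container.
        self.memory_mb -= memory_mb
        self.vcores -= vcores
        container = (app_id, memory_mb, vcores)
        self.containers.append(container)
        return container


class ResourceManager:
    """Toy stand-in for the single, cluster-wide arbiter of resources."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def allocate(self, app_id, memory_mb, vcores):
        # Hypothetical first-fit placement; real schedulers are far more elaborate.
        for node in self.nodes:
            if node.can_fit(memory_mb, vcores):
                node.launch(app_id, memory_mb, vcores)
                return node.host
        return None  # no node has enough free resources right now


rm = ResourceManager([
    NodeManager("node1", memory_mb=4096, vcores=4),
    NodeManager("node2", memory_mb=8192, vcores=8),
])

first = rm.allocate("app_1", memory_mb=3072, vcores=2)   # fits on node1
second = rm.allocate("app_2", memory_mb=2048, vcores=1)  # node1 is too full, goes to node2
third = rm.allocate("app_3", memory_mb=16384, vcores=1)  # larger than any node, not placed
print(first, second, third)
```

The point of the sketch is that one object sees the whole cluster and decides placement, while each node only keeps track of its own containers.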

YARN can run on larger clusters than MapReduce 1. It is designed to scale up to 10,000 nodes. In YARN, a NodeManager manages a pool of resources rather than a fixed number of designated slots. MapReduce running on YARN will therefore not hit the situation where a reduce task has to wait because only map slots are available on the cluster, which can happen in MapReduce 1.
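The difference between fixed slots and a resource pool can be shown with a small simulation. This is a simplified sketch under stated assumptions: the function names, the slot counts, and the memory figures are hypothetical, and memory is the only resource considered.

```python
def schedule_with_slots(map_slots, reduce_slots, tasks):
    """MapReduce 1 style: each task needs a free slot of its own kind."""
    free = {"map": map_slots, "reduce": reduce_slots}
    running, waiting = [], []
    for kind, _memory_mb in tasks:
        if free[kind] > 0:
            free[kind] -= 1
            running.append(kind)
        else:
            # A free slot of the other kind cannot be used by this task.
            waiting.append(kind)
    return running, waiting


def schedule_with_pool(pool_memory_mb, tasks):
    """YARN style: any task runs while the pool has enough memory left."""
    running, waiting = [], []
    for kind, memory_mb in tasks:
        if memory_mb <= pool_memory_mb:
            pool_memory_mb -= memory_mb
            running.append(kind)
        else:
            waiting.append(kind)
    return running, waiting


# Three reduce tasks of 1 GB each arrive on a single machine.
tasks = [("reduce", 1024), ("reduce", 1024), ("reduce", 1024)]

# Slot model: 2 map slots and 1 reduce slot, so two reduce tasks wait
# even though both map slots sit idle.
slot_running, slot_waiting = schedule_with_slots(2, 1, tasks)

# Pool model: 3 GB of memory shared by all task types, so all three run.
pool_running, pool_waiting = schedule_with_pool(3072, tasks)

print(len(slot_waiting), len(pool_waiting))
```

With the same machine capacity, the slot model leaves two reduce tasks waiting next to idle map slots, while the pool model runs all three.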
