Phase 3: Emergence of YARN
The JobTracker would ideally require a complete rewrite to fix the majority of the scaling issues. Even if it were successful, however, this rewrite would not necessarily resolve the coupling between platform and user code, nor would it address users’ appetite for non-MapReduce programming models or the dependency between careful admission control and JobTracker scalability. Absent a significant redesign, cluster availability would continue to be tied to the stability of the whole system.
Building on lessons learned by evolving Apache Hadoop MapReduce, YARN was designed to address the specific requirements stated so far (i.e., Requirement 1 through Requirement 9). However, the massive installed base of MapReduce applications, the ecosystem of related projects, the well-worn deployment practice, and a tight schedule could not tolerate a radical new user interface. Consequently, the new architecture and the corresponding implementation reused as much code from the existing framework as possible, behaved in familiar patterns, and exposed the same interfaces for the existing MapReduce users. This led to the final requirement for the YARN redesign: [Requirement 10] Backward Compatibility.
[Requirement 10] Backward Compatibility
The next-generation compute platform should maintain complete backward compatibility with existing MapReduce applications.
To summarize the requirements for YARN, we need the following features:
- [Requirement 1] Scalability: The next-generation compute platform should scale horizontally to tens of thousands of nodes and concurrent applications.
- [Requirement 2] Serviceability: The next-generation compute platform should enable evolution of cluster software to be completely decoupled from users’ applications.
- [Requirement 3] Multitenancy: The next-generation compute platform should allow multiple tenants to coexist on the same cluster and enable fine-grained sharing of individual nodes among different tenants.
- [Requirement 4] Locality Awareness: The next-generation compute platform should support locality awareness—moving computation to the data is a major win for many applications.
- [Requirement 5] High Cluster Utilization: The next-generation compute platform should enable high utilization of the underlying physical resources.
- [Requirement 6] Secure and Auditable Operation: The next-generation compute platform should continue to enable secure and auditable usage of cluster resources.
- [Requirement 7] Reliability and Availability: The next-generation compute platform should provide highly reliable user interaction and support high availability.
- [Requirement 8] Support for Programming Model Diversity: The next-generation compute platform should enable diverse programming models and evolve beyond just being MapReduce-centric.
- [Requirement 9] Flexible Resource Model: The next-generation compute platform should enable dynamic resource configurations on individual nodes and a flexible resource model.
- [Requirement 10] Backward Compatibility: The next-generation compute platform should maintain complete backward compatibility with existing MapReduce applications.