您的位置:首页 > 运维架构 > Apache

YARN Framework(MapReduce 2.0 in Apache Hadoop 0.23)

2012-12-11 17:17 429 查看
UHP博客文章地址:http://yuntai.1kapp.com/?p=600


原创文章,转载请注明出处:http://blog.csdn.net/wind5shy/article/details/8283371

原文链接:

http://blog.cloudera.com/blog/2012/02/mapreduce-2-0-in-hadoop-0-23/

In Building and Deploying MR2 wepresented
a brief introduction to MapReduce in Hadoop 0.23 and focused on thesteps to set up a single-node cluster. This blog provides developers witharchitectural details of the new MapReduce design.

Building and Deploying MR2 一文中我们对Hadoop
0.23版本中的MapReduce进行了简要的介绍,并介绍了如何创建单节点的集群。这篇博客将为开发者介绍新的MapReduce设计的架构细节。

Apache Hadoop 0.23 has major improvements over previousreleases. Here are a few highlights on the MapReduce front; note that there arealso major HDFS improvements, which are out of scope of this post.

Apache Hadoop 0.23相对以前的版本有大幅度的改动。这篇文章介绍了MapReduce相关的一些重要改动,需要注意的是在HDFS方面同样有大幅度的改动,但这些改动不在本文的范围内。

MapReduce 2.0 (a.k.a. MRv2 or YARN):

The new architecture divides the two major functions of theJobTracker– resource management and job life-cyclemanagement– into separate components:

新的结构将JobTracker的两个主要功能:资源管理和job生命周期管理分隔成不同的组件。

A ResourceManager (RM) that manages the global assignment of compute resources to applications.

资源管理器管理分配给应用的全局计算资源。

A per-application ApplicationMaster (AM) that manages the application’s life cycle.

应用控制器管理应用的生命周期,每个应用一个AM。
In Hadoop 0.23, a MapReduce application is a single job in thesense of classic MapReduce, executed by the MapReduce ApplicationMaster.

在Hadoop 0.23,MR应用就是旧版本MR中的一个job,被MR AM执行。

There is also a per-machine NodeManager(NM) thatmanages the user processes on that machine. The RM and the NM form thecomputation fabric of
the cluster. The design also allows plugging long-runningauxiliary services to the NM; these are application-specific services,specified as part of the configuration, and loaded by the NM during startup.For MapReduce applications on YARN, shuffle is a typical
auxiliary serviceloaded by the NMs. Note that, in Hadoop versions prior to 0.23, shuffle waspart of the TaskTracker.

节点管理器管理机器上的用户进程,每个机器一个NM。RM和NM构成了集群的计算结构。设计也允许插入长时间运行的附属服务到NM;这些服务是应用相关服务,它们作为配置的一部分被定义,在NM启动的时候载入。YARN上的MR应用,shuffle是一个典型的被NM载入的附属服务。在0.23以前的hadoop,shuffle是TaskTracker的一部分。

The per-application ApplicationMaster is a framework specificlibrary and is tasked with negotiating resources from the ResourceManager andworking with the NodeManager(s) to execute and monitor the tasks. In the YARNdesign, MapReduce
is just one application framework; the design permitsbuilding and deploying distributed applications using other frameworks. Forexample, Hadoop 0.23 ships with a Distributed Shell application that permitsrunning a shell script on multiple nodes on the YARN
cluster. At the time ofwriting this blog post, there is also an ongoing development effort to allowrunning Message Passing Interface (MPI) applications on top of YARN.

AM是一个用于同RM协调资源和同NM一起来执行和监控task的框架库。在YARN的设计中,MR只是一种应用框架;设计允许创建和发布使用其他框架的分布式应用。Hadoop 0.23自带了一个分布式shell应用,其可以在YARN集群的多个节点上运行一个shell脚本。在这篇博客发布的时候,正在开发将消息传递接口应用运行在YARN上的功能。

MapReduce 2.0 Design:

Figure 1 shows a pictorial representation of a YARN cluster.There is a single Resource Manager, which has two main services:

图1中有一个RM,包含两个主要的服务:

A pluggable Scheduler, which manages and enforces the resource scheduling policy in the cluster. Note that, at the time of writing this blog post, there are two schedulers supported
in Hadoop 0.23, the default FIFO scheduler and the Capacity scheduler; the Fair Scheduler is not yet supported.

一个可插拔的调度器,其管理和执行集群的资源调度政策。在这篇博客发布的时候,Hadoop 0.23支持两个协调器:默认的先进先出调度器和容量调度器,公平调度器还不支持(2.0已支持)。

An Applications Manager (AsM), which manages running Application Masters in the cluster, i.e., it is responsible for starting application masters and for monitoring and restarting them
on different nodes in case of failures.

一个应用管理器,其管理集群上运行的AM,负责启动、监控AM,并在AM失败时在不同的节点上重启AM(2.0代码中没有同名类,而是将其功能分散到其他类中了)。



Fig. 1
Figure 1 also shows that there is a NM service running on eachnode in the cluster. The diagram also shows two AMs (AM1 andAM2).
In a YARN cluster at anygiven time, there will be as many running Application Masters as there areapplications (jobs). Each AM manages the application’s individual tasks (starting, monitoring and restarting in case offailures).
The diagram shows AM1 managingthree tasks (containers 1.1, 1.2 and 1.3), while AM2 managesfour
tasks (containers 2.1, 2.2, 2.3 and 2.4). Each task runs within aContainer on each node. The AM acquires such containers from the RM’s Scheduler before contacting the corresponding NMs to start theapplication’s individual
tasks. These Containers can beroughly compared to Map/Reduce slots in previous Hadoop versions. However theresource allocation model in Hadoop-0.23 is more optimized from a clusterutilization perspective.

图1展示了集群中的每个节点上都运行有一个NM服务。图中有两个AM。在YARN集群中,任何时候运行的AM数量都和应用的数量相同。每个AM管理相应应用的task(启动、监控、失败后重启)。图中展示了AM1管理了3个task(containers1.1, 1.2 and 1.3),AM2管理了4个task (containers 2.1, 2.2, 2.3 and 2.4)。每个task在一个节点的container中运行。AM向RM的调度器请求container,之后同相关NM联系,启动应用相关task。Container类似之前hadoop版本中的Map/Reduce
slots。但Hadoop-0.23中的资源分配模型从集群利用率方面来看更有效率。

Resource Allocation Model:

In earlier Hadoop versions, each node in the cluster wasstatically assigned the capability of running a predefined number of Map slotsand a predefined number of Reduce slots. The slots could not be shared betweenMaps and Reduces.
This static allocation of slots wasn’t optimal since slot requirements vary during the MR job life cycle(typically, there is a demand for Map slots when the job starts, as opposed tothe need for Reduce slots towards the end).
Practically, in a real cluster,where jobs are randomly submitted and each has its own Map/Reduce slotsrequirement, having an optimal utilization of the cluster was hard, if not impossible.

在早期的hadoop版本中,集群中的每个节点通过预定义的Map/Reduce slots数量静态地分配可以运行的task数量。slot不能在map和reduce task之间共享。这种静态的分配不够优化,因为slot的需求在MR job生命周期中是变化的(典型情况,在job开始时基本都是对map slots的需求,而在job接近结束时则对reduce slots需求较高)。实际上,在一个真实的集群中,job是随机提交的,每个job有自己的Map/Reduce
slots需求,很难,甚至不可能对集群利用进行优化。

The resource allocation model in Hadoop 0.23 addresses suchdeficiency by providing a more flexible resource modeling. Resources arerequested in the form of containers, where each container has a number ofnon-static attributes.
At the time of writing this blog, the only supportedattribute was memory (RAM). However, the model is generic and there isintention to add more attributes in future releases (e.g. CPU and networkbandwidth). In this new Resource Management model, only a minimum
and a maximumfor each attribute are defined, and AMs can request containers with attributevalues as multiples of these minimums.

在Hadoop 0.23中的资源分配模型针对这个缺陷提供了一种更灵活的资源模型。资源请求以container的形式体现,每个container有一些非静态的属性。在写这篇博客的时候,唯一支持的属性是内存(目前仍然是这样)。但是,这个模型是通用的,而且在未来的发布版本中会添加更多的属性(如CPU、网络带宽等)。在这个新的资源管理模型中,只定义了每个属性的最小值和最大值,AM可以使用最小值的倍数作为属性值来请求container。

MapReduce 2.0 Main Components:

In this section, we’ll go throughthe main components of the new MapReduce architecture in detail to understandthe functionality of these components and how they interact with each other.

在这一段,我们将深入介绍新MapReduce架构的主要组件,使读者理解这些组件的功能和组件之间的交互过程。

Client – Resource Manager

Figure 2 illustrates the initial step for running an applicationon a YARN cluster. Typically a client communicates with the RM (specificallythe Applications Manager component of the RM) to initiates this process. Thefirst step,
marked (1) in the diagram, is for the client to notify theApplications Manager of the desire of submitting an application, this is done viaa“New Application Request”. The RM respose, marked (2), will typically contain
a newlygenerated unique application ID, in addition to information about clusterresource capabilities that the client will need in requesting resources forrunning the application’s AM.

图2展示了在YARN集群上运行一个应用的初始化步骤。典型情况,一个客户端联到RM(具体来说,联到RM的ASM组件)以初始化这个过程。第一步,图中标识为1,客户端通知ASM将提交一个应用,这个过程通过“新应用请求”来完成。RM的回应,标识为2,典型地,将包含一个新分配的唯一应用id,以及客户端为运行应用的AM进行的资源请求所需的集群资源容量等附加信息。

Using the information received from the RM, the client can constructand submit an“Application Submission Context”, marked (3), which typically contains information like schedulerqueue,
priority and user information, in addition to information needed by theRM to be able to launch the AM. This information is contained in a “Container Launch Context”, which containsthe application’s jar, job files, security tokens andany resource requirements.

使用从RM接收到的信息,客户端可以创建和提交一个“应用提交上下文”,标识为3,典型地,包含调度器队列、优先级和用户信息等信息,以及其他RM载入应用相应AM时所需的信息。这些信息包含在一个“容器载入上下文”对象中,其包含应用的jar,job文件,安全标记和资源需求。



Fig. 2
Following application submission, the client can query the RMfor application reports, receive such reports and, if needed, the client canalso ask the RM to kill the application. These three additional steps arepictorially depicted
in fig. 3.

在应用提交后,客户端可以向RM查询应用的报告,接受这些报告,如果需要的话,还可以要求RM杀掉应用。这3个附加的步骤如图3所示。


Fig. 3

Resource Manager
– Application Master


When the RM receives the application submission context from theclient, it finds an available container meeting the resource requirements forrunning the AM, and it contacts the NM for the container to start the AMprocess on this
node. Figure 4 depicts the following communication stepsbetween the AM and the RM (specifically the Scheduler component of the RM). Thefirst step, marked (1) in the diagram, is for the AM to register itself withthe RM. This step consists of a handshaking procedure
and also conveysinformation like the RPC port that the AM will be listening on, the trackingURL for monitoring the application’s status andprogress, etc.

当RM收到从客户端提交上来的应用提交上下文之后,它将找到一个可用的、能满足运行相应AM的资源需求的container,然后再同NM联系以获得相应container来启动AM进程。图4展示了AM和RM(具体来说,是RM的调度器)之间的交互步骤。第一步,图中标识为1,AM将自身注册到RM上。这个步骤包含一个握手过程,传输如AM的RPC监听端口,用于监控应用的状态和过程的追踪URL等信息。

The RM registration response, marked (2), will convey essentialinformation for the AM master like minimum and maximum resource capabilitiesfor this cluster. The AM will use such information in calculating andrequesting any resource
requests for the application’s individual tasks. The resource allocation request from the AM tothe RM, marked (3), mainly contains a list of requested containers, and mayalso contain a list of released containers by this
AM. Heartbeat and progressinformation are also relayed through resource allocation requests as shown byarrow (4).

RM的注册回应,标识为2,将传输AM所需的一些基本信息,如集群最小和最大的资源容量。AM将使用这些信息计算出应用相关的task的资源请求。从AM到RM的资源分配请求,标识为3,主要包含一个已请求container列表和一个AM已释放的container列表。心跳和进程信息也通过资源分配请求进行转发,如箭头4所示。

When the Scheduler component of the RM receives a resourceallocation request, it computes, based on the scheduling policy, a list ofcontainers that satisfy the request and sends back an allocation response,marked (5), which contains
a list of allocated resources. Using the resourcelist, the AM starts contacting the associated node managers (as will be soonseen), and finally, as depicted by arrow (6), when the job finishes, the AMsends a Finish Application message to the Resource Manager
and exits.

当RM的调度器组件接受到一个资源分配请求,它将基于调度策略进行计算,得到一个满足请求的container列表,然后发送一个分配回应,标识为5,其中包含一个已分配资源列表。通过这个列表,AM开始联系这些相关NM,最终,当job完成时,AM发送一个应用完成消息到RM并退出,如箭头6所示。


Fig. 4

Application Master
– Container Manager


Figure 5 describes the communication between the AM and the NodeManagers. The AM requests the hosting NM for each container to start it asdepicted by arrow (1) in the diagram. While containers are running, the AM canrequest and
receive a container status report as shown in steps (2) and (3),respectively.

图5描述了AM和NM之间的交互。AM请求每个container所在的NM启动相应container,如图中箭头1所示。在container运行时,AM可以请求和接收container的状态报告,如箭头2和3所示。



Fig. 5
Based on the above discussion, a developer writing YARNapplications will be mainly concerned with the following interfaces:

基于以上的讨论,编写YARN应用的开发者应该主要关心以下接口:

ClientRMProtocol: Client RM (Fig.
3).

This is the protocol for a client to communicate with the RM to launch a new application (i.e. an AM), check on the status of the application or kill the application.

客户端同RM交互的协议,包括提交新应用,检查应用状态和杀死应用。

AMRMProtocol: AM RM (Fig. 4).

This is the protocol used by the AM to register/unregister itself with the RM, as well as to request resources from the RM Scheduler to run its tasks.

AM同RM交互的协议,包括AM注册/注销自身到RM,AM从RM调度器请求资源来运行task。

ContainerManager: AM NM (Fig.
5).

This is the protocol used by the AM to communicate with the NM to start or stop containers and to get status updates on its containers.

AM和NM交互协议,包含开始/停止container和得到AM拥有的container的状态更新。

Migrating older MapReduce applications to run on Hadoop 0.23:

All client-facing MapReduce interfaces are unchanged, whichmeans that there is no need to make any source code changes to run on top ofHadoop 0.23.

所有针对MapReduce的客户端接口没有改变,即相关用户认为无需更改任何代码就可以运行在Hadoop0.23(但需要重新编译)。

Useful links:

How to write a YARN Application.
Hadoop 0.23.0 Javadocs.

Tweet


原创文章,转载请注明出处:http://blog.csdn.net/wind5shy/article/details/8283371

内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: 
相关文章推荐