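Yarn REST API Usage Guide
Yarn is an open-source cluster resource manager that can run big-data workloads such as Hadoop MapReduce, Spark, and Flink. Like most distributed computing frameworks, it follows a master/worker design: the ResourceManager administers the cluster and is the entry point for job submission. Enterprise data platforms usually build a further layer on top of Yarn and expose it as a web application. Having that web application submit jobs by shelling out is heavyweight and makes job status hard to monitor. Fortunately, Yarn provides a ResourceManager REST API that makes it easy to submit jobs to the cluster, kill them, and monitor their state. This article gives a brief introduction to using the ResourceManager REST API.
Submitting a Job to the Cluster
First generate an application ID with a POST request, then use that ID to submit the job.
Generating an application-id
Send a POST request to rm-http-address:port/ws/v1/cluster/apps/new-application with no parameters. The ResourceManager returns an application ID together with the maximum resource capability available on the cluster, for example:
Request URL
http://cdh-1:8088/ws/v1/cluster/apps/new-application
Request parameters: none
Response data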
{
    "application-id": "application_1613349389113_0535",
    "maximum-resource-capability": {
        "memory": 16384,
        "vCores": 8,
        "resourceInformations": {
            "resourceInformation": [
                {
                    "maximumAllocation": 9223372036854775807,
                    "minimumAllocation": 0,
                    "name": "memory-mb",
                    "resourceType": "COUNTABLE",
                    "units": "Mi",
                    "value": 16384
                },
                {
                    "maximumAllocation": 9223372036854775807,
                    "minimumAllocation": 0,
                    "name": "vcores",
                    "resourceType": "COUNTABLE",
                    "units": "",
                    "value": 8
                }
            ]
        }
    }
}
Response JSON fields
| Item | Data Type | Description | 
|---|---|---|
| application-id | string | The newly created application id | 
| maximum-resource-capability | object | The maximum resource capabilities available on this cluster | 
Elements of maximum-resource-capability
| Item | Data Type | Description |
|---|---|---|
| memory | int | Maximum memory available to a container (MB) |
| vCores | int | Maximum number of virtual cores available to a container |
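The request and response described above can be sketched in Python with only the standard library. This is a minimal illustration, not a definitive client: the function names are made up for this example, and the host cdh-1:8088 is simply the ResourceManager address used in this article. Authentication is left out.

```python
import json
import urllib.request

RM_URL = "http://cdh-1:8088"  # ResourceManager address from this article's examples


def build_new_application_request(rm_url):
    """Build (but do not send) the POST request that asks for a new application id."""
    return urllib.request.Request(
        rm_url.rstrip("/") + "/ws/v1/cluster/apps/new-application",
        data=b"",  # empty body: the endpoint takes no parameters
        headers={"Accept": "application/json"},
        method="POST",
    )


def parse_new_application(body):
    """Return (application-id, max memory in MB, max vCores) from the response body."""
    doc = json.loads(body)
    capability = doc["maximum-resource-capability"]
    return doc["application-id"], capability["memory"], capability["vCores"]


def new_application(rm_url=RM_URL):
    """Send the request and parse the reply (needs a reachable ResourceManager)."""
    request = build_new_application_request(rm_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_new_application(response.read().decode("utf-8"))
```

Calling new_application() against a live cluster should return a tuple such as the application ID and the limits shown in the sample response; the two helper functions can be exercised without any network access.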
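Submitting the Job
The Submit Applications API is used to submit an application. Before submitting, you must first obtain an application ID through the Cluster New Application API, and that ID must be part of the request body. The response carries a URL for the application (in the Location header) that can be used to track its state and progress.
Submit the job with a POST request to rm-http-address:port/ws/v1/cluster/apps.
Request URL
http://cdh-1:8088/ws/v1/cluster/apps
Request parameters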
  POST http://<rm http address:port>/ws/v1/cluster/apps
  Accept: application/json
  Content-Type: application/json
  {
    "application-id":"application_1404203615263_0001",
    "application-name":"test",
    "am-container-spec":
    {
      "local-resources":
      {
        "entry":
        [
          {
            "key":"AppMaster.jar",
            "value":
            {
              "resource":"hdfs://hdfs-namenode:9000/user/testuser/DistributedShell/demo-app/AppMaster.jar",
              "type":"FILE",
              "visibility":"APPLICATION",
              "size": 43004,
              "timestamp": 1405452071209
            }
          }
        ]
      },
      "commands":
      {
        "command":"{{JAVA_HOME}}/bin/java -Xmx10m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_memory 10 --container_vcores 1 --num_containers 1 --priority 0 1><LOG_DIR>/AppMaster.stdout 2><LOG_DIR>/AppMaster.stderr"
      },
      "environment":
      {
        "entry":
        [
          {
            "key": "DISTRIBUTEDSHELLSCRIPTTIMESTAMP",
            "value": "1405459400754"
          },
          {
            "key": "CLASSPATH",
            "value": "{{CLASSPATH}}<CPS>./*<CPS>{{HADOOP_CONF_DIR}}<CPS>{{HADOOP_COMMON_HOME}}/share/hadoop/common/*<CPS>{{HADOOP_COMMON_HOME}}/share/hadoop/common/lib/*<CPS>{{HADOOP_HDFS_HOME}}/share/hadoop/hdfs/*<CPS>{{HADOOP_HDFS_HOME}}/share/hadoop/hdfs/lib/*<CPS>{{HADOOP_YARN_HOME}}/share/hadoop/yarn/*<CPS>{{HADOOP_YARN_HOME}}/share/hadoop/yarn/lib/*<CPS>./log4j.properties"
          },
          {
            "key": "DISTRIBUTEDSHELLSCRIPTLEN",
            "value": "6"
          },
          {
            "key": "DISTRIBUTEDSHELLSCRIPTLOCATION",
            "value": "hdfs://hdfs-namenode:9000/user/testuser/demo-app/shellCommands"
          }
        ]
      }
    },
    "unmanaged-AM":false,
    "max-app-attempts":2,
    "resource":
    {
      "memory":1024,
      "vCores":1
    },
    "application-type":"YARN",
    "keep-containers-across-application-attempts":false
  }
Request parameter reference
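| Item | Data Type | Description |
|---|---|---|
| application-id | string | The application ID |
| application-name | string | The application name |
| queue | string | Name of the queue to which the application should be submitted |
| priority | int | Priority of the application |
| am-container-spec | object | Container launch context for the application master, described below |
| unmanaged-AM | boolean | Whether the application uses an unmanaged application master |
| max-app-attempts | int | Maximum number of attempts for this application |
| resource | object | Resources the application master requires, described below |
| application-type | string | The application type (MapReduce, Pig, Hive, etc.) |
| keep-containers-across-application-attempts | boolean | Whether YARN should keep the containers used by this application instead of destroying them |
| application-tags | object | List of application tags; see the request examples for how to specify tags |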
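Elements of the am-container-spec object
The am-container-spec object provides the container launch context for the application master.
| Item | Data Type | Description |
|---|---|---|
| local-resources | object | Object describing the resources that need to be localized, described below |
| environment | object | Environment variables for the container, specified as key-value pairs |
| commands | object | Commands for launching the container, in the order in which they should be executed |
| service-data | object | Application-specific service data; the key is the name of the auxiliary service and the value is the base-64 encoding of the data you wish to pass |
| credentials | object | Credentials your application needs in order to run, described below |
| application-acls | object | ACLs for your application; the key can be "VIEW_APP" or "MODIFY_APP" and the value is the list of users with that permission |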
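Elements of the local-resources object
| Item | Data Type | Description |
|---|---|---|
| resource | string | Location of the resource to be localized |
| type | string | Type of the resource; options are "ARCHIVE", "FILE", and "PATTERN" |
| visibility | string | Visibility of the resource to be localized; options are "PUBLIC", "PRIVATE", and "APPLICATION" |
| size | long | Size of the resource to be localized |
| timestamp | long | Timestamp of the resource to be localized |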
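Elements of the credentials object
| Item | Data Type | Description |
|---|---|---|
| tokens | object | Tokens you wish to pass to your application, specified as key-value pairs. The key is an identifier for the token and the value is the token (which should be obtained through the corresponding web service) |
| secrets | object | Secrets you wish to use in your application, specified as key-value pairs. The key is an identifier and the value is the base-64 encoding of the secret |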
Elements of the resource object
| Item | Data Type | Description |
|---|---|---|
| memory | int | Memory required for each container |
| vCores | int | Virtual cores required for each container |
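Response data: none
A successful submission returns no response body (the ResourceManager answers with HTTP 202 Accepted). As the request parameters show, managing a distributed computing job through Yarn requires a corresponding ApplicationMaster implementation, such as AppMaster.jar in the example above, whose entry class is org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster. Common frameworks such as Hadoop, Spark, and Flink all provide one. However, the flink-yarn project does not yet appear to implement a corresponding ApplicationMaster for Flink 1.9, so Flink 1.9 jobs apparently cannot be submitted through the REST API for now (Flink 1.7 jobs can).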
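To make the submission step concrete, here is a minimal Python sketch that assembles a reduced version of the request body shown earlier and wraps it in a POST request. The helper name build_submit_request and its parameters are hypothetical, chosen only for this illustration; the environment entries from the full example are omitted for brevity, and authentication is not handled.

```python
import json
import urllib.request


def build_submit_request(rm_url, app_id, name, command, jar_uri, jar_size,
                         jar_timestamp, memory_mb=1024, vcores=1):
    """Build (but do not send) a POST request for /ws/v1/cluster/apps."""
    payload = {
        # The id must come from the new-application call made beforehand.
        "application-id": app_id,
        "application-name": name,
        "am-container-spec": {
            "local-resources": {
                "entry": [{
                    "key": "AppMaster.jar",
                    "value": {
                        "resource": jar_uri,
                        "type": "FILE",
                        "visibility": "APPLICATION",
                        "size": jar_size,
                        "timestamp": jar_timestamp,
                    },
                }]
            },
            "commands": {"command": command},
        },
        "unmanaged-AM": False,
        "max-app-attempts": 2,
        "resource": {"memory": memory_mb, "vCores": vcores},
        "application-type": "YARN",
        "keep-containers-across-application-attempts": False,
    }
    return urllib.request.Request(
        rm_url.rstrip("/") + "/ws/v1/cluster/apps",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Accept": "application/json",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it. Building the request separately from sending it keeps the payload easy to inspect and log before anything reaches the cluster.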
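Querying Job Status
Querying all jobs
A GET request to rm-http-address:port/ws/v1/cluster/apps returns a list with information on all jobs, for example:
Request URL
 http://cdh-:8088/ws/v1/cluster/apps
Request parameters (optional)
Multiple parameters can be specified for the GET operation. The started and finished times each have a Begin and an End parameter so that a range can be given. For example, to request all applications started between 09:00:00 and 10:00:00 on 2 March 2021 (UTC+8), use startedTimeBegin=1614646800000&startedTimeEnd=1614650400000. If a Begin parameter is not specified it defaults to 0; if an End parameter is not specified it defaults to infinity.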
  * state [deprecated] - state of the application
  * states - applications matching the given application states, specified as a comma-separated list.[NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED]
  * finalStatus - the final status of the application - reported by the application itself [UNDEFINED, FAILED, KILLED, SUCCEEDED]
  * user - user name
  * queue - queue name
  * limit - total number of app objects to be returned
  * startedTimeBegin - applications with start time beginning with this time, specified in ms since epoch
  * startedTimeEnd - applications with start time ending with this time, specified in ms since epoch
  * finishedTimeBegin - applications with finish time beginning with this time, specified in ms since epoch
  * finishedTimeEnd - applications with finish time ending with this time, specified in ms since epoch
  * applicationTypes - applications matching the given application types, specified as a comma-separated list.
  * applicationTags - applications matching any of the given application tags, specified as a comma-separated list.
Response data
{
    "apps": {
        "app": [
            {
                "id": "application_1613349389113_0393",
                "user": "root",
                "name": "com.zzmj.main.KafkaToParquet",
                "queue": "root.users.root",
                "state": "RUNNING",
                "finalStatus": "UNDEFINED",
                "progress": 10.0,
                "trackingUI": "ApplicationMaster",
                "trackingUrl": "http://cdh-1:8088/proxy/application_1613349389113_0393/",
                "diagnostics": "",
                "clusterId": 1613349389113,
                "applicationType": "SPARK",
                "applicationTags": "",
                "priority": 0,
                "startedTime": 1614648944707,
                "launchTime": 1614648945179,
                "finishedTime": 0,
                "elapsedTime": 515573848,
                "amContainerLogs": "http://cdh-4:8042/node/containerlogs/container_1613349389113_0393_01_000001/root",
                "amHostHttpAddress": "cdh-4:8042",
                "amRPCAddress": "cdh-4:46434",
                "allocatedMB": 5120,
                "allocatedVCores": 4,
                "reservedMB": 0,
                "reservedVCores": 0,
                "runningContainers": 2,
                "memorySeconds": 2514229328,
                "vcoreSeconds": 1914825,
                "queueUsagePercentage": 6.25,
                "clusterUsagePercentage": 6.25,
                "resourceSecondsMap": {
                    "entry": {
                        "key": "memory-mb",
                        "value": "2514229328"
                    },
                    "entry": {
                        "key": "vcores",
                        "value": "1914825"
                    }
                },
                "preemptedResourceMB": 0,
                "preemptedResourceVCores": 0,
                "numNonAMContainerPreempted": 0,
                "numAMContainerPreempted": 0,
                "preemptedMemorySeconds": 0,
                "preemptedVcoreSeconds": 0,
                "preemptedResourceSecondsMap": {},
                "logAggregationStatus": "NOT_START",
                "unmanagedApplication": false,
                "amNodeLabelExpression": "",
                "timeouts": {
                    "timeout": [
                        {
                            "type": "LIFETIME",
                            "expiryTime": "UNLIMITED",
                            "remainingTimeInSeconds": -1
                        }
                    ]
                }
            }
        ]
    }
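}
Field reference
| Item | Data Type | Description |
|---|---|---|
| id | string | The application ID |
| user | string | User who submitted the job |
| name | string | The application name |
| queue | string | Scheduler queue to which the application was submitted |
| state | string | Current state of the application |
| finalStatus | string | Final status of the application, as reported by the application itself |
| progress | float | Progress of the application, as a percentage |
| trackingUI | string | Display name of the tracking UI |
| trackingUrl | string | URL of the tracking UI |
| clusterId | long | The cluster ID |
| applicationType | string | The application type |
| priority | int | Priority of the application |
| startedTime | long | Time the application started (in ms since epoch) |
| launchTime | long | Time the application was launched (in ms since epoch) |
| finishedTime | long | Time the application finished (in ms since epoch; 0 while it is still running) |
| elapsedTime | long | Time elapsed since the application started, in ms (finish time minus start time once it has finished) |
| amContainerLogs | string | URL of the application master container logs |
| amHostHttpAddress | string | HTTP address of the node hosting the application master |
| amRPCAddress | string | RPC address of the application master |
| allocatedMB | int | Total memory, in MB, allocated to the application's running containers |
| allocatedVCores | int | Total virtual cores allocated to the application's running containers |
| reservedMB | int | Reserved memory, in MB |
| reservedVCores | int | Reserved virtual cores |
| runningContainers | int | Number of containers currently running |
| memorySeconds | long | Memory allocated to the application over time, in megabyte-seconds |
| vcoreSeconds | long | Virtual cores allocated to the application over time, in vcore-seconds |
| queueUsagePercentage | float | Percentage of the queue's resources used by the application |
| clusterUsagePercentage | float | Percentage of the cluster's resources used by the application |
| logAggregationStatus | string | Status of log aggregation |
| unmanagedApplication | boolean | Whether the application uses an unmanaged application master |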
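The optional query parameters and the response layout can be tied together with a short Python sketch. It converts a time range to epoch milliseconds, builds the query URL, and extracts a few fields from the apps list. The helper names are hypothetical, and the handling of an empty result (an "apps" value of null) is a defensive assumption rather than something shown in this article.

```python
import json
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def to_epoch_ms(moment):
    """Convert a timezone-aware datetime to milliseconds since the epoch."""
    return int(moment.timestamp() * 1000)


def build_apps_url(rm_url, **filters):
    """Build the /ws/v1/cluster/apps URL; list values become comma-separated."""
    params = {}
    for key, value in filters.items():
        if value is None:
            continue  # skip filters that were not set
        if isinstance(value, (list, tuple)):
            value = ",".join(value)
        params[key] = value
    url = rm_url.rstrip("/") + "/ws/v1/cluster/apps"
    # safe="," keeps the comma-separated lists readable in the URL
    return url + ("?" + urlencode(params, safe=",") if params else "")


def summarize_apps(body):
    """Return (id, state, progress) for every application in the response."""
    doc = json.loads(body)
    apps = (doc.get("apps") or {}).get("app", [])
    return [(app["id"], app["state"], app["progress"]) for app in apps]


# The time range from the example above, expressed in UTC+8.
cst = timezone(timedelta(hours=8))
begin = to_epoch_ms(datetime(2021, 3, 2, 9, 0, tzinfo=cst))
end = to_epoch_ms(datetime(2021, 3, 2, 10, 0, tzinfo=cst))
print(build_apps_url("http://cdh-1:8088", states=["RUNNING"],
                     startedTimeBegin=begin, startedTimeEnd=end))
```

Fetching the printed URL with any HTTP client and passing the body to summarize_apps gives a compact view that is easier to feed into a monitoring page than the full response.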
Querying a single job
Send a GET request to rm-http-address:port/ws/v1/cluster/apps/{appid}, for example:
http://cdh-1:8088/ws/v1/cluster/apps/application_1613349389113_0001
{
    "app": {
        "id": "application_1613349389113_0001",
        "user": "root",
        "name": "StorageAgg",
        "queue": "root.users.root",
        "state": "RUNNING",
        "finalStatus": "UNDEFINED",
        "progress": 100.0,
        "trackingUI": "ApplicationMaster",
        "trackingUrl": "http://cdh-1:8088/proxy/application_1613349389113_0001/",
        "diagnostics": "",
        "clusterId": 1613349389113,
        "applicationType": "Apache Flink",
        "applicationTags": "",
        "priority": 0,
        "startedTime": 1613358478865,
        "launchTime": 1613358479810,
        "finishedTime": 0,
        "elapsedTime": 1829294258,
        "amContainerLogs": "http://cdh-4:8042/node/containerlogs/container_1613349389113_0001_01_000001/root",
        "amHostHttpAddress": "cdh-4:8042",
        "amRPCAddress": "cdh-4:8082",
        "allocatedMB": 9216,
        "allocatedVCores": 4,
        "reservedMB": 0,
        "reservedVCores": 0,
        "runningContainers": 2,
        "memorySeconds": 16858454105,
        "vcoreSeconds": 7317057,
        "queueUsagePercentage": 11.25,
        "clusterUsagePercentage": 11.25,
        "resourceSecondsMap": {
            "entry": {
                "key": "memory-mb",
                "value": "16858454105"
            },
            "entry": {
                "key": "vcores",
                "value": "7317057"
            }
        },
        "preemptedResourceMB": 0,
        "preemptedResourceVCores": 0,
        "numNonAMContainerPreempted": 0,
        "numAMContainerPreempted": 0,
        "preemptedMemorySeconds": 0,
        "preemptedVcoreSeconds": 0,
        "preemptedResourceSecondsMap": {},
        "logAggregationStatus": "NOT_START",
        "unmanagedApplication": false,
        "amNodeLabelExpression": "",
        "timeouts": {
            "timeout": [
                {
                    "type": "LIFETIME",
                    "expiryTime": "UNLIMITED",
                    "remainingTimeInSeconds": -1
                }
            ]
        }
    }
}
Changing Job State
This is done through the Cluster Application State API; such operations require authorization from the RM web services. The URI is rm-http-address:port/ws/v1/cluster/apps/{appid}/state.
A GET request to this URI, for example
http://cdh-1:8088/ws/v1/cluster/apps/application_1613349389113_0001/state
returns the state of the job whose application ID is application_1613349389113_0001:
{
    "state": "RUNNING"
}
Response fields
| Item | Data Type | Description | 
|---|---|---|
| state | string | The application state - can be one of “NEW”, “NEW_SAVING”, “SUBMITTED”, “ACCEPTED”, “RUNNING”, “FINISHED”, “FAILED”, “KILLED” | 
Alternatively, kill a job with a PUT request:
 http://cdh-1:8088/ws/v1/cluster/apps/application_1399397633663_0003/state
{
  "state":"KILLED"
}
The response is
{
  "state":"RUNNING"
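}
The state in this response is the application's current state (RUNNING or ACCEPTED), because the kill request has been accepted but has not yet taken effect.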
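Because the PUT only reports the current state, a client usually has to poll until the job is actually gone. The following Python sketch shows one way to do that, assuming an unauthenticated cluster. The function names are hypothetical, and the retry count and delay are arbitrary values picked for illustration.

```python
import json
import time
import urllib.request


def state_url(rm_url, app_id):
    """URL of the Cluster Application State API for one application."""
    return f"{rm_url.rstrip('/')}/ws/v1/cluster/apps/{app_id}/state"


def build_kill_request(rm_url, app_id):
    """Build (but do not send) the PUT request that asks for state KILLED."""
    return urllib.request.Request(
        state_url(rm_url, app_id),
        data=json.dumps({"state": "KILLED"}).encode("utf-8"),
        headers={"Accept": "application/json",
                 "Content-Type": "application/json"},
        method="PUT",
    )


def fetch_state(rm_url, app_id):
    """GET the current state (needs a reachable ResourceManager)."""
    with urllib.request.urlopen(state_url(rm_url, app_id), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))["state"]


def wait_until_killed(get_state, attempts=10, delay_s=1.0, sleep=time.sleep):
    """Poll get_state() until it returns KILLED; return True on success."""
    for _ in range(attempts):
        if get_state() == "KILLED":
            return True
        sleep(delay_s)  # give the ResourceManager time to finish the kill
    return False
```

A typical call would send build_kill_request(...) with urllib.request.urlopen and then run wait_until_killed(lambda: fetch_state(rm_url, app_id)). Passing the state getter as a function keeps the polling loop testable without a cluster.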
Query the job's state again with a GET request.
The job has now been killed:
{
  "state":"KILLED"
}
Viewing and Changing Job Priority
Both operations go through rm-http-address:port/ws/v1/cluster/apps/{appid}/priority. They likewise require authorization from the RM web services.
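As a rough Python sketch of both operations on this endpoint: a GET reads the priority and a PUT with a small JSON body changes it. The helper names are hypothetical and authentication is not handled.

```python
import json
import urllib.request


def priority_url(rm_url, app_id):
    """URL of the priority resource for one application."""
    return f"{rm_url.rstrip('/')}/ws/v1/cluster/apps/{app_id}/priority"


def parse_priority(body):
    """Extract the integer priority from a response body."""
    return json.loads(body)["priority"]


def get_priority(rm_url, app_id):
    """GET the current priority (needs a reachable ResourceManager)."""
    with urllib.request.urlopen(priority_url(rm_url, app_id), timeout=10) as resp:
        return parse_priority(resp.read().decode("utf-8"))


def build_set_priority_request(rm_url, app_id, priority):
    """Build (but do not send) the PUT request that changes the priority."""
    return urllib.request.Request(
        priority_url(rm_url, app_id),
        data=json.dumps({"priority": priority}).encode("utf-8"),
        headers={"Accept": "application/json",
                 "Content-Type": "application/json"},
        method="PUT",
    )
```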
Viewing job priority
Send a GET request to the URI above, for example
 http://cdh-1:8088/ws/v1/cluster/apps/application_1613349389113_0001/priority
which returns the priority of the job whose application ID is application_1613349389113_0001:
{
    "priority": 0
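}
Changing job priority
If a job is particularly important and should run ahead of others, its priority can be changed with a PUT request:
 http://cdh-1:8088/ws/v1/cluster/apps/application_1613349389113_0001/priority
 Request parameters: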
    Accept: application/json
    Content-Type: application/json
    {
        "priority": 8
    }
Cluster API
Cluster information
Cluster information is retrieved with a GET request to rm-http-address:port/ws/v1/cluster, for example
 http://cdh-1:8088/ws/v1/cluster
Request parameters: none
Response data
{
    "clusterInfo": {
        "id": 1613349389113,
        "startedOn": 1613349389113,
        "state": "STARTED",
        "haState": "ACTIVE",
        "rmStateStoreName": "org.apache.hadoop.yarn.server.resourcemanager.recovery.NullRMStateStore",
        "resourceManagerVersion": "3.0.0-cdh6.3.2",
        "resourceManagerBuildVersion": "3.0.0-cdh6.3.2 from 9aff20de3b5ecccf3c19d57f71b214fb4d37ee89 by jenkins source checksum bdea11f98ce3d056d6c170c883b73569",
        "resourceManagerVersionBuiltOn": "2019-11-08T13:55Z",
        "hadoopVersion": "3.0.0-cdh6.3.2",
        "hadoopBuildVersion": "3.0.0-cdh6.3.2 from 9aff20de3b5ecccf3c19d57f71b214fb4d37ee89 by jenkins source checksum f539c87da37534aad732f2a7ddcc59",
        "hadoopVersionBuiltOn": "2019-11-08T13:49Z",
        "haZooKeeperConnectionState": "Could not find leader elector. Verify both HA and automatic failover are enabled."
    }
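}
Response field reference
| Item | Data Type | Description |
|---|---|---|
| id | long | The cluster ID |
| startedOn | long | Time the cluster started (in ms since epoch) |
| state | string | ResourceManager state; valid values are NOTINITED, INITED, STARTED, STOPPED |
| haState | string | ResourceManager HA state; valid values are INITIALIZING, ACTIVE, STANDBY, STOPPED |
| rmStateStoreName | string | Fully qualified name of the class that implements the ResourceManager state store |
| resourceManagerVersion | string | Version of the ResourceManager |
| resourceManagerBuildVersion | string | ResourceManager build string with build version, user, and checksum |
| resourceManagerVersionBuiltOn | string | Timestamp when the ResourceManager was built (a date string such as 2019-11-08T13:55Z in the sample above) |
| hadoopVersion | string | Version of Hadoop common |
| hadoopBuildVersion | string | Hadoop common build string with build version, user, and checksum |
| hadoopVersionBuiltOn | string | Timestamp when Hadoop common was built (same date-string format as in the sample above) |
| haZooKeeperConnectionState | string | Connection state of the ZooKeeper service used for high availability |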
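One practical use of these fields is to check whether a ResourceManager is up and active before sending it work. The Python sketch below assumes the response bodies have already been fetched from each candidate address. The function names and the second host, cdh-2, are hypothetical and do not come from this article.

```python
import json


def parse_cluster_info(body):
    """Pick the version and state fields out of a /ws/v1/cluster response."""
    info = json.loads(body)["clusterInfo"]
    return {
        "version": info["resourceManagerVersion"],
        "state": info["state"],
        "haState": info["haState"],
    }


def pick_active_rm(bodies_by_url):
    """Return the first URL whose ResourceManager is STARTED and ACTIVE."""
    for url, body in bodies_by_url.items():
        info = parse_cluster_info(body)
        if info["state"] == "STARTED" and info["haState"] == "ACTIVE":
            return url
    return None  # no candidate is currently active


# Example with two candidates; cdh-2 is a made-up standby node.
active = json.dumps({"clusterInfo": {
    "resourceManagerVersion": "3.0.0-cdh6.3.2",
    "state": "STARTED", "haState": "ACTIVE"}})
standby = json.dumps({"clusterInfo": {
    "resourceManagerVersion": "3.0.0-cdh6.3.2",
    "state": "STARTED", "haState": "STANDBY"}})
print(pick_active_rm({"http://cdh-2:8088": standby,
                      "http://cdh-1:8088": active}))
```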
Cluster metrics
Cluster metrics are retrieved with a GET request to rm-http-address:port/ws/v1/cluster/metrics, for example
 http://cdh-1:8088/ws/v1/cluster/metrics
Request parameters: none
Response data
{
    "clusterMetrics": {
        "appsSubmitted": 547,
        "appsCompleted": 506,
        "appsPending": 0,
        "appsRunning": 8,
        "appsFailed": 14,
        "appsKilled": 19,
        "reservedMB": 0,
        "availableMB": 36864,
        "allocatedMB": 45056,
        "reservedVirtualCores": 0,
        "availableVirtualCores": 8,
        "allocatedVirtualCores": 32,
        "containersAllocated": 16,
        "containersReserved": 0,
        "containersPending": 0,
        "totalMB": 81920,
        "totalVirtualCores": 40,
        "totalNodes": 5,
        "lostNodes": 0,
        "unhealthyNodes": 0,
        "decommissioningNodes": 0,
        "decommissionedNodes": 0,
        "rebootedNodes": 0,
        "activeNodes": 5,
        "shutdownNodes": 0
    }
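}
Response field reference
| Item | Data Type | Description |
|---|---|---|
| appsSubmitted | int | Number of applications submitted |
| appsCompleted | int | Number of applications completed |
| appsPending | int | Number of applications pending |
| appsRunning | int | Number of applications running |
| appsFailed | int | Number of applications that failed |
| appsKilled | int | Number of applications killed |
| reservedMB | long | Amount of memory reserved (MB) |
| availableMB | long | Amount of memory available (MB) |
| allocatedMB | long | Amount of memory allocated (MB) |
| totalMB | long | Total amount of memory (MB) |
| reservedVirtualCores | long | Number of virtual cores reserved |
| availableVirtualCores | long | Number of virtual cores available |
| allocatedVirtualCores | long | Number of virtual cores allocated |
| totalVirtualCores | long | Total number of virtual cores |
| containersAllocated | int | Number of containers allocated |
| containersReserved | int | Number of containers reserved |
| containersPending | int | Number of containers pending |
| totalNodes | int | Total number of nodes |
| activeNodes | int | Number of active nodes |
| lostNodes | int | Number of lost nodes |
| unhealthyNodes | int | Number of unhealthy nodes |
| decommissioningNodes | int | Number of nodes currently being decommissioned |
| decommissionedNodes | int | Number of nodes that have been decommissioned |
| rebootedNodes | int | Number of nodes rebooted |
| shutdownNodes | int | Number of nodes shut down |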
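These counters are most useful once turned into ratios. As a small illustration, the Python sketch below derives memory and vcore utilization and a simple node-health flag from a metrics response. The function name and the choice of what counts as "healthy" are assumptions made for this example.

```python
import json


def utilization(body):
    """Derive utilization ratios and a node-health flag from clusterMetrics."""
    metrics = json.loads(body)["clusterMetrics"]
    total_mb = metrics["totalMB"]
    total_cores = metrics["totalVirtualCores"]
    return {
        # Guard against division by zero on a cluster with no resources.
        "memory": metrics["allocatedMB"] / total_mb if total_mb else 0.0,
        "vcores": metrics["allocatedVirtualCores"] / total_cores if total_cores else 0.0,
        # Assumption: "healthy" means no lost and no unhealthy nodes.
        "nodes_healthy": metrics["lostNodes"] == 0 and metrics["unhealthyNodes"] == 0,
    }


# Values taken from the sample response above.
sample = json.dumps({"clusterMetrics": {
    "allocatedMB": 45056, "totalMB": 81920,
    "allocatedVirtualCores": 32, "totalVirtualCores": 40,
    "lostNodes": 0, "unhealthyNodes": 0}})
print(utilization(sample))
```

With the sample values, memory utilization works out to 45056 / 81920 = 55% and vcore utilization to 32 / 40 = 80%.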
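Cluster scheduler
The scheduler resource contains information about the scheduler currently configured in the cluster. It currently supports the Fifo, Capacity, and Fair schedulers. The information returned differs depending on which scheduler is configured, so be sure to check the type field.
Request URI
http://cdh-1:8088/ws/v1/cluster/scheduler
Request parameters: none
Response data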
{
    "scheduler": {
        "schedulerInfo": {
            "type": "fairScheduler",
            "rootQueue": {
                "maxApps": 2147483647,
                "minResources": {
                    "memory": 0,
                    "vCores": 0,
                    "resourceInformations": {
                        "resourceInformation": [
                            {
                                "maximumAllocation": 16384,
                                "minimumAllocation": 1024,
                                "name": "memory-mb",
                                "resourceType": "COUNTABLE",
                                "units": "Mi",
                                "value": 0
                            },
                            {
                                "maximumAllocation": 8,
                                "minimumAllocation": 1,
                                "name": "vcores",
                                "resourceType": "COUNTABLE",
                                "units": "",
                                "value": 0
                            }
                        ]
                    }
                },
                "maxResources": {
                    "memory": 81920,
                    "vCores": 40,
                    "resourceInformations": {
                        "resourceInformation": [
                            {
                                "maximumAllocation": 9223372036854775807,
                                "minimumAllocation": 0,
                                "name": "memory-mb",
                                "resourceType": "COUNTABLE",
                                "units": "Mi",
                                "value": 81920
                            },
                            {
                                "maximumAllocation": 9223372036854775807,
                                "minimumAllocation": 0,
                                "name": "vcores",
                                "resourceType": "COUNTABLE",
                                "units": "",
                                "value": 40
                            }
                        ]
                    }
                },
                "usedResources": {
                    "memory": 45056,
                    "vCores": 32,
                    "resourceInformations": {
                        "resourceInformation": [
                            {
                                "maximumAllocation": 9223372036854775807,
                                "minimumAllocation": 0,
                                "name": "memory-mb",
                                "resourceType": "COUNTABLE",
                                "units": "Mi",
                                "value": 45056
                            },
                            {
                                "maximumAllocation": 9223372036854775807,
                                "minimumAllocation": 0,
                                "name": "vcores",
                                "resourceType": "COUNTABLE",
                                "units": "",
                                "value": 32
                            }
                        ]
                    }
                },
                "amUsedResources": {
                    "memory": 0,
                    "vCores": 0 
                    
                     
                    
                 
                    
                
