Using YARN with Cgroups: testing on the sparkml cluster
Deployment servers:
sparkml cluster
########### sparkml ##########
sparkml-node1 # yarn resource manager
sparkml-node2 # nodemanager spark-2.0.0
sparkml-node3 # nodemanager spark-2.0.0
sparkml-node4 # nodemanager spark-2.0.0
sparkml-node5 # nodemanager spark-2.0.0
Features being rolled out:
- Use cgroups to cap the total CPU on each node that YARN containers can consume
- Let each YARN container share CPU in proportion to the number of vcores allocated to it
Test method:
Feature 1 test:
With no limit applied, we run a Hive SQL query:
test_hive_sql.sql
Let's look at the container allocation:
(screenshot omitted)
CPU usage on the 4 nodemanager nodes:
(screenshots omitted)
All are close to 100%.
Now we try to cap usage at 50%.
Set cpu.cfs_quota_us="1200000"; (calculation: 24 logical CPU cores * 0.5 (for a 50% cap) * 100000 (microseconds per scheduling period) = 1200000)
Restart cgroups: /etc/init.d/cgconfig restart
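As a sanity check, the quota can be derived and applied by hand before committing it to cgconfig.conf (a sketch only; it assumes the /cgroup mount layout above, and a cgconfig restart will overwrite any manual write):

# cores * fraction * period = quota (microseconds of runtime per period, summed across cores)
CORES=$(nproc)                      # 24 logical cores on these nodes
PERIOD=100000                       # default cpu.cfs_period_us
QUOTA=$(( CORES * PERIOD / 2 ))     # 50% cap -> 1200000 here
echo ${QUOTA} > /cgroup/cpu/hadoop-yarn/cpu.cfs_quota_us   # needs root
cat /cgroup/cpu/hadoop-yarn/cpu.cfs_quota_us               # verify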
Run the same SQL again:
The container allocation is essentially the same.

CPU usage on the nodemanager servers:
(screenshots omitted)
All are held within 50%.
Feature 2 test:
The containers spawned by the Hive SQL each take only one vcore (a MapReduce trait?), so we use Spark for this test.
We run the following code:
from __future__ import print_function
# Adapted from Apache Spark's examples/src/main/python/pi.py
# (Apache License 2.0, http://www.apache.org/licenses/LICENSE-2.0)
import sys
from random import random
from operator import add
from pyspark import SparkContext
import time

if __name__ == "__main__":
    """
        Usage: pi [partitions]
    """
    sc = SparkContext(appName="PythonPi")
    partitions = int(sys.argv[1]) if len(sys.argv) > 1 else 2
    n = 100000 * partitions

    def f(_):
        # Busy loop to burn CPU; only the final sample is used.
        for i in range(1, 10000):
            x = random() * random() * random() - 1
            y = random() * random() * random() - 1
            #time.sleep(60)
        x = random() * random() * random() - 1
        y = random() * random() * random() - 1
        return 1 if x ** 2 + y ** 2 < 1 else 0

    count = sc.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
    print("Pi is roughly %f" % (4.0 * count / n))

    sc.stop()
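The original does not record the submit command; it was presumably something along these lines (a sketch; the executor and core counts are assumptions chosen to match the 4-vcore container observed below, and pi.py is a placeholder file name):

spark-submit --master yarn --deploy-mode cluster \
    --num-executors 1 --executor-cores 4 --executor-memory 4g \
    pi.py 100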
Container allocation:
(screenshot omitted)
On the server that ran 1 container with 4 vcores:
(screenshot omitted)
With the test Hive SQL also running:
(screenshot omitted)
On the node4 server:
(screenshot omitted)
The spark_sc process's CPU usage is only about 100, no higher than the other 1-vcore containers from the hdfs user.
This is because the Python code above has no concurrency, so it can only use one core.

There are 5 containers on this server:
(screenshot omitted)
Only the last container's cpu.shares value is 4096, 4 times that of the others.
(screenshot omitted)
These results match the vcore allocation we observed. The Python code's CPU usage here is no higher than that of the containers spawned by the Hive SQL because Python runs as a single process, with no multi-core scheduling.
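A quick way to double-check the per-container shares from the shell (a sketch; the directory names follow YARN's container ID pattern and are placeholders here):

# Print cpu.shares for every container cgroup under hadoop-yarn.
for d in /cgroup/cpu/hadoop-yarn/container_*; do
    echo "$(basename ${d}): $(cat ${d}/cpu.shares)"
done
# Expect 1024 for each 1-vcore container and 4096 for the 4-vcore one.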
Test results:
Feature 1: effective.
Feature 2: effective; CPU is apportioned per vcore through cpu.shares, though we lack direct quantitative test data for it.
Configuration parameters:
yarn.nodemanager.container-executor.class : org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor
yarn.nodemanager.linux-container-executor.resources-handler.class : org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler
yarn.nodemanager.linux-container-executor.cgroups.hierarchy : /hadoop-yarn (the cgroup hierarchy under /cgroup/cpu/; configured manually in cgconfig.conf)
yarn.nodemanager.linux-container-executor.cgroups.mount : true
yarn.nodemanager.linux-container-executor.cgroups.mount-path : /cgroup (root of the cgroup filesystem)
yarn.nodemanager.linux-container-executor.group : yarn
yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users : false
Parameters that do not take effect:
yarn.nodemanager.resource.percentage-physical-cpu-limit : 100 (controls the overall CPU usage of a nodemanager node; hadoop-2.5.0-cdh5.3.2 does not support it, so the same effect is obtained by setting cpu.cfs_quota_us in cgconfig.conf)
yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage : false (hard limit on CPU use)
cgroup configuration:
#
#  Copyright IBM Corporation. 2007
#
#  Authors: Balbir Singh <balbir@linux.vnet.ibm.com>
#  This program is free software; you can redistribute it and/or modify it
#  under the terms of version 2.1 of the GNU Lesser General Public License
#  as published by the Free Software Foundation.
#
#  This program is distributed in the hope that it would be useful, but
#  WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# See man cgconfig.conf for further details.
#
# By default, mount all controllers to /cgroup/<controller>

mount {
    cpuset  = /cgroup/cpuset;
    cpu     = /cgroup/cpu;
    cpuacct = /cgroup/cpuacct;
    memory  = /cgroup/memory;
    devices = /cgroup/devices;
    freezer = /cgroup/freezer;
    net_cls = /cgroup/net_cls;
    blkio   = /cgroup/blkio;
}

group hadoop-yarn {
    perm {
        task {
            uid = yarn;
            gid = hadoop;
        }
        admin {
            uid = yarn;
            gid = hadoop;
        }
    }
    cpu {
        # cpu.shares="1024";
        # cpu.cfs_period_us="100000";
        # cpu.cfs_quota_us="1200000";
    }
}
How it works, in brief:
cgroups tie subsystems and tasks together through the cgroup hierarchy: every time YARN launches a container, it creates a hierarchy for that container under the configured hadoop-yarn cgroup hierarchy.
(screenshot omitted)
Once containers start running, the per-container cgroups appear, roughly as sketched below:
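(a sketch; directory names are placeholders, not real container IDs)

# One cgroup directory per running container, created by the NodeManager:
ls /cgroup/cpu/hadoop-yarn/
# cgroup.procs  cpu.cfs_period_us  cpu.cfs_quota_us  cpu.shares  tasks
# container_<application>_<attempt>_<n>/  ...one directory per container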

Because the overall per-node CPU limit is not supported in the version we run (YarnConfiguration.java does not read the yarn.nodemanager.resource.percentage-physical-cpu-limit parameter, and CgroupsLCEResourcesHandler has no corresponding implementation; see YARN-2440 for the upstream implementation),
we set cpu.cfs_quota_us on hadoop-yarn itself; none of the container cgroup hierarchies under hadoop-yarn can then exceed the limit of their parent hierarchy.
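This is easy to confirm: each container cgroup keeps its own quota at -1 (unlimited), yet their combined usage is still bounded by the parent (a sketch; container_<id> is a placeholder):

cat /cgroup/cpu/hadoop-yarn/cpu.cfs_quota_us
# -> 1200000   (50% of 24 cores at a 100000 us period)
cat /cgroup/cpu/hadoop-yarn/container_<id>/cpu.cfs_quota_us
# -> -1        (no per-container quota; the parent cap still applies)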
For feature 2:
YARN-600 added the following to the CgroupsLCEResourcesHandler class:
if (isCpuWeightEnabled()) {
    createCgroup(CONTROLLER_CPU, containerName);
    int cpuShares = CPU_DEFAULT_WEIGHT * containerResource.getVirtualCores();
    // absolute minimum of 10 shares for zero CPU containers
    cpuShares = Math.max(cpuShares, 10);
    updateCgroup(CONTROLLER_CPU, containerName, "shares",
        String.valueOf(cpuShares));
}
The minimum cpuShares value is 10; each container receives a cpu.shares value proportional to its VirtualCores.
The Linux CFS scheduler factors the cpu.shares value into CPU scheduling; for details, see: how cpu.shares works.
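To see cpu.shares in action outside YARN, one can pin two busy loops onto the same core in sibling groups with a 4:1 shares ratio (a sketch using the libcgroup tools already installed for cgconfig; shares only matter under contention, hence the taskset pinning; the demo-a/demo-b group names are made up):

cgcreate -g cpu:/demo-a && echo 1024 > /cgroup/cpu/demo-a/cpu.shares
cgcreate -g cpu:/demo-b && echo 4096 > /cgroup/cpu/demo-b/cpu.shares
# Both loops compete for core 0; top should settle near a 1:4 CPU split.
taskset -c 0 cgexec -g cpu:/demo-a sh -c 'while :; do :; done' &
taskset -c 0 cgexec -g cpu:/demo-b sh -c 'while :; do :; done' &
# Cleanup: kill %1 %2; cgdelete cpu:/demo-a cpu:/demo-b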
Deployment process:
yarn-site.xml
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.cgroups.hierarchy</name>
  <value>/hadoop-yarn</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.cgroups.mount</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.cgroups.mount-path</name>
  <value>/cgroup</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.group</name>
  <value>yarn</value>
</property>
<property>
  <name>yarn.nodemanager.resource.percentage-physical-cpu-limit</name>
  <value>100</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users</name>
  <value>false</value>
</property>
Deploy cgroups
Recompile container-executor:
cd ${HADOOP_HOME}/hadoop-2.6.0-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/
cmake src -DHADOOP_CONF_DIR=/etc/hadoop
make
cd target/usr/local/bin/ to pick up the rebuilt container-executor binary
Configure container-executor.cfg
yarn.nodemanager.linux-container-executor.group=yarn
banned.users=bin
min.user.id=0
allowed.system.users=hdfs,yarn
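LinuxContainerExecutor also refuses to run unless the binary itself is root-owned and setuid with the configured group. A sketch of the usual install step (the destination path is an assumption based on a standard layout):

# container-executor.cfg must live in the compiled-in conf dir
# (/etc/hadoop here, per the cmake -DHADOOP_CONF_DIR flag above).
cp target/usr/local/bin/container-executor ${HADOOP_HOME}/bin/
chown root:yarn ${HADOOP_HOME}/bin/container-executor   # group must match the config above
chmod 6050 ${HADOOP_HOME}/bin/container-executor        # ---Sr-s--- : setuid root, setgid yarn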
Start cgroups
Restart YARN
References:
Using YARN with Cgroups, parameter reference (Apache Hadoop official documentation)
Follow-up:
Investigate whether YARN supports rolling cgroups out gradually.
For now we apply cgroups from the outside by running cgclassify repeatedly:
#!/bin/bash
echo ""
echo ""
# Collect the PIDs of all YARN container JVMs (everything from jps
# except the NodeManager itself and jps).
containerPid=`su - yarn -c ' jps | grep -v NodeManager | grep -v -i jps ' | awk '{print $1}'`
containerList=`su - yarn -c ' jps | grep -v NodeManager | grep -v -i jps '`
echo " We will begin to move ${containerList} of yarn to cgroup "
for pid in ${containerPid}
do
    # Move each container process into the hadoop-yarn cpu cgroup.
    cgclassify -g cpu:hadoop-yarn $pid
done
echo " Move to cgroup per minute done "
taskID=`cat /cgroup/cpu/hadoop-yarn/tasks`
echo " Content in hadoop-yarn hierarchy is : ${taskID} "
date
echo ""
echo ""
Deploy it as a crontab job that runs once a minute and watch the effect.
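A possible /etc/crontab entry for that (the script path and log file are hypothetical):

# Reclassify YARN container processes into the cgroup once a minute.
* * * * * root /usr/local/bin/yarn-cgclassify.sh >> /var/log/yarn-cgclassify.log 2>&1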
To be continued.