Spark on Yarn Configuration
1. Spark on Yarn Configuration
1.) On the existing Spark installation, edit the spark-env.sh file:
# vim $SPARK_HOME/conf/spark-env.sh
Add the following configuration:
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
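Both variables should point at the directory containing the Hadoop client configs (core-site.xml, yarn-site.xml), which Spark reads to locate the ResourceManager and HDFS. A quick sanity check that the path resolves:
# ls ${HADOOP_HOME}/etc/hadoop/yarn-site.xml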
2.) On the existing Spark installation, edit the spark-defaults.conf file:
# vim $SPARK_HOME/conf/spark-defaults.conf
Add the following configuration:
spark.master yarn
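With spark.master set to yarn, a quick smoke test confirms the integration before going further. A minimal sketch using the SparkPi example shipped with Spark (the examples jar path assumes a Spark 2.x/3.x layout and may differ in your distribution):
# spark-submit --master yarn --deploy-mode cluster \
    --class org.apache.spark.examples.SparkPi \
    $SPARK_HOME/examples/jars/spark-examples_*.jar 100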
Modify YARN's capacity-scheduler.xml so that scheduling accounts for both CPU and memory (the default DefaultResourceCalculator considers memory only; DominantResourceCalculator also takes vCores into account):
<property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <!-- <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value> -->
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
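A sketch for applying the change on a running cluster: yarn rmadmin -refreshQueues reloads capacity-scheduler.xml, though on some versions a resource-calculator change only takes effect after a ResourceManager restart, so restarting is the safe option:
# yarn rmadmin -refreshQueues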
2. Spark on Yarn Log Configuration
Enable log aggregation in yarn-site.xml:
<property>
    <description>Whether to enable log aggregation</description>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
<property>
    <name>yarn.log.server.url</name>
    <value>http://master:19888/jobhistory/logs</value>
</property>
Modify mapred-site.xml:
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
</property>
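yarn.log-aggregation-enable is read when the daemons start, so the ResourceManager and NodeManagers need a restart before aggregation takes effect. A sketch assuming the standard Hadoop sbin scripts:
# $HADOOP_HOME/sbin/stop-yarn.sh
# $HADOOP_HOME/sbin/start-yarn.sh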
Edit the spark-defaults.conf file (note that spark.yarn.historyServer.address takes host:port, without a scheme):
spark.eventLog.dir=hdfs://bda1node01:8020/user/spark/applicationHistory
spark.eventLog.enabled=true
spark.yarn.historyServer.address=bda1node01:18018
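The event log directory must exist in HDFS before the first application runs, or Spark will fail at startup. A minimal sketch creating the path configured above (the spark:spark ownership is an assumption; use whichever user submits jobs):
# hdfs dfs -mkdir -p /user/spark/applicationHistory
# hdfs dfs -chown -R spark:spark /user/spark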
Edit the environment variables in spark-env.sh:
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18018 -Dspark.history.fs.logDirectory=hdfs://bda1node01:8020/user/spark/applicationHistory"
Viewing an application's aggregated logs through YARN (substitute your own application ID):
# yarn logs -applicationId application_1590546538590_0017
Start the Hadoop and Spark history servers:
# mapred --daemon start historyserver
# $SPARK_HOME/sbin/start-history-server.sh
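To verify that both servers came up, check for their JVM processes and probe the Spark history UI on the port configured above:
# jps | grep -i history
# curl -s -o /dev/null -w "%{http_code}\n" http://bda1node01:18018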