吹静静

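Background

Linux environment, running as a regular (non-root) user.

A Flink job was submitted to a YARN cluster. YARN allocated resources successfully, but the job stayed in the ACCEPTED state and never ran. The YARN logs contained almost no error output; after waiting for a while the job simply exited without reporting any obvious error.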

Symptoms

The YARN web UI at http://bigdata1:8088/cluster/app/application_xxx shows the following:

Application Attempt State: FAILED

Started: Fri Oct 14 09:28:16 +0800 2022
Elapsed: 3sec
AM Container: container_1665709915900_0003_02_000001
Node: N/A
Tracking URL: History
Diagnostics Info: AM Container for appattempt_1665709915900_0003_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2022-10-14 09:28:19.937]Exception from container-launch.
Container id: container_1665709915900_0003_02_000001
Exit code: 1
[2022-10-14 09:28:19.939]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
[2022-10-14 09:28:19.940]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
For more detailed output, check the application tracking page: http://bigdata:8088/cluster/app/application_1665709915900_0003 Then click on links to logs of each attempt.
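
Because prelaunch.err is empty in the diagnostics, the container logs are the next place to look. The following is a minimal sketch, assuming the container ID layout seen above (container_<clusterTimestamp>_<appSequence>_<attempt>_<containerSequence>). The name app_id_from_container is a hypothetical helper, not a YARN tool. It derives the application ID that the yarn logs command expects:

```shell
#!/bin/sh
# Hypothetical helper (not part of YARN): derive the application ID from a
# container ID of the form
#   container_<clusterTimestamp>_<appSequence>_<attempt>_<containerSequence>
# This layout is an assumption based on the IDs shown in the diagnostics.
app_id_from_container() {
    echo "$1" | awk -F_ '{ print "application_" $2 "_" $3 }'
}

app_id=$(app_id_from_container container_1665709915900_0003_02_000001)
echo "$app_id"

# On a node with the Hadoop client installed, the aggregated logs for that
# application could then be fetched with:
#   yarn logs -applicationId "$app_id"
```

Note that yarn logs generally relies on log aggregation being enabled on the cluster; without it, the logs have to be read from the NodeManager's local log directory.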

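Cause

The problem comes down to environment variables.

The earlier test environment had permissive permissions, and Hadoop deployed there started and ran without issue. This time the job ran as a regular user with very tight permissions. My guess is that these restrictions prevent Hadoop from finding the relevant environment variables when it makes internal calls at runtime.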

Solution

Set the application classpath explicitly in yarn-site.xml:

<property>
        <name>yarn.application.classpath</name>
        <value>/opt/app/hadoop/etc/hadoop:/opt/app/hadoop/share/hadoop/common/lib/*:/opt/app/hadoop/share/hadoop/common/*:/opt/app/hadoop/share/hadoop/hdfs:/opt/app/hadoop/share/hadoop/hdfs/lib/*:/opt/app/hadoop/share/hadoop/hdfs/*:/opt/app/hadoop/share/hadoop/yarn/lib/*:/opt/app/hadoop/share/hadoop/yarn/*:/opt/app/hadoop/share/hadoop/mapreduce/lib/*:/opt/app/hadoop/share/hadoop/mapreduce/*</value>
</property>

Then restart Hadoop.
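
The yarn.application.classpath value above repeats one install prefix across ten entries, so it can be generated rather than typed by hand. This is a rough sketch, assuming Hadoop is installed under /opt/app/hadoop as in this post; build_yarn_classpath is a hypothetical helper, not part of Hadoop:

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): print a yarn.application.classpath
# value for the Hadoop install directory given as $1. The entries and their
# order mirror the <value> used in yarn-site.xml above.
build_yarn_classpath() {
    home="$1"
    share="$home/share/hadoop"
    # First entry has no leading colon.
    printf '%s' "$home/etc/hadoop"
    # printf reuses the format for each argument; the quotes keep the
    # shell from expanding the "*" wildcards.
    printf ':%s' \
        "$share/common/lib/*" \
        "$share/common/*" \
        "$share/hdfs" \
        "$share/hdfs/lib/*" \
        "$share/hdfs/*" \
        "$share/yarn/lib/*" \
        "$share/yarn/*" \
        "$share/mapreduce/lib/*" \
        "$share/mapreduce/*"
    printf '\n'
}

# Falls back to the path used in this post when HADOOP_HOME is not set.
build_yarn_classpath "${HADOOP_HOME:-/opt/app/hadoop}"
```

On a machine where the hadoop command itself works, running hadoop classpath prints the classpath that Hadoop computes, which may serve as a cross-check for the generated value.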

 

posted on 2022-10-14 17:01 by 吹静静