Hadoop 2.7.x: wordcount hangs at INFO mapreduce.Job: Running job: job_1469603958907_0002

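I. The problem

  After setting up a fully distributed Hadoop cluster, I ran the wordcount example as a test. Every run stopped at the Running job line and then appeared to hang indefinitely.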

  The wordcount command: [hadoop@master hadoop-2.7.2]$ /opt/module/hadoop-2.7.2/bin/hadoop jar /opt/module/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar  wordcount  /wc/mytemp/123 /wc/mytemp/output

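  The console output stopped at this line (underlined in red in the original screenshot):

      INFO mapreduce.Job: Running job: job_1469603958907_0002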

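II. The solution

  1. Being a complete beginner, I went through a lot of tutorials online. The suggestions boiled down to these:

    (1) Some say the firewall or SELinux has not been turned off. I checked each node one by one and found both were already disabled everywhere.

    (2) Some say that extra entries in /etc/hosts, such as 127.0.0.1, have not been deleted or commented out.

    (3) Some say to check the logs (which log? a beginner has no idea), and that is where the advice ends.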

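  2. The fix:

  A beginner always takes a long time over problems like this, and half a day disappeared that way (sorry to the company paying my salary). Here is the fix, step by step.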

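  (1) Step one: the hang happens at Running job, so within Hadoop the place to look is MapReduce. In Hadoop 2.x, MapReduce jobs are managed by YARN, so we check the NodeManager log yarn-hadoop-nodemanager-slave01.log, which is under ${HADOOP_HOME}/logs on the slave node.

View the log in a terminal on the slave node: ${HADOOP_HOME}/logs/yarn-hadoop-nodemanager-slave01.log

The log shows the following: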

2016-07-27 03:30:51,041 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-07-27 03:30:52,043 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-07-27 03:30:53,046 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-07-27 03:30:54,047 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-07-27 03:30:55,048 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-07-27 03:30:56,050 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-07-27 03:31:27,053 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
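
To spot this symptom without reading the whole log, the retry lines can be summarized per target address. The following is only a minimal sketch: the function name rm_retry_targets is made up for illustration and is not part of Hadoop, and it assumes a grep that supports the -o option (GNU and BSD grep both do).

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): list the addresses that the
# IPC client keeps retrying in a log file, with a count for each.
rm_retry_targets() {
    log_file=$1
    # -o prints only the matched part of each line, for example
    # "Retrying connect to server: 0.0.0.0/0.0.0.0:8031"
    grep -o 'Retrying connect to server: [^ ]*:[0-9]*' "$log_file" |
        sed 's/^Retrying connect to server: //' |
        sort | uniq -c
}
```

It could be called as rm_retry_targets "${HADOOP_HOME}/logs/yarn-hadoop-nodemanager-slave01.log". A large count against 0.0.0.0 suggests the NodeManager does not know where the ResourceManager is.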

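(2) What the log means, roughly

  The log says that every attempt by the client to connect to 0.0.0.0/0.0.0.0:8031 fails. 0.0.0.0 is the default value of yarn.resourcemanager.hostname and 8031 is the default resource-tracker port, so the NodeManager had no configured ResourceManager address to register with, which points to a configuration problem. I pasted the message into Baidu, tried several suggestions, and finally solved it by following this blog post (Fixing "Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is... "). The fix is to add the properties below to yarn-site.xml. (Note: I changed this file on master, slave01, and slave02. I am not sure whether changing it on master alone would be enough, but since the failing connection was logged by the NodeManager on slave01, my working assumption is that every node needs the change.)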

  

<property>
  <name>yarn.resourcemanager.address</name>
  <value>master:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>master:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>master:8031</value>
</property>
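
Since the same file has to be edited on several nodes, it helps to check that none of the three properties was missed on any of them. The sketch below is an illustration only: missing_rm_properties is a made-up name, and the check is a plain text search, so it does not notice a property that sits inside an XML comment or has whitespace inside its name tag.

```shell
#!/bin/sh
# Hypothetical helper: print each ResourceManager address property that is
# NOT mentioned in the given yarn-site.xml. Prints nothing if all are present.
missing_rm_properties() {
    site_file=$1
    for name in \
        yarn.resourcemanager.address \
        yarn.resourcemanager.scheduler.address \
        yarn.resourcemanager.resource-tracker.address
    do
        # -F: fixed string, -q: quiet. Matching the surrounding tags keeps
        # "yarn.resourcemanager.address" from matching the longer names.
        grep -Fq "<name>${name}</name>" "$site_file" || printf '%s\n' "$name"
    done
}
```

Assuming the configuration lives in the usual Hadoop 2.x location, it could be run on each node as missing_rm_properties "${HADOOP_HOME}/etc/hadoop/yarn-site.xml".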

 


 

(3) Problem solved

Running the wordcount program again (after restarting YARN so that the new settings are loaded) succeeds:

[hadoop@master hadoop-2.7.2]$ /opt/module/hadoop-2.7.2/bin/hadoop jar /opt/module/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar  wordcount  /wc/mytemp/123 /wc/mytemp/output
16/07/27 03:33:29 INFO client.RMProxy: Connecting to ResourceManager at master/172.16.95.100:8032
16/07/27 03:33:31 INFO input.FileInputFormat: Total input paths to process : 1
16/07/27 03:33:31 INFO mapreduce.JobSubmitter: number of splits:1
16/07/27 03:33:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1469604761767_0001
16/07/27 03:33:32 INFO impl.YarnClientImpl: Submitted application application_1469604761767_0001
16/07/27 03:33:32 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1469604761767_0001/
16/07/27 03:33:32 INFO mapreduce.Job: Running job: job_1469604761767_0001
16/07/27 03:33:47 INFO mapreduce.Job: Job job_1469604761767_0001 running in uber mode : false
16/07/27 03:33:47 INFO mapreduce.Job:  map 0% reduce 0%
16/07/27 03:33:55 INFO mapreduce.Job:  map 100% reduce 0%
16/07/27 03:34:08 INFO mapreduce.Job:  map 100% reduce 100%
16/07/27 03:34:08 INFO mapreduce.Job: Job job_1469604761767_0001 completed successfully
16/07/27 03:34:08 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=1291
                FILE: Number of bytes written=237185
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=1498
                HDFS: Number of bytes written=1035
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=6738
                Total time spent by all reduces in occupied slots (ms)=9139
                Total time spent by all map tasks (ms)=6738
                Total time spent by all reduce tasks (ms)=9139
                Total vcore-milliseconds taken by all map tasks=6738

The word counts can then be read from the job's output directory, /wc/mytemp/output, using the HDFS shell.
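
One way to print the results is sketched below. It is not necessarily the command the original post used: it assumes that hadoop is on the PATH and that the job wrote its reducer output under the default part-r-NNNNN file names. The function name show_job_output is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the reducer output files of a finished job.
# The glob stays inside the quotes so that the HDFS shell, not the local
# shell, expands it against the files stored in HDFS.
show_job_output() {
    output_dir=$1
    hadoop fs -cat "${output_dir}/part-r-*"
}
```

For the run above it would be called as show_job_output /wc/mytemp/output.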

 

 

posted @ 2016-07-27 16:09  YouxiBug