
Compiling and Deploying Hadoop 0.23

  •  Enter release-0.23.0-rc0 and read INSTALL.TXT, which lists the prerequisites for building Hadoop 0.23:
     * Unix System
     * JDK 1.6
     * Maven 3.0
     * Forrest 0.8 (if generating docs)
     * Findbugs 1.3.9 (if running findbugs)
     * ProtocolBuffer 2.4.1+ (for MapReduce)
     * Autotools (if compiling native code)
     * Internet connection for first build (to fetch all Maven and Hadoop dependencies)
  • The JDK is mandatory: install and configure JDK 1.6 and Maven 3.0, and add both to your PATH.
  • Install ProtocolBuffer.
  • Build with the following commands: 
    • mvn clean install -DskipTests 
    • cd hadoop-mapreduce-project 
    • mvn clean install assembly:assembly -Pnative 
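Before kicking off the build, it can save a failed Maven run to confirm the main tools are actually installed. A small convenience sketch (java, mvn, and protoc are the standard binary names for the prerequisites above):

```shell
#!/bin/sh
# Check that the main build prerequisites from INSTALL.TXT are on PATH.
for tool in java mvn protoc; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: MISSING"
  fi
done
```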

 ----------------------------

[Alternatively, download a pre-built Hadoop release; the steps above can then be skipped, and you can start directly from configuring the environment variables.]
Download link:
http://mirror.bjtu.edu.cn/apache/hadoop/common/hadoop-0.23.0/hadoop-0.23.0.tar.gz
After downloading, extract it: tar -zxvf hadoop-0.23.0.tar.gz
 ----------------------------
  • Set the environment variables (using export):
    • $HADOOP_COMMON_HOME (points to the common directory)
    • $HADOOP_MAPRED_HOME (points to the MapReduce directory)
    • $YARN_HOME (same as $HADOOP_MAPRED_HOME)
    • $HADOOP_HDFS_HOME (points to the HDFS directory)
    • $JAVA_HOME 
    • $HADOOP_CONF_DIR (points to the conf directory)
    • $YARN_CONF_DIR (same as $HADOOP_CONF_DIR)
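A sketch of what the exports might look like, assuming the pre-built hadoop-0.23.0 tarball was unpacked under /opt/hadoop-0.23.0 and a JDK 6 lives under /usr/lib/jvm (both paths are illustrative; point them at your actual layout):

```shell
# Illustrative paths - adjust to where you unpacked or built Hadoop.
export HADOOP_COMMON_HOME=/opt/hadoop-0.23.0
export HADOOP_HDFS_HOME=/opt/hadoop-0.23.0
export HADOOP_MAPRED_HOME=/opt/hadoop-0.23.0
export YARN_HOME=$HADOOP_MAPRED_HOME      # same as HADOOP_MAPRED_HOME
export JAVA_HOME=/usr/lib/jvm/java-6-sun  # wherever your JDK lives
export HADOOP_CONF_DIR=$HADOOP_COMMON_HOME/conf
export YARN_CONF_DIR=$HADOOP_CONF_DIR     # same as HADOOP_CONF_DIR
export PATH=$HADOOP_COMMON_HOME/bin:$PATH
```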
  • Configure/write mapred-site.xml:
   <property>
     <name>mapreduce.cluster.temp.dir</name>
     <value></value>
     <description>No description</description>
     <final>true</final>
   </property>

   <property>
     <name>mapreduce.cluster.local.dir</name>
     <value></value>
     <description>No description</description>
     <final>true</final>
   </property>
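For reference, a filled-in version might look like this (the directory paths are arbitrary examples, not mandated values):

```xml
   <property>
     <name>mapreduce.cluster.temp.dir</name>
     <value>/tmp/mapred/temp</value>
     <final>true</final>
   </property>

   <property>
     <name>mapreduce.cluster.local.dir</name>
     <value>/tmp/mapred/local</value>
     <final>true</final>
   </property>
```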
  • Configure/write yarn-site.xml:
Replace host with the output of running hostname on your machine; each port is a port number you choose yourself, and the ports must not clash with one another.
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>host:port</value>
        <description>host is the hostname of the resource manager and
        port is the port on which the NodeManagers contact the Resource Manager.
        </description>
       </property>
    
       <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>host:port</value>
        <description>host is the hostname of the resourcemanager and port is the port
        on which the Applications in the cluster talk to the Resource Manager.
        </description>
      </property>
    
      <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
        <description>In case you do not want to use the default scheduler</description>
      </property>
    
      <property>
        <name>yarn.resourcemanager.address</name>
        <value>host:port</value>
        <description>the host is the hostname of the ResourceManager and the port is the port on
        which the clients can talk to the Resource Manager. </description>
      </property>
    
      <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value></value>
        <description>the local directories used by the nodemanager</description>
      </property>
    
      <property>
        <name>yarn.nodemanager.address</name>
        <value>0.0.0.0:port</value>
        <description>the nodemanagers bind to this port</description>
      </property>  
    
      <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>10240</value>
        <description>the amount of memory on the NodeManager in MB</description>
      </property>
    
      <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/app-logs</value>
        <description>directory on hdfs where the application logs are moved to </description>
      </property>
    
       <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value></value>
        <description>the directories used by Nodemanagers as log directories</description>
      </property>
    
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce.shuffle</value>
        <description>shuffle service that needs to be set for Map Reduce to run </description>
      </property>
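As a concrete illustration of the host:port placeholders, with a machine named myhost and arbitrarily chosen example ports (pick your own, just keep them distinct):

```xml
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>myhost:8025</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>myhost:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>myhost:8040</value>
</property>
<property>
  <name>yarn.nodemanager.address</name>
  <value>0.0.0.0:8041</value>
</property>
```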
    •  Create the symbolic links: 

    This only needs to be done once; subsequent runs can skip it.

      $ cd $HADOOP_COMMON_HOME/share/hadoop/common/lib/
      $ ln -s $HADOOP_MAPRED_HOME/modules/hadoop-mapreduce-client-app-*-SNAPSHOT.jar .
      $ ln -s $HADOOP_MAPRED_HOME/modules/hadoop-mapreduce-client-jobclient-*-SNAPSHOT.jar .
      $ ln -s $HADOOP_MAPRED_HOME/modules/hadoop-mapreduce-client-common-*-SNAPSHOT.jar .
      $ ln -s $HADOOP_MAPRED_HOME/modules/hadoop-mapreduce-client-shuffle-*-SNAPSHOT.jar .
      $ ln -s $HADOOP_MAPRED_HOME/modules/hadoop-mapreduce-client-core-*-SNAPSHOT.jar .
      $ ln -s $HADOOP_MAPRED_HOME/modules/hadoop-yarn-common-*-SNAPSHOT.jar .
      $ ln -s $HADOOP_MAPRED_HOME/modules/hadoop-yarn-api-*-SNAPSHOT.jar . 
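The seven ln -s lines above can equivalently be generated with a loop (a convenience sketch; the jar names assume the SNAPSHOT build layout shown above):

```shell
# Link the MapReduce/YARN jars into common's lib directory in one pass.
cd "$HADOOP_COMMON_HOME/share/hadoop/common/lib/"
for m in mapreduce-client-app mapreduce-client-jobclient mapreduce-client-common \
         mapreduce-client-shuffle mapreduce-client-core yarn-common yarn-api; do
  ln -s "$HADOOP_MAPRED_HOME"/modules/hadoop-"$m"-*-SNAPSHOT.jar .
done
```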
      •  Start the resourcemanager and nodemanager

      If anything goes wrong, check the output under logs to locate the cause.

        $ cd $HADOOP_MAPRED_HOME
        $ bin/yarn-daemon.sh start resourcemanager
        $ bin/yarn-daemon.sh start nodemanager 
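Once both daemon commands return, jps (shipped with the JDK) is a quick way to confirm the JVMs actually came up. A sketch that degrades gracefully when jps is absent or a daemon is down:

```shell
# Capture the list of running JVMs (empty if jps is unavailable).
jps_out=$(jps 2>/dev/null || true)
for daemon in ResourceManager NodeManager; do
  if echo "$jps_out" | grep -q "$daemon"; then
    echo "$daemon: running"
  else
    echo "$daemon: NOT running - check the logs directory"
  fi
done
```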
        •  Run one of the bundled examples:

        The command given on hadoop.apache.org is still the 0.20 version; note the actual path of the examples jar.

            $HADOOP_COMMON_HOME/bin/hadoop jar hadoop-mapreduce-examples-0.23.0.jar  randomwriter out 



             Here is the output from my run:

            2011-12-04 16:08:34,907 INFO  mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(459)) - done with 102406 records.
            2011-12-04 16:08:34,907 INFO  mapred.Task (Task.java:sendDone(1008)) - Task 'attempt_local_0001_m_000000_0' done.
            2011-12-04 16:08:34,907 INFO  mapred.LocalJobRunner (LocalJobRunner.java:run(232)) - Finishing task: attempt_local_0001_m_000000_0
            2011-12-04 16:08:34,908 INFO  mapred.LocalJobRunner (LocalJobRunner.java:run(352)) - Map task executor complete.
            2011-12-04 16:08:35,762 INFO  mapreduce.Job (Job.java:monitorAndPrintJob(1227)) -  map 100% reduce 0%
            2011-12-04 16:08:35,763 INFO  mapreduce.Job (Job.java:monitorAndPrintJob(1238)) - Job job_local_0001 completed successfully
            2011-12-04 16:08:35,785 INFO  mapreduce.Job (Job.java:monitorAndPrintJob(1245)) - Counters: 20
                File System Counters
                    FILE: BYTES_READ=251516
                    FILE: BYTES_WRITTEN=1086056829
                    FILE: READ_OPS=0
                    FILE: LARGE_READ_OPS=0
                    FILE: WRITE_OPS=0
                org.apache.hadoop.mapreduce.TaskCounter
                    MAP_INPUT_RECORDS=1
                    MAP_OUTPUT_RECORDS=102406
                    SPLIT_RAW_BYTES=113
                    SPILLED_RECORDS=0
                    FAILED_SHUFFLE=0
                    MERGED_MAP_OUTPUTS=0
                    GC_TIME_MILLIS=0
                    CPU_MILLISECONDS=0
                    PHYSICAL_MEMORY_BYTES=0
                    VIRTUAL_MEMORY_BYTES=0
                    COMMITTED_HEAP_BYTES=62652416
                org.apache.hadoop.examples.RandomWriter$Counters
                    BYTES_WRITTEN=1073747349
                    RECORDS_WRITTEN=102406
                org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
                    BYTES_READ=0
                org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter
                    BYTES_WRITTEN=1085705129
            Job ended: Sun Dec 04 16:08:35 CST 2011
            The job took 20 seconds.



            Original post: http://nourlcn.ownlinux.net/2011/12/hadoop-023.html

            posted on 2012-03-10 10:18 by lexus