Hadoop 1.2.1 Basics: Getting Started

Everything below applies to Hadoop 1.2.1.

  • Configuring the pseudo-distributed environment

  The server's hostname is hadoop-master. The following files live under $HADOOP_HOME/conf:

  core-site.xml, hdfs-site.xml, mapred-site.xml, slaves, masters

  core-site.xml configures NameNode-related information

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop-master:9000</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadooptmp</value>
</property>
</configuration>
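Once the daemons are up (startup is covered below), a quick sanity check that the NameNode answers at this address:

hadoop dfsadmin -report  # prints HDFS capacity and the list of live DataNodes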

 

hdfs-site.xml configures HDFS-related information

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
</configuration>
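With HDFS running, fsck can confirm that files actually carry the configured replication factor (an optional check):

hadoop fsck / -files -blocks  # per-file block and replication report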

mapred-site.xml configures JobTracker information

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
    <name>mapred.job.tracker</name>
    <value>hadoop-master:9001</value>
</property>
</configuration>
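After MapReduce is up, an easy way to confirm the JobTracker answers on hadoop-master:9001 is to list jobs (empty on a fresh cluster):

hadoop job -list  # queries the JobTracker for running jobs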

slaves: hadoop-master (one host per line; multiple lines allowed). This file lists the hosts that run the DataNode and TaskTracker.

masters: hadoop-master. Despite its name, this file lists the host(s) that run the SecondaryNameNode.
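One step this walkthrough otherwise assumes: on a brand-new setup the NameNode must be formatted once before the first start. A minimal sketch:

cd $HADOOP_HOME
# One-time initialization of the HDFS metadata (kept under hadoop.tmp.dir here).
# Never re-run on a cluster that already holds data: it wipes the namespace.
bin/hadoop namenode -format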

 

  • Starting up in pseudo-distributed mode

Hadoop has three ways to start up:

cd $HADOOP_HOME/bin

Method 1:

Start HDFS and MapReduce separately

Start:

./start-dfs.sh  # Start Hadoop DFS daemons. Run this on the master node. (Optionally upgrades or rolls back DFS state; DataNodes will log connection errors until the NameNode is up.)

./start-mapred.sh  # Start hadoop map reduce daemons.  Run this on master node.

Stop:

./stop-mapred.sh  # Stop hadoop map reduce daemons.  Run this on master node.

./stop-dfs.sh  # Stop hadoop DFS daemons.  Run this on master node.

start-dfs.sh starts the NameNode locally through hadoop-daemon.sh;

the DataNode and SecondaryNameNode are started through hadoop-daemons.sh, which runs hadoop-daemon.sh over SSH on the hosts listed in slaves and masters respectively.
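Conceptually, hadoop-daemons.sh just fans hadoop-daemon.sh out over SSH. A simplified sketch, not the actual script (which goes through slaves.sh and handles options and environment setup):

# Illustration only: roughly what hadoop-daemons.sh does for DataNodes.
for host in $(cat "$HADOOP_HOME/conf/slaves"); do
  ssh "$host" "$HADOOP_HOME/bin/hadoop-daemon.sh start datanode" &
done
wait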

Method 2:

Start or stop all daemons at once

Start:

./start-all.sh  # Start all hadoop daemons. Run this on the master node only.

Stop:

./stop-all.sh  # Stop all hadoop daemons. Run this on the master node only.

Method 3:

Start each daemon individually

Start order:

NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker

Stop order:

JobTracker, TaskTracker, NameNode, DataNode, SecondaryNameNode

Start:

./hadoop-daemon.sh start namenode

./hadoop-daemon.sh start datanode

./hadoop-daemon.sh start secondarynamenode

./hadoop-daemon.sh start jobtracker

./hadoop-daemon.sh start tasktracker
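After the start commands, jps (shipped with the JDK) should show one Java process per daemon; the PIDs below are just example values:

$ jps
2786 NameNode
2954 DataNode
3102 SecondaryNameNode
3187 JobTracker
3341 TaskTracker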

Stop:

./hadoop-daemon.sh stop jobtracker

./hadoop-daemon.sh stop tasktracker

./hadoop-daemon.sh stop namenode

./hadoop-daemon.sh stop datanode

./hadoop-daemon.sh stop secondarynamenode

Monitoring web UIs after startup:

HDFS: http://hadoop-master:50070

MapReduce: http://hadoop-master:50030

PS: startup prints "Warning: $HADOOP_HOME is deprecated". Setting the environment variable HADOOP_HOME_WARN_SUPPRESS=1 suppresses it; the cause is visible in hadoop-config.sh:

if [ "$HADOOP_HOME_WARN_SUPPRESS" = "" ] && [ "$HADOOP_HOME" != "" ]; then
  echo "Warning: \$HADOOP_HOME is deprecated." 1>&2
  echo 1>&2
fi
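For example, one place to set it (conf/hadoop-env.sh works just as well):

export HADOOP_HOME_WARN_SUPPRESS=1  # add to ~/.bashrc or conf/hadoop-env.sh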

  Any of these scripts can be invoked with --config <conf-dir> to point at an alternative configuration directory; this approach is common in production.
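For example, assuming an alternative configuration directory at /opt/hadoop-conf/prod (a hypothetical path):

./start-all.sh --config /opt/hadoop-conf/prod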

  • Log-file naming rules

The log directory is configured in $HADOOP_HOME/conf/hadoop-env.sh:

# Where log files are stored. $HADOOP_HOME/logs by default.
# export HADOOP_LOG_DIR=${HADOOP_HOME}/logs

There are two kinds of log files, .log and .out: the .log files are written through the log4j configuration, while the .out files capture each daemon's stdout/stderr and are usually nearly empty. An example .log name:

hadoop-hadoop-jobtracker-hadoop-master.log

HADOOP_LOGFILE=hadoop-$HADOOP_IDENT_STRING-$command-$HOSTNAME.log

So the naming rule is: <framework>-<user>-<daemon>-<hostname>.log
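On this pseudo-distributed machine the log directory therefore looks roughly like this (illustrative listing; each .log also has a matching .out file):

$ ls $HADOOP_HOME/logs
hadoop-hadoop-datanode-hadoop-master.log
hadoop-hadoop-jobtracker-hadoop-master.log
hadoop-hadoop-namenode-hadoop-master.log
hadoop-hadoop-secondarynamenode-hadoop-master.log
hadoop-hadoop-tasktracker-hadoop-master.log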

 

    •  Using the hadoop command
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  mradmin              run a Map-Reduce admin client
  fsck                 run a DFS filesystem checking utility
  fs                   run a generic filesystem user client
  balancer             run a cluster balancing utility
  oiv                  apply the offline fsimage viewer to an fsimage
  fetchdt              fetch a delegation token from the NameNode
  jobtracker           run the MapReduce job Tracker node
  pipes                run a Pipes job
  tasktracker          run a MapReduce task Tracker node
  historyserver        run job history servers as a standalone daemon
  job                  manipulate MapReduce jobs
  queue                get information regarding JobQueues
  version              print the version
  jar <jar>            run a jar file
  distcp <srcurl> <desturl> copy file or directories recursively
  distcp2 <srcurl> <desturl> DistCp version 2
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME
Most commands print help when invoked w/o parameters.

For example, to list the files in the HDFS root directory on this server:

hadoop fs -ls hdfs://hadoop-master:9000/
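Since fs.default.name already points at hdfs://hadoop-master:9000, the scheme and authority can be omitted:

hadoop fs -ls /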

    •  Word count
hadoop fs -mkdir /home/hadoop/data/wc/input
hadoop fs -put /opt/software/hadoop-1.2.1/conf/*.xml /home/hadoop/data/wc/input
hadoop jar hadoop-examples-1.2.1.jar wordcount /home/hadoop/data/wc/input /home/hadoop/data/wc/output
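When the job finishes, the counts can be read back from the output directory; with the default single reducer the file is typically named part-r-00000:

hadoop fs -cat /home/hadoop/data/wc/output/part-r-00000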

 
