Hadoop 1.2.1 Getting Started: The Basics
Everything Hadoop-related below assumes Hadoop version 1.2.1.
- Configuring the pseudo-distributed environment
The server hostname is hadoop-master. The relevant files live under $HADOOP_HOME/conf:
core-site.xml, hdfs-site.xml, mapred-site.xml, slaves, masters
core-site.xml configures the NameNode-related settings:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://hadoop-master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/hadooptmp</value>
    </property>
</configuration>
hdfs-site.xml configures the HDFS-related settings:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
</configuration>
mapred-site.xml configures the JobTracker:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>hadoop-master:9001</value>
    </property>
</configuration>
masters: hadoop-master. Despite its name, this file configures where the SecondaryNameNode runs.
slaves: hadoop-master. Configures the hosts that run a DataNode and a TaskTracker; multiple hosts can be listed, one per line.
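In pseudo-distributed mode both files therefore contain the single hostname. Before the very first start, HDFS also needs to be formatted; a minimal check-and-format sketch (paths follow the layout above):

cat $HADOOP_HOME/conf/masters   # -> hadoop-master
cat $HADOOP_HOME/conf/slaves    # -> hadoop-master
# one-time step before the first start; wipes any existing HDFS metadata under hadoop.tmp.dir
$HADOOP_HOME/bin/hadoop namenode -format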
- Startup methods in pseudo-distributed mode
Hadoop can be started in three ways:
cd $HADOOP_HOME/bin
Method 1:
Start HDFS and MapReduce separately.
Start:
./start-dfs.sh # Start HDFS daemons; optionally upgrade or rollback dfs state. Run this on the master node.
               # (Per the script's own comments: start namenode after datanodes, to minimize time the namenode
               # is up w/o data; datanodes will log connection errors until the namenode starts.)
./start-mapred.sh # Start hadoop map reduce daemons. Run this on master node.
Stop:
./stop-mapred.sh # Stop hadoop map reduce daemons. Run this on master node.
./stop-dfs.sh # Stop hadoop DFS daemons. Run this on master node.
Under the hood, start-dfs.sh starts the NameNode locally via hadoop-daemon.sh,
while the DataNodes and the SecondaryNameNode are started via hadoop-daemons.sh.
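hadoop-daemons.sh is essentially a fan-out wrapper: it uses slaves.sh to run hadoop-daemon.sh on every host listed in slaves (or masters, for the SecondaryNameNode). A simplified paraphrase of its core line, not a verbatim copy of the 1.2.1 script:

# run hadoop-daemon.sh with the same arguments on each slave host
exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_HOME" \; "$bin/hadoop-daemon.sh" --config $HADOOP_CONF_DIR "$@"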
Method 2:
Start or stop everything at once.
Start:
./start-all.sh # Start all hadoop daemons. Must be run on the master node.
Stop:
./stop-all.sh # Stop all hadoop daemons. Must be run on the master node.
Method 3:
Start each daemon individually.
Start order:
NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker
Stop order:
JobTracker, TaskTracker, NameNode, DataNode, SecondaryNameNode
Start:
./hadoop-daemon.sh start namenode
./hadoop-daemon.sh start datanode
./hadoop-daemon.sh start secondarynamenode
./hadoop-daemon.sh start jobtracker
./hadoop-daemon.sh start tasktracker
Stop:
./hadoop-daemon.sh stop jobtracker
./hadoop-daemon.sh stop tasktracker
./hadoop-daemon.sh stop namenode
./hadoop-daemon.sh stop datanode
./hadoop-daemon.sh stop secondarynamenode
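Whichever method is used, jps from the JDK is a quick way to confirm which daemons are actually running; in a healthy pseudo-distributed setup all five show up (the PIDs below are illustrative):

$ jps
2786 NameNode
2912 DataNode
3041 SecondaryNameNode
3127 JobTracker
3254 TaskTracker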
Web monitoring pages available after startup:
HDFS:http://hadoop-master:50070
MapReduce:http://hadoop-master:50030
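Without a browser, the same pages can be probed from the shell; a minimal check using curl (assumed to be installed):

curl -s -o /dev/null -w "%{http_code}\n" http://hadoop-master:50070   # expect 200
curl -s -o /dev/null -w "%{http_code}\n" http://hadoop-master:50030   # expect 200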
PS: at startup you will see the output "Warning: $HADOOP_HOME is deprecated". Setting the environment variable HADOOP_HOME_WARN_SUPPRESS=1 suppresses it; the cause is this check in hadoop-config.sh:
if [ "$HADOOP_HOME_WARN_SUPPRESS" = "" ] && [ "$HADOOP_HOME" != "" ]; then
  echo "Warning: \$HADOOP_HOME is deprecated." 1>&2
  echo 1>&2
fi
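To set the variable permanently, export it in conf/hadoop-env.sh (or a shell profile); a minimal sketch:

# conf/hadoop-env.sh or ~/.bashrc
export HADOOP_HOME_WARN_SUPPRESS=1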
When launching these scripts you can also pass --config <confdir> to start with an alternate configuration directory; this approach is common in production.
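For example, with a hypothetical configuration directory /opt/hadoop-conf/prod:

./start-all.sh --config /opt/hadoop-conf/prod
./stop-all.sh --config /opt/hadoop-conf/prod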
- Log naming rules
The log location is configured in $HADOOP_HOME/conf/hadoop-env.sh:
# Where log files are stored. $HADOOP_HOME/logs by default.
# export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
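To relocate the logs, uncomment that line and point it at the desired directory (the path below is illustrative):

export HADOOP_LOG_DIR=/var/log/hadoop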
There are two kinds of log files, .log and .out: the .log file holds the output produced by the log4j configuration, while the .out file captures the daemon's stdout and stderr and is usually close to empty. For example:
hadoop-hadoop-jobtracker-hadoop-master.log
HADOOP_LOGFILE=hadoop-$HADOOP_IDENT_STRING-$command-$HOSTNAME.log
So the naming rule is: framework-username-daemonname-hostname.log
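On the pseudo-distributed node above, a listing of the log directory therefore looks roughly like this (illustrative):

$ ls $HADOOP_HOME/logs
hadoop-hadoop-namenode-hadoop-master.log
hadoop-hadoop-namenode-hadoop-master.out
hadoop-hadoop-datanode-hadoop-master.log
hadoop-hadoop-jobtracker-hadoop-master.log
...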
- Using the hadoop command
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  mradmin              run a Map-Reduce admin client
  fsck                 run a DFS filesystem checking utility
  fs                   run a generic filesystem user client
  balancer             run a cluster balancing utility
  oiv                  apply the offline fsimage viewer to an fsimage
  fetchdt              fetch a delegation token from the NameNode
  jobtracker           run the MapReduce job Tracker node
  pipes                run a Pipes job
  tasktracker          run a MapReduce task Tracker node
  historyserver        run job history servers as a standalone daemon
  job                  manipulate MapReduce jobs
  queue                get information regarding JobQueues
  version              print the version
  jar <jar>            run a jar file
  distcp <srcurl> <desturl> copy file or directories recursively
  distcp2 <srcurl> <desturl> DistCp version 2
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME
Most commands print help when invoked w/o parameters.
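Beyond fs, a few frequently used commands (output depends on the cluster state):

hadoop version          # print the Hadoop version
hadoop dfsadmin -report # capacity and live-datanode summary for HDFS
hadoop fsck /           # check filesystem health starting at the root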
For example, to list the files under the HDFS root on the current server:
hadoop fs -ls hdfs://hadoop-master:9000/
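Since fs.default.name already points at hdfs://hadoop-master:9000, the scheme and authority can be omitted. A few everyday fs subcommands (paths are illustrative):

hadoop fs -ls /                      # same listing via the default filesystem
hadoop fs -mkdir /tmp/demo           # create a directory
hadoop fs -put ./localfile /tmp/demo # upload a local file
hadoop fs -cat /tmp/demo/localfile   # print its contents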
- Counting words
hadoop fs -mkdir /home/hadoop/data/wc/input
hadoop fs -put /opt/software/hadoop-1.2.1/conf/*.xml /home/hadoop/data/wc/input
hadoop jar hadoop-examples-1.2.1.jar wordcount /home/hadoop/data/wc/input /home/hadoop/data/wc/output
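When the job finishes, the counts are written to the output directory; with the default single reducer the result file is named part-r-00000:

hadoop fs -ls /home/hadoop/data/wc/output
hadoop fs -cat /home/hadoop/data/wc/output/part-r-00000 | head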