Hadoop Study Notes (1)
The Hadoop ecosystem:
//jdk and hadoop tarballs are stored in /home/softwares
//software install directory: /opt/modules
export JAVA_HOME=/opt/modules/jdk1.8.0_201
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export JRE_HOME=$JAVA_HOME/jre
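The exports above put the JDK's bin directory at the front of PATH. A quick sketch of how that composition works (the JDK path is the one used in these notes):

```shell
# Reproduce the PATH composition from the exports above.
JAVA_HOME=/opt/modules/jdk1.8.0_201
PATH="$JAVA_HOME/bin:$PATH"
# The first PATH entry is now the JDK's bin directory:
echo "$PATH" | cut -d: -f1   # prints /opt/modules/jdk1.8.0_201/bin
```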
Configure JAVA_HOME in hadoop-2.5.0/etc/hadoop/hadoop-env.sh
Local (standalone) mode
MapReduce programs run locally.
1. Create an input folder in the install directory
mkdir input
2. Copy etc/hadoop/*.xml into the input folder
cp etc/hadoop/*.xml input/
3. Run the grep example
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-xx.xx.xx.jar grep input output 'dfs[a-z.]+'
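The grep job extracts every string matching dfs[a-z.]+ from the input files and counts the matches. A plain-shell sketch of the same idea, run on a made-up sample line rather than the real config files:

```shell
# Hypothetical sample text standing in for the copied *.xml files.
sample='<name>dfs.replication</name> <name>dfs.namenode.name.dir</name>'
# Extract matches of the same regex the example job uses, then count each distinct match.
echo "$sample" | grep -oE 'dfs[a-z.]+' | sort | uniq -c
```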
4. Run the wordcount example
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount wcinput wcoutput
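wordcount reads the files under wcinput and emits each word with its count. The same result can be sketched in plain shell (the input lines here are made up):

```shell
# Hypothetical input standing in for the files under wcinput.
printf 'hadoop yarn\nhadoop mapreduce\n' |
  tr ' ' '\n' |   # one word per line (the "map" step)
  sort |          # group identical words (the "shuffle" step)
  uniq -c         # count each group (the "reduce" step); hadoop appears twice
```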
Pseudo-distributed mode
etc/hadoop/core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop.fengyue.com:8280</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/modules/hadoop-2.5.0/data/tmp</value>
</property>
</configuration>
etc/hadoop/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name> //HDFS replication factor (copies of each block)
<value>1</value>
</property>
</configuration>
1. Edit the two XML files above.
2. Format the HDFS filesystem
bin/hdfs namenode -format
3. In sbin/, start the daemons
sbin/hadoop-daemon.sh start namenode //start the NameNode (metadata)
sbin/hadoop-daemon.sh start datanode //start the DataNode (data blocks)
jps //check running processes
hostname:50070 //HDFS web UI
//create a user home directory in HDFS
bin/hdfs dfs -mkdir -p /user/fengyue/
//list the HDFS directory tree recursively
bin/hdfs dfs -ls -R /
//create an HDFS directory
bin/hdfs dfs -mkdir -p /user/fengyue/mapreduce/wordcount/input
//upload a local file to the target directory
bin/hdfs dfs -put /XXX/XXX /user/fengyue/mapreduce/wordcount/input
//view a file stored in HDFS
bin/hdfs dfs -cat /user/fengyue/mapreduce/wordcount/input/wc.input
//run the MapReduce wordcount job against HDFS input
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /user/fengyue/mapreduce/wordcount/input /user/fengyue/mapreduce/wordcount/output
Configuring single-node YARN
YARN runs the ResourceManager and the NodeManager.
1. Set JAVA_HOME in yarn-env.sh
2. Configure the slaves file
3. Configure yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop.fengyue.com</value> //hostname
</property>
Start YARN
sbin/yarn-daemon.sh start resourcemanager
sbin/yarn-daemon.sh start nodemanager
jps //check the services
YARN web UI: default port 8088
Set MapReduce to run on YARN: edit etc/hadoop/mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
Set JAVA_HOME in mapred-env.sh
//delete an HDFS output directory
bin/hdfs dfs -rm -R /user/fengyue/mapreduce/wordcount/output/
Review:
//start the job history server
sbin/mr-jobhistory-daemon.sh start historyserver
//stop the job history server
sbin/mr-jobhistory-daemon.sh stop historyserver
//log aggregation
yarn-site.xml
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value> //604800 s = 7 days
</property>
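yarn.log-aggregation.retain-seconds is specified in seconds; a common retention choice is 7 days, which works out to 604800 seconds:

```shell
# 7 days * 24 hours * 3600 seconds per hour
echo $(( 7 * 24 * 3600 ))   # prints 604800
```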
//configuration files
Default configuration:
core-default.xml
hdfs-default.xml
mapred-default.xml
yarn-default.xml
Site configuration (overrides the defaults):
core-site.xml
hdfs-site.xml
mapred-site.xml
yarn-site.xml
//trash: how long deleted files are kept in the recycle bin
core-site.xml
fs.trash.interval
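A sketch of setting trash retention in core-site.xml; fs.trash.interval is in minutes, and the 1440 value here is an example choice, not from the notes:

```xml
<property>
    <name>fs.trash.interval</name>
    <!-- minutes to keep deleted files in the trash; 1440 = 1 day, 0 disables trash -->
    <value>1440</value>
</property>
```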
//startup methods
1. Start each daemon individually
*hdfs
hadoop-daemon.sh start|stop namenode|datanode|secondarynamenode
*yarn
yarn-daemon.sh start|stop resourcemanager|nodemanager
*mapreduce
mr-jobhistory-daemon.sh start|stop historyserver
2. Start by module
*hdfs
start-dfs.sh
stop-dfs.sh
*yarn
start-yarn.sh
stop-yarn.sh
3. Start everything
start-all.sh
stop-all.sh
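After a full start, jps should list every daemon above. A sketch of checking that; the PIDs and output here are hypothetical (in practice use jps_output=$(jps)):

```shell
# Hypothetical `jps` output after a full start.
jps_output='2101 NameNode
2203 DataNode
2310 SecondaryNameNode
2450 ResourceManager
2552 NodeManager'
# Verify each expected daemon appears as a whole word in the jps output.
for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
  if echo "$jps_output" | grep -qw "$daemon"; then
    echo "$daemon is running"
  else
    echo "$daemon is MISSING"
  fi
done
```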
//per-module configuration details
---HDFS---
NameNode:core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop.fengyue.com:8280</value>
</property>
DataNode: slaves
SecondaryNameNode: hdfs-site.xml
<property> //http
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop.fengyue.com:50090</value>
</property>
<property> //https
<name>dfs.namenode.secondary.https-address</name>
<value>hadoop.fengyue.com:50090</value>
</property>
---YARN---
ResourceManager : yarn-site.xml
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop.fengyue.com</value>
</property>
NodeManager: slaves
----MapReduce----
historyServer: mapred-site.xml
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop.fengyue.com:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop.fengyue.com:19888</value>
</property>
