Submitting the traffic statistics example to run on YARN
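1. Overall steps
(0) Steps for submitting a self-developed MR job to run on YARN
(1) Package: mvn clean package -DskipTests
(2) Upload the compiled jar (project root/target/...jar) and the test data to the server
scp xxxx hadoop@hostname:directory
(3) Upload the data to HDFS
hadoop fs -put xxx hdfspath
(4) Run the job
hadoop jar xxx.jar fully-qualified class name (package + class) args.....
(5) Watch the job's progress in the YARN UI (port 8088)
(6) Check the results in the output directory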
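The server-side steps above (3, 4 and 6) can be sketched as one shell function. This is only a minimal sketch: the function name submit_mr_job and its five parameters are made up for illustration, and it assumes the hadoop command is on the PATH of the server account. It passes the HDFS input directory itself as the job input, which FileInputFormat accepts in place of a single file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: upload the data, submit the job, list the output.
# Every argument is a placeholder supplied by the caller.
submit_mr_job() {
    if [ "$#" -ne 5 ]; then
        echo "usage: submit_mr_job <jar> <main-class> <local-data> <hdfs-input-dir> <hdfs-output-dir>" >&2
        return 2
    fi
    local jar="$1" main_class="$2" local_data="$3" hdfs_input="$4" hdfs_output="$5"

    # Step (3): create the input directory and upload the data.
    # -f overwrites a copy that is already on HDFS.
    hadoop fs -mkdir -p "$hdfs_input" || return 1
    hadoop fs -put -f "$local_data" "$hdfs_input" || return 1

    # Step (4): submit the job; the driver receives the two paths as args[0] and args[1].
    hadoop jar "$jar" "$main_class" "$hdfs_input" "$hdfs_output" || return 1

    # Step (6): list what the job wrote.
    hadoop fs -ls "$hdfs_output"
}
```

With the values used later in these notes, a call could look like: submit_mr_job ~/lib/hadoop-train-v2-1.0.jar com.imooc.bigdata.hadoop.mr.access.AccessYARNApp ~/data/access.log /access/input /access/output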
2. AccessYARNApp.java
(1) Copy AccessLocalApp.java and save the copy as AccessYARNApp.java
(2) Change the paths so that they are read from the command-line arguments
// Set the job parameters: the job's input and output paths
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
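3. Packaging from the local cmd command line
(1) Configure the mvn environment variables
Download mvn: http://maven.apache.org/download.cgi
Environment variables:
Variable name 1: maven_home
Variable value 1: D:\apache-maven-3.8.1-bin
Variable name 2: Path
Variable value 2: %maven_home%\bin;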
(2) Run the build in cmd
C:\Users\jieqiong>cd C:\Users\jieqiong\IdeaProjects\hadoop-train-v2
C:\Users\jieqiong\IdeaProjects\hadoop-train-v2>mvn clean package -DskipTests
When the build succeeds, the log shows where the jar was written:
[INFO] Building jar: C:\Users\jieqiong\IdeaProjects\hadoop-train-v2\target\hadoop-train-v2-1.0.jar
Copy the locally built hadoop-train-v2-1.0.jar to the server (in this setup the virtual machine already has hadoop-train-v2-1.0.jar, so no extra upload is needed)
C:\Users\jieqiong\IdeaProjects\hadoop-train-v2\target>scp hadoop-train-v2-1.0.jar hadoop@192.168.131.101:~/lib/
[hadoop@hadoop000 ~]$ cd lib/
[hadoop@hadoop000 lib]$ ls
hadoop-train-v2-1.0.jar qqwry.dat
Upload the data file access.log to the data directory on the virtual machine
C:\Users\jieqiong\IdeaProjects\hadoop-train-v2\access\input>scp access.log hadoop@192.168.131.101:~/data/
4. Putting access.log on HDFS
In this setup access.log is already on HDFS
[hadoop@hadoop000 data]$ hadoop fs -mkdir -p /access/input
[hadoop@hadoop000 data]$ hadoop fs -put access.log /access/input
5. Running the job
[hadoop@hadoop000 ~]$ cd lib/
[hadoop@hadoop000 lib]$ ls
hadoop-train-v2-1.0.jar qqwry.dat
[hadoop@hadoop000 lib]$ hadoop jar hadoop-train-v2-1.0.jar com.imooc.bigdata.hadoop.mr.access.AccessYARNApp /access/input/access.log /access/output/
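One thing to watch when running the job a second time: MapReduce refuses to start if the output directory already exists, and the job fails with a FileAlreadyExistsException. A minimal sketch of a cleanup step is below. The function name remove_output_dir is made up for illustration; it relies only on the standard hadoop fs -test -d and hadoop fs -rm -r commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove an HDFS output directory if it exists,
# so that rerunning the job does not fail on an existing output path.
remove_output_dir() {
    local out="${1:-}"

    # Guard against wiping the HDFS root or passing an empty argument.
    if [ -z "$out" ] || [ "$out" = "/" ]; then
        echo "refusing to remove '$out'" >&2
        return 2
    fi

    # -test -d exits with 0 only when the path exists and is a directory.
    if hadoop fs -test -d "$out"; then
        hadoop fs -rm -r "$out"
    fi
}
```

For the run above, calling remove_output_dir /access/output before the hadoop jar command would clear the previous results. Note that this deletes them for good unless the HDFS trash is enabled.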
6. Viewing the results
[hadoop@hadoop000 ~]$ hadoop fs -ls /access/output
Found 4 items
-rw-r--r-- 1 hadoop supergroup 0 2021-07-21 00:11 /access/output/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 393 2021-07-21 00:11 /access/output/part-r-00000
-rw-r--r-- 1 hadoop supergroup 80 2021-07-21 00:11 /access/output/part-r-00001
-rw-r--r-- 1 hadoop supergroup 79 2021-07-21 00:11 /access/output/part-r-00002
[hadoop@hadoop000 ~]$ hadoop fs -text /access/output/part-r-00002
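Each reduce task writes one part-r-NNNNN file, so the three part files in the listing correspond to three reduce tasks, and the empty _SUCCESS file is the marker written when the job commits successfully. The sketch below checks for that marker and then prints every part file in one go. The function name show_job_output is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print all reducer output files of a finished job.
show_job_output() {
    local out="${1:-}"
    if [ -z "$out" ]; then
        echo "usage: show_job_output <hdfs-output-dir>" >&2
        return 2
    fi
    out="${out%/}"   # drop a trailing slash such as in /access/output/

    # Without the marker the job either failed or has not finished yet.
    if ! hadoop fs -test -e "$out/_SUCCESS"; then
        echo "no _SUCCESS marker under $out" >&2
        return 1
    fi

    # The glob is quoted so that HDFS, not the local shell, expands it.
    hadoop fs -text "$out/part-r-*"
}
```

For this job the call would be show_job_output /access/output, which prints part-r-00000 through part-r-00002 one after another.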
