Submitting the traffic statistics example to run on YARN

1. Overall procedure

Steps for submitting a self-developed MR job to run on YARN:

(1) Package: mvn clean package -DskipTests

(2) Upload the built jar (project root/target/...jar) and the test data to the server

    scp xxxx hadoop@hostname:directory

(3) Upload the data to HDFS

    hadoop fs -put xxx hdfspath

(4) Run the job

    hadoop jar xxx.jar fully.qualified.ClassName (package name + class name) args.....

(5) Watch the job's progress in the YARN UI (port 8088)

(6) Check the results in the output directory

 

2、AccessYARNApp.java

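(1) Copy AccessLocalApp.java to AccessYARNApp.java

(2) Change the paths so that they are read from the command-line arguments

        // Set the job parameters: the job's input and output paths, taken from args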
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
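Because the driver now reads both paths from args, running it with too few arguments would fail with an ArrayIndexOutOfBoundsException. A minimal sketch of a guard that could run before the two calls above; the class name PathArgs and its usage message are hypothetical and not part of the original project:

```java
public class PathArgs {
    /** Holds the two paths the driver expects on the command line. */
    public static final class Paths {
        public final String input;
        public final String output;

        Paths(String input, String output) {
            this.input = input;
            this.output = output;
        }
    }

    /** Returns the input and output paths, or throws if the argument count is wrong. */
    public static Paths parse(String[] args) {
        if (args == null || args.length != 2) {
            throw new IllegalArgumentException(
                    "Usage: AccessYARNApp <input path> <output path>");
        }
        return new Paths(args[0], args[1]);
    }

    public static void main(String[] args) {
        // Same paths as the ones passed to the job later in this post.
        Paths p = parse(new String[] {"/access/input/access.log", "/access/output/"});
        System.out.println(p.input + " -> " + p.output);
    }
}
```

In the driver, p.input and p.output would then take the place of args[0] and args[1].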

 

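3. Package from the local cmd command line

(1) Configure the mvn environment variables

    Download Maven: http://maven.apache.org/download.cgi

    Environment variables:

    Variable name 1: maven_home

    Variable value 1: D:\apache-maven-3.8.1-bin

    Variable name 2: Path

    Variable value 2: %maven_home%\bin; (appended to the existing value)

(2) Run the build in cmd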

C:\Users\jieqiong>cd C:\Users\jieqiong\IdeaProjects\hadoop-train-v2
C:\Users\jieqiong\IdeaProjects\hadoop-train-v2>mvn clean package -DskipTests

   On success, the output shows where the jar was written:

[INFO] Building jar: C:\Users\jieqiong\IdeaProjects\hadoop-train-v2\target\hadoop-train-v2-1.0.jar

  Copy the locally built hadoop-train-v2-1.0.jar to the server (in this case the virtual machine already has hadoop-train-v2-1.0.jar, so no extra upload is needed)

C:\Users\jieqiong\IdeaProjects\hadoop-train-v2\target>scp hadoop-train-v2-1.0.jar hadoop@192.168.131.101:~/lib/
[hadoop@hadoop000 ~]$ cd lib/
[hadoop@hadoop000 lib]$ ls
hadoop-train-v2-1.0.jar  qqwry.dat

  Upload the data file access.log to the data directory on the virtual machine

C:\Users\jieqiong\IdeaProjects\hadoop-train-v2\access\input>scp access.log hadoop@192.168.131.101:~/data/

 

4. Put access.log on HDFS

  In this environment access.log is already on HDFS; the commands for uploading it are:

[hadoop@hadoop000 data]$ hadoop fs -mkdir -p /access/input
[hadoop@hadoop000 data]$ hadoop fs -put access.log /access/input

 

5. Run

[hadoop@hadoop000 ~]$ cd lib/
[hadoop@hadoop000 lib]$ ls
hadoop-train-v2-1.0.jar  qqwry.dat
[hadoop@hadoop000 lib]$ hadoop jar hadoop-train-v2-1.0.jar com.imooc.bigdata.hadoop.mr.access.AccessYARNApp /access/input/access.log /access/output/
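The command above has four parts: the jar, the fully qualified main class, and two application arguments that reach the driver as args[0] and args[1]. A small sketch that splits such a command line to make the mapping explicit; the class HadoopJarCommand is hypothetical, and it assumes the main class is always given on the command line, as it is here:

```java
import java.util.Arrays;
import java.util.List;

public class HadoopJarCommand {
    public final String jar;
    public final String mainClass;
    public final List<String> appArgs;

    HadoopJarCommand(String jar, String mainClass, List<String> appArgs) {
        this.jar = jar;
        this.mainClass = mainClass;
        this.appArgs = appArgs;
    }

    /** Splits "hadoop jar <jar> <main class> <args...>" into its parts. */
    public static HadoopJarCommand parse(String commandLine) {
        String[] t = commandLine.trim().split("\\s+");
        if (t.length < 4 || !t[0].equals("hadoop") || !t[1].equals("jar")) {
            throw new IllegalArgumentException(
                    "Expected: hadoop jar <jar> <main class> [args...]");
        }
        // Everything after the class name is handed to main(String[] args).
        List<String> rest = Arrays.asList(Arrays.copyOfRange(t, 4, t.length));
        return new HadoopJarCommand(t[2], t[3], rest);
    }

    public static void main(String[] args) {
        HadoopJarCommand c = parse("hadoop jar hadoop-train-v2-1.0.jar "
                + "com.imooc.bigdata.hadoop.mr.access.AccessYARNApp "
                + "/access/input/access.log /access/output/");
        System.out.println("args[0] = " + c.appArgs.get(0));
        System.out.println("args[1] = " + c.appArgs.get(1));
    }
}
```

Note that a job using FileOutputFormat refuses to start if the output directory already exists, so /access/output has to be removed before a rerun.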

 

6. View the results

[hadoop@hadoop000 ~]$ hadoop fs -ls /access/output
Found 4 items
-rw-r--r--   1 hadoop supergroup          0 2021-07-21 00:11 /access/output/_SUCCESS
-rw-r--r--   1 hadoop supergroup        393 2021-07-21 00:11 /access/output/part-r-00000
-rw-r--r--   1 hadoop supergroup         80 2021-07-21 00:11 /access/output/part-r-00001
-rw-r--r--   1 hadoop supergroup         79 2021-07-21 00:11 /access/output/part-r-00002
[hadoop@hadoop000 ~]$ hadoop fs -text /access/output/part-r-00002
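Each reduce task writes one part-r-NNNNN file, so the three part files in the listing correspond to three reducers, and _SUCCESS is an empty marker written when the job completes. A rough sketch of picking the part files out of such a listing; the class LsOutput is hypothetical, and it assumes the eight-column line layout shown above:

```java
import java.util.ArrayList;
import java.util.List;

public class LsOutput {
    /** One file entry from a `hadoop fs -ls` listing. */
    public static final class Entry {
        public final long size;
        public final String path;

        Entry(long size, String path) {
            this.size = size;
            this.path = path;
        }
    }

    /** Keeps only the reducer output files (part-r-NNNNN) from the listing text. */
    public static List<Entry> partFiles(String listing) {
        List<Entry> result = new ArrayList<>();
        for (String line : listing.split("\n")) {
            String[] f = line.trim().split("\\s+");
            // File lines have 8 fields: permissions, replication, owner,
            // group, size, date, time, path. Other lines are skipped.
            if (f.length != 8) {
                continue;
            }
            String path = f[7];
            String name = path.substring(path.lastIndexOf('/') + 1);
            if (name.startsWith("part-r-")) {
                result.add(new Entry(Long.parseLong(f[4]), path));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String listing = String.join("\n",
                "Found 4 items",
                "-rw-r--r--   1 hadoop supergroup          0 2021-07-21 00:11 /access/output/_SUCCESS",
                "-rw-r--r--   1 hadoop supergroup        393 2021-07-21 00:11 /access/output/part-r-00000",
                "-rw-r--r--   1 hadoop supergroup         80 2021-07-21 00:11 /access/output/part-r-00001",
                "-rw-r--r--   1 hadoop supergroup         79 2021-07-21 00:11 /access/output/part-r-00002");
        for (Entry e : partFiles(listing)) {
            System.out.println(e.path + "\t" + e.size + " bytes");
        }
    }
}
```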

 


 

posted @ 2021-07-20 16:22  酱汁怪兽