MapReduce Example: Finding the Missing Poker Cards

Problem:

A deck of poker cards, stored one card per line, is missing some of its face cards (J, Q, K); find which suits the missing cards belong to.

Solution:

The solution has two phases. The Map phase drops every card whose value is <= 10 and classifies only the cards greater than 10, emitting each one keyed by its suit. The Reduce phase counts the key-value pairs received from the Map phase for each suit and outputs the suits that have fewer than 3 such cards, i.e. the suits missing at least one of J, Q, K.
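As an illustration of the record format the Map code below splits on, each input line is assumed to look like Suit-Number; the suit names and the particular missing card in this fragment are purely hypothetical:

Spade-9
Spade-10
Spade-11
Spade-13
Heart-11
Heart-12
Heart-13

In this fragment Spade-12 (the queen) is absent, so after the cards <= 10 are filtered out the Spade key reaches the Reducer with only 2 values and Spade is reported as incomplete.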

 

1. Code

1) Map code

String line = value.toString();
String[] strs = line.split("-");
if (strs.length == 2) {
    int number = Integer.valueOf(strs[1]);
    if (number > 10) {
        context.write(new Text(strs[0]), value);
    }
}
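For completeness, here is a minimal sketch of the Mapper class these lines would sit in, assuming the pakerMapper class name referenced in the Runner code below and the standard org.apache.hadoop.mapreduce API:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch only: the class name follows the pakerMapper reference in the Runner code below.
public class pakerMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Each input line is expected to look like "Suit-Number".
        String line = value.toString();
        String[] strs = line.split("-");
        if (strs.length == 2) {
            int number = Integer.valueOf(strs[1]);
            // Only cards above 10 (J, Q, K) are of interest.
            if (number > 10) {
                // Emit the suit as the key and the whole record as the value.
                context.write(new Text(strs[0]), value);
            }
        }
    }
}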

 

 

2) Reduce code

Iterator<Text> iter = values.iterator();
int count = 0;
while (iter.hasNext()) {
    iter.next();
    count++;
}
if (count < 3) {
    context.write(key, NullWritable.get());
}
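Similarly, a minimal sketch of the Reducer class around these lines, assuming the pakerRedue class name used by the Runner:

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch only: the class name follows the pakerRedue reference in the Runner code below.
public class pakerRedue extends Reducer<Text, Text, Text, NullWritable> {

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Count how many cards above 10 arrived for this suit.
        Iterator<Text> iter = values.iterator();
        int count = 0;
        while (iter.hasNext()) {
            iter.next();
            count++;
        }
        // A complete suit contributes exactly 3 cards above 10 (J, Q, K);
        // fewer means at least one of them is missing.
        if (count < 3) {
            context.write(key, NullWritable.get());
        }
    }
}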

 

 

3) Runner code

Configuration conf = new Configuration();
Job job = Job.getInstance(conf);
job.setJobName("poker mr");
job.setJarByClass(pokerRunner.class);

job.setMapperClass(pakerMapper.class);
job.setReducerClass(pakerRedue.class);

job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);

job.setOutputKeyClass(Text.class);
job.setOutputValueClass(NullWritable.class);

FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

job.waitForCompletion(true);
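Putting these lines into a runnable driver class gives roughly the following sketch; the main() wrapper and exit handling are assumptions, while the class names follow the setJarByClass, setMapperClass, and setReducerClass calls above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Sketch only: wraps the job configuration shown above in a main() entry point.
public class pokerRunner {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJobName("poker mr");
        job.setJarByClass(pokerRunner.class);

        job.setMapperClass(pakerMapper.class);
        job.setReducerClass(pakerRedue.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);

        // args[0] = input path, args[1] = output path
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}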

 

2. Run results

File System Counters

      FILE: Number of bytes read=87

      FILE: Number of bytes written=211167

      FILE: Number of read operations=0

      FILE: Number of large read operations=0

      FILE: Number of write operations=0

      HDFS: Number of bytes read=366

      HDFS: Number of bytes written=6

      HDFS: Number of read operations=6

      HDFS: Number of large read operations=0

      HDFS: Number of write operations=2

   Job Counters

      Launched map tasks=1

      Launched reduce tasks=1

      Data-local map tasks=1

      Total time spent by all maps in occupied slots (ms)=109577

      Total time spent by all reduces in occupied slots (ms)=42668

      Total time spent by all map tasks (ms)=109577

      Total time spent by all reduce tasks (ms)=42668

      Total vcore-seconds taken by all map tasks=109577

      Total vcore-seconds taken by all reduce tasks=42668

      Total megabyte-seconds taken by all map tasks=112206848

      Total megabyte-seconds taken by all reduce tasks=43692032

   Map-Reduce Framework

      Map input records=49

      Map output records=9

      Map output bytes=63

      Map output materialized bytes=87

      Input split bytes=110

      Combine input records=0

      Combine output records=0

      Reduce input groups=4

      Reduce shuffle bytes=87

      Reduce input records=9

      Reduce output records=3

      Spilled Records=18

      Shuffled Maps =1

      Failed Shuffles=0

      Merged Map outputs=1

      GC time elapsed (ms)=992

      CPU time spent (ms)=3150

      Physical memory (bytes) snapshot=210063360

      Virtual memory (bytes) snapshot=652480512

      Total committed heap usage (bytes)=129871872

   Shuffle Errors

      BAD_ID=0

      CONNECTION=0

      IO_ERROR=0

      WRONG_LENGTH=0

      WRONG_MAP=0

      WRONG_REDUCE=0

   File Input Format Counters

      Bytes Read=256

   File Output Format Counters

      Bytes Written=6

3. How to run

Build the project in Eclipse and export it as a jar, then upload the jar to the Linux machine and run it on the cluster.

Run command: bin/hadoop jar **.jar <fully qualified class name> <arguments>

For example: bin/hadoop jar **.jar com.test.mr /
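Since the Runner reads the input path from args[0] and the output path from args[1], a complete invocation would look roughly like the following, where the jar name, class name, and paths are only placeholders:

bin/hadoop jar poker.jar com.test.mr.pokerRunner /poker/input /poker/output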

 
