Flink (Getting Started): WordCount Batch Processing

As a refresher, let's write a simple Flink batch program:

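  1. Create a Maven project and add the dependencies. Note that flink-clients_2.12 must be added explicitly: since Flink 1.11 it is no longer pulled in transitively, and without it the job fails with "No ExecutorFactory found to execute the application".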
        <dependencies>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-java</artifactId>
                <version>1.12.2</version>
            </dependency>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-streaming-java_2.12</artifactId>
                <version>1.12.2</version>
            </dependency>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-clients_2.12</artifactId>
                <version>1.12.2</version>
            </dependency>
        </dependencies>
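  2. Create the WordCount class
    1. Create the execution environment
    2. Read the data from a file (a bounded stream, i.e. batch data)
    3. Process the data and print the result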
      package com.jy.bjz.wc;
      
      import org.apache.flink.api.common.functions.FlatMapFunction;
      import org.apache.flink.api.java.DataSet;
      import org.apache.flink.api.java.ExecutionEnvironment;
      import org.apache.flink.api.java.tuple.Tuple2;
      import org.apache.flink.util.Collector;
      
      /**
       * Batch WordCount using the DataSet API.
       *
       * @author baojiazhong
       * @since 2021/9/8 13:51
       */
      public class WordCount {
          public static void main(String[] args) throws Exception {
              // Create the batch execution environment
              ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
              // Read the input file line by line
              String inputPath = "C:\\Users\\15868\\Desktop\\TL\\flink\\learn\\src\\main\\resources\\hello.txt";
              DataSet<String> inputData = env.readTextFile(inputPath);
              // Split each line into (word, 1) pairs with a custom FlatMapFunction,
              // then group and aggregate
              DataSet<Tuple2<String, Integer>> resultDataSet = inputData.flatMap(new MyFlatMapper())
                      .groupBy(0)  // group by tuple field 0 (the word)
                      .sum(1);     // sum tuple field 1 (the count)
              // Print the result; in the DataSet API print() triggers execution,
              // so no separate env.execute() call is needed
              resultDataSet.print();
          }
      
          // Custom FlatMapFunction: emits one (word, 1) tuple per word in the line
          public static class MyFlatMapper implements FlatMapFunction<String, Tuple2<String, Integer>> {
      
              @Override
              public void flatMap(String s, Collector<Tuple2<String, Integer>> collector) throws Exception {
                  String[] words = s.split(" ");
                  for (String word : words) {
                      collector.collect(new Tuple2<>(word, 1));
                  }
              }
          }
      }
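  3. Summary
    1. Many Flink operators expect you to pass your own implementation of a function interface (here, FlatMapFunction)
    2. The number in a Tuple class name is its number of fields: Tuple2 has two fields, Tuple25 has 25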
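
To make it easier to see what the flatMap, groupBy(0) and sum(1) steps in WordCount compute, here is a minimal plain-Java sketch of the same logic. It uses no Flink classes at all; the class name WordCountSketch, the method countWords and the two sample lines are made up for illustration and stand in for the contents of hello.txt.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; no Flink classes are used here.
class WordCountSketch {

    // Mirrors flatMap -> groupBy(0) -> sum(1): split each line on single spaces,
    // treat every word as a (word, 1) pair, and add up the 1s per word.
    static Map<String, Integer> countWords(List<String> lines) {
        // TreeMap keeps the words sorted so the printed order is stable
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                // merge() inserts 1 for a new word, otherwise adds 1 to the current count
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample input standing in for hello.txt
        List<String> lines = Arrays.asList("hello world", "hello flink");
        countWords(lines).forEach(
                (word, count) -> System.out.println("(" + word + "," + count + ")"));
    }
}
```

The sketch runs in a single JVM thread on an in-memory list, whereas the Flink job does the equivalent grouping and summing through its runtime, so the order of the lines printed by the Flink job may differ from the sorted order used here.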

 

posted @ 2021-09-08 14:29  墨梅青莲