Flink (Getting Started): WordCount Batch Processing
As a refresher, let's write a simple Flink batch program:
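- Create a Maven project and add the dependencies. Be sure to include flink-clients_2.12: since Flink 1.11 it is no longer pulled in transitively (flink-streaming-java dropped its dependency on flink-clients), so it has to be added by hand. Without it the job fails with "No ExecutorFactory found to execute the application".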
    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>1.12.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.12</artifactId>
            <version>1.12.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_2.12</artifactId>
            <version>1.12.2</version>
        </dependency>
    </dependencies>

- Create the WordCount class
- Create the execution environment
- Read the data from a file (a bounded stream, i.e. batch data)
- Process the data and print the result
    package com.jy.bjz.wc;

    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.util.Collector;

    /**
     * Batch WordCount example.
     *
     * @author baojiazhong
     * @since 2021/9/8 13:51
     */
    public class WordCount {

        public static void main(String[] args) throws Exception {
            // Create the batch execution environment
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // Read the input file line by line
            String inputPath = "C:\\Users\\15868\\Desktop\\TL\\flink\\learn\\src\\main\\resources\\hello.txt";
            DataSet<String> inputData = env.readTextFile(inputPath);

            // Split each line into (word, 1) pairs, group by the word (field 0),
            // then sum the counts (field 1)
            DataSet<Tuple2<String, Integer>> resultDataSet = inputData
                    .flatMap(new MyFlatMapper())
                    .groupBy(0)
                    .sum(1);

            // print() triggers execution of a DataSet job, so no env.execute() call is needed
            resultDataSet.print();
        }

        // Custom FlatMapFunction: emits one (word, 1) tuple per space-separated word
        public static class MyFlatMapper implements FlatMapFunction<String, Tuple2<String, Integer>> {
            @Override
            public void flatMap(String line, Collector<Tuple2<String, Integer>> collector) throws Exception {
                String[] words = line.split(" ");
                for (String word : words) {
                    collector.collect(new Tuple2<>(word, 1));
                }
            }
        }
    }
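To make it easier to see what the flatMap, groupBy(0), sum(1) chain computes, here is a minimal plain-Java sketch with no Flink dependency. The class name WordCountSketch and the method countWords are made up for this illustration; the sketch only reproduces the result on an in-memory list and says nothing about how Flink actually executes the job.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Flink: reproduces the result of
// flatMap -> groupBy(0) -> sum(1) on an in-memory list of lines.
final class WordCountSketch {

    static Map<String, Integer> countWords(List<String> lines) {
        // LinkedHashMap keeps words in first-seen order
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            // flatMap step: one line becomes many (word, 1) pairs
            for (String word : line.split(" ")) {
                // groupBy(0) + sum(1): add 1 to the running total for this word
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello flink", "hello world");
        System.out.println(countWords(lines)); // prints {hello=2, flink=1, world=1}
    }
}
```

In the Flink job the same three steps are separate operators, so the grouping and summing can be spread across parallel tasks instead of running in one loop.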
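- Summary
- Many Flink operators are customized by implementing an interface yourself (here, FlatMapFunction)
- The number in a tuple class name is its field count: Tuple2 holds two fields, Tuple25 holds 25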
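Both summary points can be tried out without Flink on the classpath. The sketch below uses stand-in types: Pair, FlatMapper, Splitter and run are hypothetical names invented for this example, and they only imitate the shape of Flink's Tuple2 (public fields f0 and f1) and FlatMapFunction. They are not the Flink classes themselves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-ins for illustration only; these are NOT Flink classes.
final class TupleSketch {

    // Two-field tuple, shaped like Flink's Tuple2 (fields named f0 and f1).
    static final class Pair<A, B> {
        public final A f0;
        public final B f1;

        Pair(A f0, B f1) {
            this.f0 = f0;
            this.f1 = f1;
        }
    }

    // Single-method interface, shaped like Flink's FlatMapFunction.
    interface FlatMapper<T, O> {
        void flatMap(T value, Consumer<O> out);
    }

    // A named class implementing the interface, as MyFlatMapper does in the job.
    static final class Splitter implements FlatMapper<String, Pair<String, Integer>> {
        @Override
        public void flatMap(String line, Consumer<Pair<String, Integer>> out) {
            for (String word : line.split(" ")) {
                out.accept(new Pair<>(word, 1));
            }
        }
    }

    // Applies the function to one line and collects everything it emits.
    static List<Pair<String, Integer>> run(FlatMapper<String, Pair<String, Integer>> fn, String line) {
        List<Pair<String, Integer>> result = new ArrayList<>();
        fn.flatMap(line, result::add);
        return result;
    }

    public static void main(String[] args) {
        for (Pair<String, Integer> p : run(new Splitter(), "hello flink")) {
            // f0 is the first field (the word), f1 the second (the count)
            System.out.println(p.f0 + " -> " + p.f1);
        }
    }
}
```

A two-field tuple is enough for WordCount because each record carries just a word and a count; a record with more fields would use the tuple class whose number matches.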
