I. Initialize the Java project
mvn archetype:generate -DarchetypeGroupId=org.apache.flink -DarchetypeArtifactId=flink-quickstart-java -DarchetypeVersion=1.9.1
II. The DataFlow programming model

III. WordCount example
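import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class StreamWordCount {
    public static void main(String[] args) throws Exception {
        // Create the execution environment for a Flink streaming program
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Source: use the StreamExecutionEnvironment to create a DataStream
        DataStream<String> lines = env.socketTextStream("localhost", 8888);

        // Transformations start: call transformation methods on the DataStream
        SingleOutputStreamOperator<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public void flatMap(String line, Collector<String> out) throws Exception {
                // Split the line into words
                String[] words = line.split(" ");
                for (String word : words) {
                    // Emit each word
                    out.collect(word);
                }
            }
        });

        // Pair each word with the count 1
        SingleOutputStreamOperator<Tuple2<String, Integer>> wordAndOne = words.map(new MapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public Tuple2<String, Integer> map(String word) throws Exception {
                return Tuple2.of(word, 1);
            }
        });

        // Group by the word (field 0) and sum the counts (field 1)
        SingleOutputStreamOperator<Tuple2<String, Integer>> summed = wordAndOne.keyBy(0).sum(1);
        // Transformations end

        // Call a sink (a sink is required)
        summed.print();

        // Start the job
        env.execute("StreamWordCount");
    }
}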
Start nc -lk 8888, then enter the following lines:
spring flink flink
spark hive flink
Output:
2> (spring,1)
4> (flink,1)
4> (flink,2)
1> (spark,1)
1> (hive,1)
4> (flink,3)
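The running counts in this output (flink appearing as 1, 2, then 3) come from the keyed sum emitting an updated total for every incoming record. The following is a minimal sketch of that behavior in plain Java, without Flink; the class and method names are made up for illustration, and the task-number prefixes are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RunningWordCount {

    // Splits each line on spaces and returns one "(word,count)" string per word,
    // where count is the running total seen so far for that word.
    static List<String> runningCounts(List<String> lines) {
        Map<String, Integer> totals = new HashMap<>();
        List<String> emitted = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                // merge() adds 1 to the stored total and returns the new value
                int updated = totals.merge(word, 1, Integer::sum);
                emitted.add("(" + word + "," + updated + ")");
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("spring flink flink", "spark hive flink");
        // Prints (spring,1) (flink,1) (flink,2) (spark,1) (hive,1) (flink,3), one per line
        runningCounts(lines).forEach(System.out::println);
    }
}
```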
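Key points: 1. The number in front of each record is the task number; identical words always go to the same task.
2. When running locally, the number of tasks depends on the number of CPU cores of the machine; see the source of StreamExecutionEnvironment.getExecutionEnvironment for details.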
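Both points can be illustrated without Flink. The sketch below is a simplification: subtaskFor is a hypothetical helper that maps a key to an index with a plain hash modulo, whereas Flink's real key assignment uses its own hashing scheme. It only demonstrates that a deterministic function of the key always sends equal words to the same index. It also assumes the local default parallelism is derived from Runtime.getRuntime().availableProcessors().

```java
import java.util.Arrays;
import java.util.List;

class KeyRoutingSketch {

    // Hypothetical helper (not Flink's algorithm): maps a key to an index in
    // [0, parallelism). Equal keys always produce the same index.
    static int subtaskFor(String key, int parallelism) {
        // floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(key.hashCode(), parallelism);
    }

    public static void main(String[] args) {
        // Assumption: the local default parallelism follows the CPU core count
        int parallelism = Runtime.getRuntime().availableProcessors();
        System.out.println("parallelism = " + parallelism);

        List<String> words = Arrays.asList("spring", "flink", "flink", "spark", "hive", "flink");
        for (String word : words) {
            // Printed 1-based so it resembles the prefixes in the output above
            System.out.println((subtaskFor(word, parallelism) + 1) + "> " + word);
        }
    }
}
```

The actual indices depend on the machine and will not match the Flink output, but every occurrence of flink lands on the same index.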
IV. Optimizing WordCount
The flatMap and map transformations in the code above can be merged into a single flatMap, as follows:
// Optimized: split the line and emit (word, 1) in a single flatMap
SingleOutputStreamOperator<Tuple2<String, Integer>> wordAndOne = lines.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
    @Override
    public void flatMap(String line, Collector<Tuple2<String, Integer>> out) throws Exception {
        String[] words = line.split(" ");
        for (String word : words) {
            Tuple2<String, Integer> tp = Tuple2.of(word, 1);
            out.collect(tp);
        }
    }
});
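That merging the two steps does not change the result can be checked on ordinary collections. The sketch below uses java.util.stream rather than Flink, with Map.Entry standing in for Tuple2; all names in it are made up for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class FusedFlatMapSketch {

    // Stand-in for Tuple2.of(word, 1)
    static Map.Entry<String, Integer> pair(String word) {
        return new SimpleEntry<>(word, 1);
    }

    // Two steps: flatMap lines into words, then map each word to (word, 1)
    static List<Map.Entry<String, Integer>> twoSteps(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .map(FusedFlatMapSketch::pair)
                .collect(Collectors.toList());
    }

    // One step: a single flatMap that emits (word, 1) directly
    static List<Map.Entry<String, Integer>> fused(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")).map(FusedFlatMapSketch::pair))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("spring flink flink", "spark hive flink");
        // Both variants produce the same pairs in the same order
        System.out.println(twoSteps(lines).equals(fused(lines)));
    }
}
```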