1. Tumbling Windows with EventTime

        // Fragment from inside main(); the Flink imports and the enclosing class are omitted.
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        // Each input line has the form timestamp,word,count, for example:
        // timestamp,flink,2
        // timestamp,spark,3
        DataStream<String> lines = env.socketTextStream("192.168.87.130", 8888)
                // Only extracts the timestamp field; the records themselves are not changed.
                .assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor<String>(Time.seconds(0)) {
                    private static final long serialVersionUID = -4441231666252017557L;
                    // Pull the timestamp field out of the record and convert it to a long.
                    @Override
                    public long extractTimestamp(String element) {
                        String[] time = element.split(",");
                        return Long.parseLong(time[0]);
                    }
                });
        SingleOutputStreamOperator<Tuple2<String, Integer>> word = lines.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
            private static final long serialVersionUID = 7179469626039725354L;

            // Map each line to a (word, count) pair.
            @Override
            public void flatMap(String value,
                    Collector<Tuple2<String, Integer>> out) throws Exception {
                String[] splitStr = value.split(",");
                out.collect(Tuple2.of(splitStr[1], Integer.parseInt(splitStr[2])));
            }
        });
        // Key the stream first, then assign windows.
        KeyedStream<Tuple2<String, Integer>, Tuple> keyed = word.keyBy(0);
        WindowedStream<Tuple2<String, Integer>, Tuple, TimeWindow> windowStream = keyed.window(TumblingEventTimeWindows.of(Time.seconds(5)));
        SingleOutputStreamOperator<Tuple2<String, Integer>> summed = windowStream.sum(1);
        summed.print();
        // Nothing runs until the job is submitted.
        env.execute();

Input:

1000,a,1
2000,a,1
4999,a,1
6666,a,1
7777,a,1
9998,a,1
10001,a,1

Output:

3> (a,3)
3> (a,3)
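
To see why these seven records produce exactly two results, the firing rule can be replayed by hand. The following is a minimal plain-Java sketch, not Flink code: the class and method names are made up for illustration, it handles a single key with a count of 1 per record, and it assumes non-negative timestamps, a zero window offset, and a watermark that is refreshed after every record (Flink actually emits watermarks periodically).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Plain-Java sketch of tumbling event-time window firing; hypothetical names, not Flink code.
class TumblingWindowSketch {

    // Start of the tumbling window that contains `timestamp`
    // (assumes non-negative timestamps and a zero offset).
    static long windowStart(long timestamp, long size) {
        return timestamp - (timestamp % size);
    }

    // Feeds the timestamps of one key (each record counts as 1) and returns
    // the per-window sums in the order in which the windows fire.
    static List<Integer> simulate(long[] timestamps, long size, long maxOutOfOrderness) {
        TreeMap<Long, Integer> open = new TreeMap<>(); // window start -> running sum
        List<Integer> fired = new ArrayList<>();
        long maxTimestamp = Long.MIN_VALUE;
        long watermark = Long.MIN_VALUE;
        for (long ts : timestamps) {
            long start = windowStart(ts, size);
            // The window covers [start, start + size); its last timestamp is start + size - 1.
            if (start + size - 1 <= watermark) {
                continue; // window already fired: the record is late and is dropped
            }
            open.merge(start, 1, Integer::sum);
            maxTimestamp = Math.max(maxTimestamp, ts);
            watermark = maxTimestamp - maxOutOfOrderness;
            // Fire every open window whose last timestamp is covered by the watermark.
            while (!open.isEmpty() && open.firstKey() + size - 1 <= watermark) {
                fired.add(open.pollFirstEntry().getValue());
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        long[] input = {1000, 2000, 4999, 6666, 7777, 9998, 10001};
        // Window [0, 5000) fires on 4999, window [5000, 10000) fires on 10001.
        System.out.println(simulate(input, 5000, 0)); // prints [3, 3]
    }
}
```

The record 10001 itself lands in window [10000, 15000), which stays open, so it does not show up in the output.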

 

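If the source parallelism is greater than 1, a window fires only after every parallel source instance has received data whose event time passes the end of that window (here, the 5 s boundary). The reason is that a downstream operator takes the minimum of the watermarks arriving from all of its inputs.

For example, with a parallel source such as KafkaSource reading a Kafka topic that was created with multiple partitions, every source partition must satisfy the trigger condition before the window as a whole fires.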
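
The effect of parallel inputs can be sketched as follows. This is plain Java with hypothetical names, not a Flink API; it only assumes that an operator's watermark is the minimum over its input channels, and that a channel which has not produced any watermark yet reports Long.MIN_VALUE.

```java
// Plain-Java sketch of watermark merging across parallel inputs; hypothetical names.
class MinWatermarkSketch {

    // The operator's event-time clock is the smallest watermark of all input channels.
    static long operatorWatermark(long[] channelWatermarks) {
        long min = Long.MAX_VALUE;
        for (long wm : channelWatermarks) {
            min = Math.min(min, wm);
        }
        return min;
    }

    // A window [start, end) fires once the operator watermark reaches end - 1.
    static boolean windowFires(long windowEnd, long[] channelWatermarks) {
        return operatorWatermark(channelWatermarks) >= windowEnd - 1;
    }

    public static void main(String[] args) {
        // One partition is already past 5000, the other has not produced data yet.
        System.out.println(windowFires(5000, new long[]{6000, Long.MIN_VALUE})); // false
        // Both partitions are past the window end, so the window can fire.
        System.out.println(windowFires(5000, new long[]{6000, 5200}));           // true
    }
}
```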

2. Delayed Triggering with Watermarks

With the same code as in section 1, input:

1000,1,1
1001,2,1
1005,1,1
4999,1,1
6666,1,1
10005,1,1
7777,1,1

Output:

2> (1,3)
1> (2,1)
2> (1,1)

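The last record, 7777,1,1, is dropped: by the time it arrives, 10005 has already pushed the watermark past the end of window [5000, 10000), so that window has fired and the record counts as late.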
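
The late-record check can be written down in a few lines. This is a plain-Java sketch with hypothetical names, assuming the default allowed lateness of 0 and non-negative timestamps; the watermark is taken to be the largest timestamp seen so far minus the configured maximum out-of-orderness.

```java
// Plain-Java sketch of the late-record check; hypothetical names, not Flink code.
class LateRecordSketch {

    // Watermark produced by a bounded-out-of-orderness extractor.
    static long watermark(long maxTimestampSeen, long maxOutOfOrderness) {
        return maxTimestampSeen - maxOutOfOrderness;
    }

    // A record is late when the last timestamp of its tumbling window (end - 1)
    // is already covered by the current watermark.
    static boolean isLate(long timestamp, long windowSize, long currentWatermark) {
        long windowEnd = timestamp - (timestamp % windowSize) + windowSize;
        return windowEnd - 1 <= currentWatermark;
    }

    public static void main(String[] args) {
        long size = 5000;
        // Bound 0: after 10005 the watermark is 10005, so 7777 in [5000, 10000) is late.
        System.out.println(isLate(7777, size, watermark(10005, 0)));    // true
        // Bound 2000: the watermark is only 8005, so the same record is still accepted.
        System.out.println(isLate(7777, size, watermark(10005, 2000))); // false
    }
}
```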

Setting a delay (the maximum out-of-orderness) on the watermark solves this problem:

        DataStream<String> lines = env.socketTextStream("192.168.87.130", 8888)
                // Only extracts the timestamp field; the records themselves are not changed.
                // The watermark now trails the largest timestamp seen by 2 seconds.
                .assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor<String>(Time.seconds(2)) {
                    private static final long serialVersionUID = -4441231666252017557L;
                    // Pull the timestamp field out of the record and convert it to a long.
                    @Override
                    public long extractTimestamp(String element) {
                        String[] time = element.split(",");
                        return Long.parseLong(time[0]);
                    }
                });

Input:

1000,1,a
1000,a,1
2000,a,1
5000,a,1
6999,a,1

Output:

3> (a,2)
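
With a 2 s delay, window [0, 5000) no longer fires at 4999. It fires only once a record with timestamp 4999 + 2000 = 6999 or later has been seen. The arithmetic is sketched below in plain Java; the names are hypothetical and only the timestamps of the input above are used.

```java
// Plain-Java sketch of when a window fires under a watermark delay; hypothetical names.
class TriggerPointSketch {

    // Smallest "largest timestamp seen" at which window [start, start + size) fires:
    // the watermark (max timestamp - delay) has to reach start + size - 1.
    static long firingTimestamp(long windowStart, long size, long maxOutOfOrderness) {
        return windowStart + size - 1 + maxOutOfOrderness;
    }

    // Index of the first record whose arrival lets the window fire, or -1 if none does.
    static int firstTriggerIndex(long[] timestamps, long windowStart, long size, long delay) {
        long needed = firingTimestamp(windowStart, size, delay);
        for (int i = 0; i < timestamps.length; i++) {
            if (timestamps[i] >= needed) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long[] timestamps = {1000, 1000, 2000, 5000, 6999};
        System.out.println(firingTimestamp(0, 5000, 2000));               // 6999
        System.out.println(firstTriggerIndex(timestamps, 0, 5000, 2000)); // 4
    }
}
```

The record 5000 belongs to the next window, [5000, 10000), so it is not part of the result that fires at 6999.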

3. EventTime with Sliding Windows

The code is the same as above; only the window assigner changes:

WindowedStream<Tuple2<String, Integer>, Tuple, TimeWindow> windowStream = keyed.window(SlidingEventTimeWindows.of(Time.seconds(6), Time.seconds(2)));

With no delay configured, input:

1000,a,1
1999,a,1
1000,a,1
1999,a,1
2222,b,1
2999,a,1
4000,a,1

Output:

3> (a,2)
3> (a,3)
1> (b,1)
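
With a 6 s window sliding every 2 s, each record belongs to three overlapping windows, and window starts are aligned to multiples of the slide, so the earliest windows begin before timestamp 0. The sketch below shows the assignment in plain Java. The names are hypothetical, a zero offset is assumed, and the loop follows the general idea of a sliding assigner rather than reproducing Flink's source.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of sliding window assignment; hypothetical names, not Flink code.
class SlidingWindowSketch {

    // Returns every window [start, end) that contains `timestamp`, latest window first.
    static List<long[]> assignWindows(long timestamp, long size, long slide) {
        List<long[]> windows = new ArrayList<>();
        // Latest window start that is not after the timestamp, aligned to the slide.
        long lastStart = timestamp - Math.floorMod(timestamp, slide);
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            windows.add(new long[]{start, start + size});
        }
        return windows;
    }

    public static void main(String[] args) {
        // Prints [0, 6000), [-2000, 4000) and [-4000, 2000), one per line.
        for (long[] w : assignWindows(1000, 6000, 2000)) {
            System.out.println("[" + w[0] + ", " + w[1] + ")");
        }
    }
}
```

The earliest window containing timestamp 1000 is [-4000, 2000). With no delay it can fire as soon as the watermark reaches 1999, which is why the first result, (a,2), appears right after the records 1000 and 1999.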