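Flink windows (processing time, count windows, session windows)
* Time windows
* SlidingEventTimeWindows: sliding event-time window
* SlidingProcessingTimeWindows: sliding processing-time window
* TumblingEventTimeWindows: tumbling event-time window
* TumblingProcessingTimeWindows: tumbling processing-time window
*
* Sliding: windows overlap (when the slide is smaller than the window size), so one record can fall into several windows
* Tumbling: windows do not overlap, so each record falls into exactly one window
*
* Event time: the record itself carries a time field; to use event time you must specify the timestamp field and a watermark
* Processing time: the time at which the record is processed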
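To make the tumbling/sliding difference concrete, here is a minimal plain-Scala sketch of how a timestamp could be mapped to window start times. It does not use Flink; the object and function names (`WindowAssignSketch`, `tumblingStart`, `slidingStarts`) are made up for illustration, and it assumes non-negative millisecond timestamps, no window offset, and half-open windows `[start, start + size)`.

```scala
object WindowAssignSketch {
  // Start of the single tumbling window that contains `ts` (milliseconds).
  def tumblingStart(ts: Long, size: Long): Long = ts - (ts % size)

  // Start times of every sliding window [start, start + size) that contains `ts`.
  def slidingStarts(ts: Long, size: Long, slide: Long): List[Long] = {
    val lastStart = ts - (ts % slide)
    // Walk backwards one slide at a time while the window still covers `ts`.
    Iterator.iterate(lastStart)(_ - slide).takeWhile(_ > ts - size).toList
  }

  def main(args: Array[String]): Unit = {
    println(tumblingStart(12000L, 5000L))         // 10000
    println(slidingStarts(12000L, 10000L, 5000L)) // List(10000, 5000)
  }
}
```

With a 10 s window sliding every 5 s, the record at 12 s lands in two windows; with a 5 s tumbling window it lands in exactly one.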
1. Tumbling processing-time window
package com.wt.flink.window

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow

object Demo1TimeWindow {
  def main(args: Array[String]): Unit = {
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)

    // Read vehicle-passing records from the traffic checkpoints
    val dataDS: DataStream[String] = env.socketTextStream("master", 8888)

    // Parse each line and extract the road id and the timestamp
    val kcDS: DataStream[(String, Long)] = dataDS.map(line => {
      val split: Array[String] = line.split(",")
      // Road id
      val roadId: String = split(1)
      // Timestamp (not used by the processing-time window below)
      val ts: Long = split(2).toLong
      (roadId, ts)
    })

    val kvDS: DataStream[(String, Int)] = kcDS.map(kv => (kv._1, 1))

    val keyByDS: KeyedStream[(String, Int), String] = kvDS.keyBy(_._1)

    /**
     * Time windows
     * SlidingEventTimeWindows: sliding event-time window
     * SlidingProcessingTimeWindows: sliding processing-time window
     * TumblingEventTimeWindows: tumbling event-time window
     * TumblingProcessingTimeWindows: tumbling processing-time window
     *
     * Sliding: windows overlap
     * Tumbling: windows do not overlap
     *
     * Event time: the record itself carries a time field; to use event time
     * you must specify the timestamp field and a watermark
     * Processing time: the time at which the record is processed
     *
     */
    val windowDS: WindowedStream[(String, Int), String, TimeWindow] = keyByDS
      .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))

    // Count the vehicles per road in each 5-second window
    val countDS: DataStream[(String, Int)] = windowDS.sum(1)

    countDS.print()

    env.execute()
  }
}
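The `map` step in Demo1TimeWindow assumes that every line has at least three comma-separated fields and that the third one is numeric; a malformed line would throw and fail the job. Below is a minimal sketch of a more defensive parser in plain Scala. The name `parseRecord` and the sample line layout (a vehicle id in field 0) are assumptions for illustration; the original only tells us that field 1 is the road id and field 2 is the timestamp.

```scala
import scala.util.Try

object ParseSketch {
  // Returns Some((roadId, ts)) for a well-formed line, None otherwise.
  def parseRecord(line: String): Option[(String, Long)] = {
    val fields = line.split(",")
    if (fields.length < 3) None
    else Try(fields(2).trim.toLong).toOption.map(ts => (fields(1).trim, ts))
  }

  def main(args: Array[String]): Unit = {
    println(parseRecord("car1,road7,1650000000000")) // Some((road7,1650000000000))
    println(parseRecord("broken line"))              // None
  }
}
```

In the job itself this could probably replace the `map` with something like `dataDS.flatMap(line => ParseSketch.parseRecord(line).toList)`, so that bad lines are dropped instead of crashing the stream.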
2. Count window (computes a result each time a given number of records has arrived)
package com.wt.flink.window

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow

object Demo2CountWindow {
  def main(args: Array[String]): Unit = {
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment

    val linesDS: DataStream[String] = env.socketTextStream("master", 8888)

    // Split each line into words and pair every word with a count of 1
    val wordsDS: DataStream[String] = linesDS.flatMap(_.split(','))

    val kvDS: DataStream[(String, Int)] = wordsDS.map((_, 1))

    val keyByDS: KeyedStream[(String, Int), String] = kvDS.keyBy(_._1)

    /**
     * Count windows (counted per key)
     * countWindow(10): tumbling count window, fires once for every 10 records
     * countWindow(10, 5): sliding count window, fires every 5 records over the last 10 records
     *
     */
    val countWindowDS: WindowedStream[(String, Int), String, GlobalWindow] = keyByDS
      .countWindow(10, 5)

    val countDS: DataStream[(String, Int)] = countWindowDS.sum(1)

    countDS.print()

    env.execute()
  }
}
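What `countWindow(10, 5)` prints can be surprising at first, so here is a small plain-Scala simulation for the records of a single key. It is a sketch based on my understanding that the sliding count window fires after every `slide` records and keeps at most the latest `size` records; the name `slidingCountSums` is made up and nothing here calls Flink.

```scala
object CountWindowSketch {
  // Simulates countWindow(size, slide) for the records of ONE key:
  // after every `slide` records, emit the sum of the most recent `size` records.
  def slidingCountSums(values: Seq[Int], size: Int, slide: Int): List[Int] =
    (slide to values.length by slide).toList.map { n =>
      values.slice(math.max(0, n - size), n).sum
    }

  def main(args: Array[String]): Unit = {
    val ones = Seq.fill(20)(1) // twenty (word, 1) records for the same word
    println(slidingCountSums(ones, 10, 5)) // List(5, 10, 10, 10)
  }
}
```

Under these assumptions the first result for a word shows up after only 5 records (with a sum of 5), and a word that never reaches 5 records produces no output at all.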
3. Session window (computes a result once no data has arrived for a given length of time)
package com.wt.flink.window

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.{EventTimeSessionWindows, ProcessingTimeSessionWindows}
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow

object Demo3SessionWindow {
  def main(args: Array[String]): Unit = {
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)

    // Read vehicle-passing records from the traffic checkpoints
    val dataDS: DataStream[String] = env.socketTextStream("master", 8888)

    // Parse each line and extract the road id and the timestamp
    val kcDS: DataStream[(String, Long)] = dataDS.map(line => {
      val split: Array[String] = line.split(",")
      // Road id
      val roadId: String = split(1)
      // Timestamp
      val ts: Long = split(2).toLong
      (roadId, ts)
    })

    // Set the event-time field (the timestamp must be in milliseconds and ascending)
    val assDS: DataStream[(String, Long)] = kcDS.assignAscendingTimestamps(_._2)

    val kvDS: DataStream[(String, Int)] = assDS.map(kv => (kv._1, 1))

    val keyByDS: KeyedStream[(String, Int), String] = kvDS.keyBy(_._1)

    /**
     * Session windows: the result for a key is computed once that key has received no data for a while
     * ProcessingTimeSessionWindows: processing-time session window
     * EventTimeSessionWindows: event-time session window, requires a timestamp field
     *
     */
    val windowDS: WindowedStream[(String, Int), String, TimeWindow] = keyByDS
      .window(EventTimeSessionWindows.withGap(Time.seconds(5)))

    val countDS: DataStream[(String, Int)] = windowDS.sum(1)

    countDS.print()

    env.execute()
  }
}
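The session logic is easier to follow on a tiny example, so here is a plain-Scala sketch that groups the event timestamps of one key into sessions with a 5-second gap. The name `sessions` is made up, nothing here calls Flink, and treating two events exactly `gap` apart as the same session is an assumption of this sketch.

```scala
object SessionSketch {
  // Groups the event timestamps of ONE key into sessions: a new session starts
  // whenever the distance to the previous event exceeds `gap` (milliseconds).
  def sessions(timestamps: Seq[Long], gap: Long): List[List[Long]] =
    timestamps.sorted.foldLeft(List.empty[List[Long]]) {
      case (current :: done, ts) if ts - current.head <= gap =>
        (ts :: current) :: done // still inside the open session
      case (acc, ts) =>
        List(ts) :: acc         // gap exceeded (or first event): open a new session
    }.map(_.reverse).reverse

  def main(args: Array[String]): Unit = {
    val ts = Seq(1000L, 2000L, 4000L, 12000L, 13000L)
    println(sessions(ts, 5000L).map(_.size)) // List(3, 2)
  }
}
```

For one road id, these five events would correspond to two results from `sum(1)`, a count of 3 and a count of 2. In the event-time job a session is only emitted once the watermark has passed its end, so the last session stays open until later data arrives.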