Quick note -------- How Flink computes the start position of the time window used with waterMark (source-code analysis)
Source location:
.timeWindow(Time.milliseconds(1000L))
timeWindow()
def timeWindow(size: Time): WindowedStream[T, K, TimeWindow] = {
  new WindowedStream(javaStream.timeWindow(size))
}
javaStream.timeWindow(size)
public WindowedStream<T, KEY, TimeWindow> timeWindow(Time size) {
    if (environment.getStreamTimeCharacteristic() == TimeCharacteristic.ProcessingTime) {
        return window(TumblingProcessingTimeWindows.of(size));
    } else {
        return window(TumblingEventTimeWindows.of(size));
    }
}
window(TumblingEventTimeWindows.of(size))
public Collection<TimeWindow> assignWindows(Object element, long timestamp, WindowAssignerContext context) {
    if (timestamp > Long.MIN_VALUE) {
        if (staggerOffset == null) {
            staggerOffset = windowStagger.getStaggerOffset(context.getCurrentProcessingTime(), size);
        }
        // Long.MIN_VALUE is currently assigned when no timestamp is present
        long start = TimeWindow.getWindowStartWithOffset(timestamp, (globalOffset + staggerOffset) % size, size);
        return Collections.singletonList(new TimeWindow(start, start + size));
    } else {
        throw new RuntimeException("Record has Long.MIN_VALUE timestamp (= no timestamp marker). " +
                "Is the time characteristic set to 'ProcessingTime', or did you forget to call " +
                "'DataStream.assignTimestampsAndWatermarks(...)'?");
    }
}
TimeWindow.getWindowStartWithOffset(timestamp, (globalOffset + staggerOffset) % size, size)
public static long getWindowStartWithOffset(long timestamp, long offset, long windowSize) {
    return timestamp - (timestamp - offset + windowSize) % windowSize;
}
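Following the calls down to this point, we reach the formula that computes the window start, which is the "WaterMark starting position" this note is about:
timestamp - (timestamp - offset + windowSize) % windowSize;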
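Here timestamp is the event-time timestamp carried by each data element, and windowSize is the window length that we configure. offset defaults to 0 and is mainly used to adjust for time zones; this analysis assumes it is 0.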
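To make the role of offset concrete, here is a minimal sketch that copies the getWindowStartWithOffset formula into a standalone method. The class name OffsetExample, the one-day window, and the -8 hour offset (a common choice for aligning daily windows to midnight in UTC+8) are illustrative assumptions, not part of the traced source.

```java
// Illustrative sketch; OffsetExample is a hypothetical class name.
class OffsetExample {
    // Same formula as TimeWindow.getWindowStartWithOffset quoted above.
    static long getWindowStartWithOffset(long timestamp, long offset, long windowSize) {
        return timestamp - (timestamp - offset + windowSize) % windowSize;
    }

    public static void main(String[] args) {
        long day = 24L * 60 * 60 * 1000;        // one-day window, in ms
        long eightHours = 8L * 60 * 60 * 1000;  // assumed time-zone shift, in ms
        long ts = 1547718199000L;               // sample event timestamp, in ms

        long utcStart = getWindowStartWithOffset(ts, 0L, day);
        long shiftedStart = getWindowStartWithOffset(ts, -eightHours, day);

        System.out.println("offset 0   -> window starts at " + utcStart);
        System.out.println("offset -8h -> window starts at " + shiftedStart);
        // For this timestamp the shifted window begins 8 hours before the UTC-aligned one.
        System.out.println("difference = " + (utcStart - shiftedStart) + " ms");
    }
}
```

With offset 0 the daily window is aligned to UTC midnight; with the assumed -8 hour offset the same element lands in a window aligned to midnight in UTC+8.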
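With offset = 0, the formula simplifies to: timestamp - (timestamp + windowSize) % windowSize
Since windowSize % windowSize is always 0, adding windowSize does not change the remainder for a non-negative timestamp, so the formula simplifies further to: timestamp - (timestamp % windowSize)
In other words, the timestamp minus its remainder modulo the window size => the largest integer multiple of windowSize that does not exceed timestamp.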
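The simplification can be checked numerically. The following is a minimal sketch, assuming offset = 0; SimplificationCheck, fullForm, shortForm and the sample values are hypothetical names and data chosen for illustration. It also hints at why the source adds windowSize before taking the remainder: Java's % keeps the sign of the dividend, so for a slightly negative timestamp the short form no longer rounds down.

```java
// Illustrative sketch; all names here are hypothetical.
class SimplificationCheck {
    // Formula as it appears in the source (offset included).
    static long fullForm(long timestamp, long offset, long windowSize) {
        return timestamp - (timestamp - offset + windowSize) % windowSize;
    }

    // Simplified form for offset = 0.
    static long shortForm(long timestamp, long windowSize) {
        return timestamp - (timestamp % windowSize);
    }

    public static void main(String[] args) {
        long windowSize = 15000L;
        // Non-negative timestamps: both forms agree.
        long[] samples = {0L, 1L, 14999L, 15000L, 1547718199000L};
        for (long ts : samples) {
            System.out.println(ts + ": full=" + fullForm(ts, 0L, windowSize)
                    + " short=" + shortForm(ts, windowSize));
        }
        // A negative timestamp: the two forms disagree.
        long negative = -1L;
        System.out.println(negative + ": full=" + fullForm(negative, 0L, windowSize)
                + " short=" + shortForm(negative, windowSize));
    }
}
```

For timestamp = -1 the full form yields -15000 (the window that actually contains -1), while the short form yields 0, so the simplified formula should only be read as valid for non-negative timestamps.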
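For example: suppose the element timestamp is 1547718199000 and the window size is 15000, both in milliseconds.
Start position = 1547718199000 - (1547718199000 - 0 + 15000) % 15000
= 1547718199000 - 4000
= 1547718195000
So the first time window is [1547718195000, 1547718210000), closed on the left and open on the right; later windows follow the same pattern.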
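The arithmetic in this example can be reproduced with a few lines of Java. This is a sketch under the same assumptions as the example (offset = 0, tumbling windows of 15000 ms); WindowExample and windowFor are hypothetical helper names, not Flink APIs.

```java
// Illustrative sketch; WindowExample and windowFor are hypothetical names.
class WindowExample {
    // Same formula as TimeWindow.getWindowStartWithOffset.
    static long getWindowStartWithOffset(long timestamp, long offset, long windowSize) {
        return timestamp - (timestamp - offset + windowSize) % windowSize;
    }

    // Returns {start, end} of the tumbling window [start, end) containing the timestamp.
    static long[] windowFor(long timestamp, long windowSize) {
        long start = getWindowStartWithOffset(timestamp, 0L, windowSize);
        return new long[] {start, start + windowSize};
    }

    public static void main(String[] args) {
        long size = 15000L;
        // The example timestamp, the last millisecond of its window, and the first of the next.
        long[] timestamps = {1547718199000L, 1547718209999L, 1547718210000L};
        for (long ts : timestamps) {
            long[] w = windowFor(ts, size);
            System.out.println(ts + " -> [" + w[0] + ", " + w[1] + ")");
        }
    }
}
```

The last timestamp equals the end of the first window; because the interval is open on the right, it falls into the next window, [1547718210000, 1547718225000).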
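Author: 于二黑
The copyright of this article is shared by the author and 博客园 (cnblogs). Reposting is welcome, but unless the author agrees otherwise this notice must be retained and a link to the original article must be placed prominently on the page; otherwise the right to pursue legal liability is reserved.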
