[Original] StreamInsight Query Series (13): Query Patterns, Basic Patterns

The previous post covered the event alignment part of the query patterns. This post covers the basic patterns.

Basic Patterns

Question 1: How do you check whether event B occurs within 90 seconds after event A?

Let's answer this with an example. First, prepare some test data:

var sourceDataAB = new[]
{
    new { SourceId = "A", Value = 22, Status = 1, TimeStamp = DateTime.Parse("10/23/2009 4:12:00 PM") },
    new { SourceId = "A", Value = 24, Status = 0, TimeStamp = DateTime.Parse("10/23/2009 4:13:00 PM") },
    new { SourceId = "A", Value = 31, Status = 1, TimeStamp = DateTime.Parse("10/23/2009 4:14:00 PM") },
    new { SourceId = "A", Value = 67, Status = 0, TimeStamp = DateTime.Parse("10/23/2009 4:15:00 PM") },
    new { SourceId = "A", Value = 54, Status = 0, TimeStamp = DateTime.Parse("10/23/2009 4:16:00 PM") },
    new { SourceId = "A", Value = 50, Status = 1, TimeStamp = DateTime.Parse("10/23/2009 4:30:00 PM") },
    new { SourceId = "A", Value = 87, Status = 0, TimeStamp = DateTime.Parse("10/23/2009 4:35:00 PM") },
};

var sourceAB = sourceDataAB.ToPointStream(Application, ev =>
    PointEvent.CreateInsert(ev.TimeStamp.ToLocalTime(), ev),
    AdvanceTimeSettings.StrictlyIncreasingStartTime);

Next we look in sourceAB for an event B that occurs within 90 seconds after an event A and whose Value exceeds A's Value by more than 30:

var resultAB = from first in sourceAB.AlterEventDuration(e => TimeSpan.FromSeconds(90))
               join second in sourceAB on first.SourceId equals second.SourceId
               where second.Value > first.Value + 30
               select new
               {
                   second.SourceId,
                   second.Value,
                   delta = second.Value - first.Value
               };

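The code above first stretches every point event in the original stream to a duration of 90 seconds, then joins the stretched stream with the original stream (recall the two elements of a StreamInsight join: overlapping event lifetimes and a satisfied join predicate). The join picks out each event B whose Value exceeds A's by more than 30, along with the difference. For this data it yields a single output event: the 4:15 PM event (Value 67) matched with the 4:14 PM event (Value 31), giving delta = 36.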
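The expected result can be cross-checked without a StreamInsight server. The following is a minimal sketch in plain LINQ to Objects, not StreamInsight code: it assumes the join matches a second event whose timestamp falls in the half-open window [first, first + 90s), and the class name Question1Check is made up for illustration. It also assumes a compiler with C# 7 tuple support.

```csharp
using System;
using System.Linq;

public static class Question1Check
{
    // In-memory approximation of the temporal join; not StreamInsight code.
    public static (string SourceId, int Value, int Delta)[] Run()
    {
        var day = new DateTime(2009, 10, 23);
        var data = new[]
        {
            new { SourceId = "A", Value = 22, Time = day.AddHours(16).AddMinutes(12) },
            new { SourceId = "A", Value = 24, Time = day.AddHours(16).AddMinutes(13) },
            new { SourceId = "A", Value = 31, Time = day.AddHours(16).AddMinutes(14) },
            new { SourceId = "A", Value = 67, Time = day.AddHours(16).AddMinutes(15) },
            new { SourceId = "A", Value = 54, Time = day.AddHours(16).AddMinutes(16) },
            new { SourceId = "A", Value = 50, Time = day.AddHours(16).AddMinutes(30) },
            new { SourceId = "A", Value = 87, Time = day.AddHours(16).AddMinutes(35) },
        };
        var window = TimeSpan.FromSeconds(90);

        // A point event "second" overlaps the stretched "first" when its
        // timestamp lies in [first.Time, first.Time + window).
        var query = from first in data
                    from second in data
                    where first.SourceId == second.SourceId
                       && second.Time >= first.Time
                       && second.Time < first.Time + window
                       && second.Value > first.Value + 30
                    select (SourceId: second.SourceId,
                            Value: second.Value,
                            Delta: second.Value - first.Value);
        return query.ToArray();
    }
}
```

Calling Question1Check.Run() returns one tuple, ("A", 67, 36), which agrees with the single output event described above.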

Question 2: How do you determine whether event B occurs within 5 minutes after event A?

Again, we use an example. First, prepare the data source:

var sourceDataNoAB = new []
{
    new { SourceId = "B", Value = 0, Status = 0, TimeStamp = DateTime.Parse("10/23/2009 4:00:00 PM") },
    new { SourceId = "B", Value = 0, Status = 0, TimeStamp = DateTime.Parse("10/23/2009 4:10:00 PM") },
    new { SourceId = "A", Value = 0, Status = 1, TimeStamp = DateTime.Parse("10/23/2009 4:12:00 PM") },
    new { SourceId = "B", Value = 0, Status = 0, TimeStamp = DateTime.Parse("10/23/2009 4:13:00 PM") },
    new { SourceId = "A", Value = 0, Status = 1, TimeStamp = DateTime.Parse("10/23/2009 4:14:00 PM") },
    new { SourceId = "B", Value = 0, Status = 0, TimeStamp = DateTime.Parse("10/23/2009 4:15:00 PM") },
    new { SourceId = "A", Value = 0, Status = 1, TimeStamp = DateTime.Parse("10/23/2009 4:20:00 PM") },
    new { SourceId = "B", Value = 0, Status = 0, TimeStamp = DateTime.Parse("10/23/2009 4:30:00 PM") },
    new { SourceId = "B", Value = 0, Status = 0, TimeStamp = DateTime.Parse("10/23/2009 4:35:00 PM") },
};

var sourceNoAB = sourceDataNoAB.ToPointStream(Application, ev => 
    PointEvent.CreateInsert(ev.TimeStamp.ToLocalTime(), ev),
    AdvanceTimeSettings.StrictlyIncreasingStartTime);

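In the static data above, SourceId identifies the event type. To detect whether an event B occurs within 5 minutes after an event A, one straightforward approach is to shift the start time of every event A 5 minutes later, extend the duration of every event B to 5 minutes, and then perform a left anti-semi-join. A later post is dedicated to the left anti-semi-join; for now, think of it as the set operation A - B. The implementation follows.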

Shift all A events 5 minutes later:

var forwardA = (from e in sourceNoAB
               where e.SourceId == "A"
               select e).ShiftEventTime(e => e.StartTime + TimeSpan.FromMinutes(5));

Extend the duration of all B events to 5 minutes:

var stretchB = (from e in sourceNoAB
                where e.SourceId == "B"
                select e).AlterEventDuration(e => TimeSpan.FromMinutes(5));

Finally, perform the left anti-semi-join:

var resultAB = from e in forwardA
               where (from p in stretchB
                      select p).IsEmpty()
               select e;

The result contains only the event A that occurred at "10/23/2009 4:20:00 PM": it is the only event A with no event B in the 5 minutes after it.
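A quick way to sanity-check this outcome is to replay the same data in memory. The sketch below is plain LINQ to Objects, not StreamInsight code, and the class name Question2Check is made up for illustration. It assumes the shift-and-stretch construction is equivalent to asking whether some B has a timestamp in the window (a, a + 5 min], which follows from B's stretched lifetime [b, b + 5 min) having to contain the shifted point a + 5 min.

```csharp
using System;
using System.Linq;

public static class Question2Check
{
    // Returns the original timestamps of A events with no B in (a, a + 5 min].
    public static DateTime[] Run()
    {
        var day = new DateTime(2009, 10, 23);
        Func<int, int, DateTime> at = (h, m) => day.AddHours(h).AddMinutes(m);

        var events = new[]
        {
            new { SourceId = "B", Time = at(16, 0) },
            new { SourceId = "B", Time = at(16, 10) },
            new { SourceId = "A", Time = at(16, 12) },
            new { SourceId = "B", Time = at(16, 13) },
            new { SourceId = "A", Time = at(16, 14) },
            new { SourceId = "B", Time = at(16, 15) },
            new { SourceId = "A", Time = at(16, 20) },
            new { SourceId = "B", Time = at(16, 30) },
            new { SourceId = "B", Time = at(16, 35) },
        };
        var window = TimeSpan.FromMinutes(5);

        var bTimes = events.Where(e => e.SourceId == "B")
                           .Select(e => e.Time)
                           .ToArray();

        // Anti-semi-join in set terms: keep each A that has no matching B.
        return events
            .Where(e => e.SourceId == "A")
            .Where(a => !bTimes.Any(b => b > a.Time && b <= a.Time + window))
            .Select(a => a.Time)
            .ToArray();
    }
}
```

Question2Check.Run() returns a single timestamp, 4:20 PM. Note that the sketch reports the original timestamp; because forwardA shifts the A events, the StreamInsight output event itself should carry the shifted start time of 4:25 PM.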

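Note: an alternative is to use AlterEventLifetime directly, moving each event B's start time 5 minutes earlier and setting its duration to 5 minutes. The advantage is that the lifetime of event A does not have to be modified. Question 4 shows how to do this.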

Question 3: How do you detect that events A, B, and C all occur within 5 minutes of one another?

Once more, a simple example shows how to solve this.

First create a basic data stream. In this example we want to find events (2), (3), and (4) occurring within 5 minutes of one another.

int[] data = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

var inputStream = data.ToPointStream(Application, payload =>
    PointEvent.CreateInsert(DateTime.Now + TimeSpan.FromMinutes(payload), new { payload }),
    AdvanceTimeSettings.IncreasingStartTime);

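Stretch the duration of the events in the original stream to 5 minutes and join the stream with itself to find a point in time at which event 2 and event 3 are both present (that is, within the 5-minute window):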

var selfJoin = from e1 in inputStream.AlterEventDuration(e => TimeSpan.FromMinutes(5))
               from e2 in inputStream.AlterEventDuration(e => TimeSpan.FromMinutes(5))
               where e1.payload == 2 && e2.payload == 3
               select new
               {
                   a = e1.payload,
                   b = e2.payload
               };

Join the resulting stream selfJoin with the stretched input stream once more, so that event 4 must also fall within the time window:

var selfJoin2 = from e3 in inputStream.AlterEventDuration(e => TimeSpan.FromMinutes(5))
                from e1 in selfJoin
                where e1.a == 2 && e1.b == 3 && e3.payload == 4
                select new
                {
                    a = e1.a,
                    b = e1.b,
                    c = e3.payload
                };

The output is a single event combining all three: a = 2, b = 3, c = 4.
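Why the two joins produce an output at all comes down to interval arithmetic: a join emits a result only where the lifetimes of its inputs overlap. The sketch below illustrates this with plain C# (not StreamInsight code), using minute offsets from the common base time as a simplifying assumption; the class name Question3Check and the Intersect helper are made up for illustration, and C# 7 tuple support is assumed.

```csharp
using System;

public static class Question3Check
{
    // Intersection of two half-open intervals [Start, End), in minutes;
    // returns null when they do not overlap.
    public static (int Start, int End)? Intersect(
        (int Start, int End) x, (int Start, int End) y)
    {
        int start = Math.Max(x.Start, y.Start);
        int end = Math.Min(x.End, y.End);
        if (start >= end)
        {
            return null;
        }
        return (start, end);
    }

    // Lifetimes of events 2, 3 and 4 after stretching each to 5 minutes.
    public static (int Start, int End)? Run()
    {
        var e2 = (Start: 2, End: 7);
        var e3 = (Start: 3, End: 8);
        var e4 = (Start: 4, End: 9);

        var first = Intersect(e2, e3);      // overlap used by selfJoin
        if (!first.HasValue)
        {
            return null;
        }
        return Intersect(first.Value, e4);  // overlap used by selfJoin2
    }
}
```

Question3Check.Run() returns the interval (4, 7): events 2 and 3 overlap on [3, 7), and adding event 4 narrows that to [4, 7). Had the events been more than 5 minutes apart, one of the intersections would be empty and no output would be produced.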

Question 4: How do you find the events for which A, B, and C do not all occur within 5 minutes (taking event A as the reference)?

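A good way to solve Question 4 is by exclusion: first compute all A events for which A, B, and C do all occur within 5 minutes, then perform a left anti-semi-join between the original A events and that result.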

First, prepare the data source:

var sourceData = new[]
{
    new { StartTime = new DateTime(2009, 6, 25, 0, 00, 00), ID = "B"},
    new { StartTime = new DateTime(2009, 6, 25, 0, 00, 01), ID = "C"},
    new { StartTime = new DateTime(2009, 6, 25, 0, 00, 02), ID = "A"},
    new { StartTime = new DateTime(2009, 6, 25, 0, 05, 00), ID = "A"},
    new { StartTime = new DateTime(2009, 6, 25, 0, 11, 00), ID = "A"},
    new { StartTime = new DateTime(2009, 6, 25, 0, 11, 00), ID = "B"},
    new { StartTime = new DateTime(2009, 6, 25, 0, 11, 00), ID = "C"},
    new { StartTime = new DateTime(2009, 6, 25, 0, 15, 59), ID = "A"},
    new { StartTime = new DateTime(2009, 6, 25, 0, 15, 59), ID = "A"},
    new { StartTime = new DateTime(2009, 6, 25, 0, 16, 00), ID = "A"},
    new { StartTime = new DateTime(2009, 6, 25, 0, 16, 00), ID = "A"},
    new { StartTime = new DateTime(2009, 6, 25, 0, 18, 00), ID = "B"},
    new { StartTime = new DateTime(2009, 6, 25, 0, 20, 59), ID = "C"},
    new { StartTime = new DateTime(2009, 6, 25, 0, 25, 59), ID = "A"},
    new { StartTime = new DateTime(2009, 6, 25, 0, 26, 01), ID = "B"},
    new { StartTime = new DateTime(2009, 6, 25, 0, 29, 59), ID = "A"},
    new { StartTime = new DateTime(2009, 6, 25, 0, 30, 59), ID = "C"},
};

var source = sourceData.ToPointStream(Application, ev =>
    PointEvent.CreateInsert(ev.StartTime.ToLocalTime(), ev),
    AdvanceTimeSettings.IncreasingStartTime);

For convenience, we also define a couple of time constants:

// Set the window size and a small time offset
var windowSize = TimeSpan.FromMinutes(5);
var oneTick = new TimeSpan(1);

Use filters to obtain streams that contain only A events, only B events, and only C events:

// Filter the source into one stream per ID
var aStream = from e in source where e.ID == "A" select e;
var bStream = from e in source where e.ID == "B" select e;
var cStream = from e in source where e.ID == "C" select e;

First, find the A events that are followed by an event B within 5 minutes:

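// Join each event A with the B events that occur within 5 minutes after it.
// To do this, move each B's start time 5 minutes earlier*, set its duration to 5 minutes, and join with the A events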
var abStream = from a in aStream
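               // *Note: event lifetimes are half-open intervals [start, end), so when altering B's lifetime
               // we add one tick to ensure that B's original StartTime lies inside the altered lifetime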
               from b in bStream.AlterEventLifetime(e => e.StartTime + oneTick - windowSize, e => windowSize)
               where true
               select a;
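
The one-tick adjustment is easy to get wrong, so here is a minimal sketch of the arithmetic in plain C# (not StreamInsight code). It assumes lifetimes are half-open intervals [start, end) as the note in the code says; the class name OneTickCheck and the Covers helper are made up for illustration.

```csharp
using System;

public static class OneTickCheck
{
    // Does the altered lifetime of a B event at time b cover the point event A at time a?
    // The altered lifetime is [b + offset - window, b + offset), mirroring
    // AlterEventLifetime(e => e.StartTime + oneTick - windowSize, e => windowSize).
    public static bool Covers(DateTime a, DateTime b, TimeSpan window, TimeSpan offset)
    {
        DateTime start = b + offset - window;
        DateTime end = start + window;
        return start <= a && a < end;   // half-open: end is excluded
    }
}
```

With window = 5 minutes and offset = new TimeSpan(1), Covers(a, a, ...) is true, so a B with the same timestamp as A counts as a match; with offset = TimeSpan.Zero it is false, because the lifetime would end exactly at b and exclude it. A B exactly 5 minutes after A is not covered, so the effective window is [a, a + 5 min).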

Next, building on abStream, find the A events that are also followed by an event C within 5 minutes:

var abcStream = from a in abStream
                from c in cStream.AlterEventLifetime(e => e.StartTime + oneTick - windowSize, e => windowSize)
                where true
                select a;

Use a left anti-semi-join to find the A events that are not followed by both an event B and an event C within 5 minutes:

// Find the A events that are not followed by both a B and a C within 5 minutes
var result2 = from a in aStream
              where (from abc in abcStream
                     where true
                     select abc).IsEmpty()
              select a;

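The final output, result2, contains the A events that are not followed by both an event B and an event C within 5 minutes.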
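To see which A events that should be for the sample data, the query can be replayed in memory. This is a sketch in plain LINQ to Objects, not StreamInsight code: it assumes the effective window is [a, a + 5 min) as implied by the one-tick adjustment, it represents each StartTime as a TimeSpan offset from midnight on 2009-06-25, and the class name Question4Check is made up for illustration.

```csharp
using System;
using System.Linq;

public static class Question4Check
{
    // Returns the times of A events NOT followed by both a B and a C
    // within the half-open window [a, a + 5 min).
    public static TimeSpan[] Run()
    {
        var data = new[]
        {
            new { Time = new TimeSpan(0, 0, 0),   ID = "B" },
            new { Time = new TimeSpan(0, 0, 1),   ID = "C" },
            new { Time = new TimeSpan(0, 0, 2),   ID = "A" },
            new { Time = new TimeSpan(0, 5, 0),   ID = "A" },
            new { Time = new TimeSpan(0, 11, 0),  ID = "A" },
            new { Time = new TimeSpan(0, 11, 0),  ID = "B" },
            new { Time = new TimeSpan(0, 11, 0),  ID = "C" },
            new { Time = new TimeSpan(0, 15, 59), ID = "A" },
            new { Time = new TimeSpan(0, 15, 59), ID = "A" },
            new { Time = new TimeSpan(0, 16, 0),  ID = "A" },
            new { Time = new TimeSpan(0, 16, 0),  ID = "A" },
            new { Time = new TimeSpan(0, 18, 0),  ID = "B" },
            new { Time = new TimeSpan(0, 20, 59), ID = "C" },
            new { Time = new TimeSpan(0, 25, 59), ID = "A" },
            new { Time = new TimeSpan(0, 26, 1),  ID = "B" },
            new { Time = new TimeSpan(0, 29, 59), ID = "A" },
            new { Time = new TimeSpan(0, 30, 59), ID = "C" },
        };
        var window = TimeSpan.FromMinutes(5);

        // True when an event with the given ID occurs in [a, a + window).
        Func<TimeSpan, string, bool> follows = (a, id) =>
            data.Any(e => e.ID == id && e.Time >= a && e.Time < a + window);

        return data
            .Where(e => e.ID == "A")
            .Where(a => !(follows(a.Time, "B") && follows(a.Time, "C")))
            .Select(a => a.Time)
            .ToArray();
    }
}
```

Under these assumptions, Question4Check.Run() returns six A events: 00:00:02, 00:05:00, 00:15:59 (twice), 00:25:59, and 00:29:59. The A events at 00:11:00 and 00:16:00 are excluded because both a B and a C fall inside their windows. The pair at 00:15:59 shows the boundary behaviour: the C at 00:20:59 is exactly 5 minutes later and therefore just outside the half-open window.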

The next post covers the distinct count part of the StreamInsight query patterns.

posted @ 2011-09-04 22:29  StreamInsight