【原创】StreamInsight查询系列(十六)——查询模式之左外联接

上篇文章介绍了查询模式中窗口比率部分,这篇博文将介绍StreamInsight中的左外联接(Left Outer Join)。

测试数据准备

为了方便测试查询,我们首先准备一个静态的测试数据源:

var time = DateTime.Parse("10/23/2011 09:37:00 PM");
var inputEvents = new[]
{  
    new { Timestamp = time+TimeSpan.FromMinutes(1), accountNo = "800001", branchName="43rd Street", customerName="John", amountWithdrawn = 100.0 },
    new { Timestamp = time+TimeSpan.FromMinutes(2), accountNo = "800002", branchName="23rd Street", customerName="Mark", amountWithdrawn = 250.0 },
    new { Timestamp = time+TimeSpan.FromMinutes(3), accountNo = "800003", branchName="43rd Street", customerName="Chandler", amountWithdrawn = 500.0 },
    new { Timestamp = time+TimeSpan.FromMinutes(4), accountNo = "800004", branchName="43rd Street", customerName="Dave", amountWithdrawn = 380.0 },
    new { Timestamp = time+TimeSpan.FromMinutes(4), accountNo = "800005", branchName="43rd Street", customerName="Mike", amountWithdrawn = 5000.0 },
    new { Timestamp = time+TimeSpan.FromMinutes(5), accountNo = "800006", branchName="23rd Street", customerName="Sarah", amountWithdrawn = 80.0 },
    new { Timestamp = time+TimeSpan.FromMinutes(6), accountNo = "800007", branchName="43rd Street", customerName="Ross", amountWithdrawn = 500.0 },
    new { Timestamp = time+TimeSpan.FromMinutes(8), accountNo = "800008", branchName="43rd Street", customerName="Joey", amountWithdrawn = 1500.0 }
};

var branchInfo = new[]
{
    new { branchName = "23rd Street", city = "Some Town", zip = "11111" }
};

接下去将上述数据源转变为点类型复杂事件流(其中branchStream将作为引用事件流):

var branchStream = branchInfo.ToPointStream(Application, t =>
    PointEvent.CreateInsert(time, t),
    AdvanceTimeSettings.StrictlyIncreasingStartTime);
// 将branchStream事件流中的所有时间持续时间延伸至无穷大值,方便后续作为引用流
var refStream = branchStream.AlterEventDuration(e => TimeSpan.MaxValue);

var inputStream = inputEvents.ToPointStream(Application, t =>
    PointEvent.CreateInsert(t.Timestamp, t),
    AdvanceTimeSettings.StrictlyIncreasingStartTime);

左外联接

熟悉SQL的读者一定很熟悉左外联接(Left Outer Join),这里再举一个左外联接的简单例子:

假设存在两张表A和B,表A记录了人员编号id及人员姓名Name,如下:

id

name

1 张三
2 李四
3 王五

表B记录了人员编号id及对应职业job,如下:

id

job

1 学生
2 教师
4 程序员

将表A与表B进行左外联接

SELECT A.*, B.* FROM A

LEFT JOIN B ON A.id = B.id

结果如下:

id name id job
1 张三 1 学生
2 李四 2 教师
3 王五 NULL NULL

好了复习完了左外联接的基础知识,让我们再回顾一下之前学习过的StreamInsight内部联接:

var innerJoinStream = from left in inputStream
                      join right in refStream on left.branchName equals right.branchName
                      select new
                      {
                          accountNo = left.accountNo,
                          amountWithdrawn = left.amountWithdrawn,
                          customerName = left.customerName,

                          branchName = right.branchName,
                          branchCity = right.city,
                          branchZip = right.zip
                      };

结果显示了inputStream和refStream中branchName相同且事件重叠的两个事件:

之所以介绍内部联接,是因为实现StreamInsight中的左外联接要用到内部联接。我们可以把左外联接看做是内部联接和左反半部联接的并集。

下面进行左外半部联接查询找出inputStream存在但是refStream没有的事件:

var lasjStream = from left in inputStream
                 where (from right in refStream
                        where left.branchName == right.branchName
                        select right).IsEmpty()
                 select new
                 {
                     accountNo = left.accountNo,
                     amountWithdrawn = left.amountWithdrawn,
                     customerName = left.customerName,

                     branchName = left.branchName,
                     branchCity = "MISSING",
                     branchZip = "MISSING"
                 };    

结果如下:

最后左外联接的结果就是内部链接和左反半部联接的结果并集:

var leftOuterJoinStream = innerJoinStream.Union(lasjStream);    

左外联接的最终结果如下:

下一篇将介绍StreamInsight查询模式中如何应对瞬变及报警泛滥。

posted @ 2011-09-06 00:26  StreamInsight  阅读(1035)  评论(0编辑  收藏  举报