Visual Netflix Browsing with Pivot, OData, and Windows Azure

Author: Steve Marx. Translator: 郑子颖

Original post: http://blog.smarx.com/posts/pivot-odata-and-windows-azure-visual-netflix-browsing

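If you haven't seen http://netflixpivot.cloudapp.net yet, go try it now. It lets you browse the movies that are available to watch instantly on Netflix, with a rich interface for sorting and filtering. I built it as a demo of the new Silverlight PivotViewer control. Below I'll explain how it works.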

Overview

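In this demo, the data comes from Netflix's OData feed, and the implementation uses the following Windows Azure pieces: a Web Role that serves the web page, a Worker Role that periodically refreshes the data, and blob storage plus the Windows Azure CDN to host the Pivot collection and the Silverlight control. Beyond that, I didn't write much code, only about 500 lines in total. I had meant to use the Pauthor library, which would have trimmed the code further.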

Step 1: Generating the Pivot collection

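The Pivot collection is generated inside my Worker Role. I run only a single Worker Role instance here, so each pass over the latest Netflix data takes a bit more than an hour. Running a single instance keeps the code much simpler: every operation is local, and so is the disk. More instances would finish sooner, but the Netflix data doesn't change often, so I didn't want to spend much time writing code to support multiple instances (if you need that, you can find it in my earlier samples).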

The first thing I did was right-click References in my Worker Role project in Visual Studio and add http://odata.netflix.com/Catalog as a service reference. Then, in NetflixPivotCreator.cs, I read the Netflix OData feed:

// This loop runs inside an iterator method (hence yield return);
// howMany caps the number of distinct titles it yields.
var context = new NetflixCatalog(new Uri("http://odata.netflix.com/Catalog"));
DataServiceQueryContinuation<Title> token = null;
var response = ((from title in context.Titles
                 where title.Instant.Available && title.Type == "Movie"
                 orderby title.AverageRating descending
                 select title) as DataServiceQuery<Title>)
               .Expand("Genres,Cast,Directors")
               .Execute() as QueryOperationResponse<Title>;
int count = 0;
var ids = new HashSet<string>();
do
{
    // After the first page, follow the continuation returned by the server.
    if (token != null)
    {
        response = context.Execute<Title>(token);
    }
    foreach (var title in response)
    {
        // HashSet.Add returns false for a title already seen on an earlier page.
        if (ids.Add(title.Id))
        {
            if (count < howMany)
            {
                yield return title;
            }
            count++;
        }
    }
    token = response.GetContinuation();
}
while (token != null && count < howMany);
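
The shape of that loop (follow continuations, drop duplicates, stop at a cap) can be tried without the Netflix service. The following is only a sketch under stated assumptions: PagingSketch, Page, and TakeDistinct are hypothetical names, and the fetchPage delegate stands in for the real OData calls.

```csharp
using System;
using System.Collections.Generic;

public static class PagingSketch
{
    // Hypothetical stand-in for one page of a server-paged feed:
    // the ids on the page plus the index of the next page (null on the last page).
    public sealed class Page
    {
        public IList<string> Ids { get; set; }
        public int? Next { get; set; }
    }

    // Walks the pages in order, skips ids already seen,
    // and stops once howMany distinct ids have been yielded.
    public static IEnumerable<string> TakeDistinct(Func<int, Page> fetchPage, int howMany)
    {
        var seen = new HashSet<string>();
        int? next = 0;
        while (next != null && seen.Count < howMany)
        {
            var page = fetchPage(next.Value);
            foreach (var id in page.Ids)
            {
                // Add returns false for a duplicate, so repeated ids are dropped.
                if (seen.Count < howMany && seen.Add(id))
                {
                    yield return id;
                }
            }
            next = page.Next;
        }
    }
}
```

Passing a delegate that indexes into an in-memory array of pages is enough to exercise the duplicate handling and the cap.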

Next, I download the box art for each movie and turn it into a Deep Zoom image. That code looks roughly like this:

Parallel.ForEach(GetTopInstantWatchTitles(3000),
    new ParallelOptions { MaxDegreeOfParallelism = 16 },
    (title) =>
{
    // Prefer the high-definition box art; fall back to the large image.
    var boxArtUrl = title.BoxArt.HighDefinitionUrl ?? title.BoxArt.LargeUrl;
    var imagePath = string.Format(@"{0}\images\{1}.jpg", outputDirectory, title.Id.ToHex());
    using (var client = new WebClient())
    {
        client.DownloadFile(boxArtUrl, imagePath);
    }
    // Use the same hex id for the Deep Zoom output that the collection step looks up later.
    new ImageCreator().Create(imagePath, string.Format(@"{0}\output\{1}.xml", outputDirectory, title.Id.ToHex()));
});

One thing I want to point out: I used the Task Parallel Library (Parallel.ForEach) here. It makes writing multithreaded, parallel code remarkably easy.
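
For readers new to the Task Parallel Library, here is a minimal, self-contained sketch of the same call shape: Parallel.ForEach with a MaxDegreeOfParallelism cap. ParallelSketch and SquareAll are hypothetical names, and squaring integers merely stands in for the download and Deep Zoom work.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelSketch
{
    // Squares every input using at most maxParallel concurrent workers
    // and collects the results in a thread-safe dictionary.
    public static IDictionary<int, int> SquareAll(IEnumerable<int> inputs, int maxParallel)
    {
        var results = new ConcurrentDictionary<int, int>();
        Parallel.ForEach(inputs,
            new ParallelOptions { MaxDegreeOfParallelism = maxParallel },
            n => { results[n] = n * n; });
        return results;
    }
}
```

Results go into a ConcurrentDictionary because the loop body runs on several threads at once; an ordinary Dictionary is not safe for concurrent writes.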

Once all the box art images are ready, putting them into a Deep Zoom image collection (a .dzc file) takes a single statement:

new CollectionCreator().Create(
    titles.Select(t => string.Format(@"{0}\output\{1}.xml", outputDirectory, t.Id.ToHex())).ToList(),
    string.Format(@"{0}\output\collection-{1}.dzc", outputDirectory, suffix));

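With all the data in place, it's time to generate the Pivot collection. The Pivot collection here is really just a file ending in .cxml that contains all the details of every movie I pulled from the Netflix OData feed. I won't go into how the .cxml file is created, because it's nothing more than ordinary XML manipulation, and it would be simpler still with the Pauthor library. If you're interested, look at the CreateCxml function in the source code.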
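
To give a feel for what that XML work involves, here is a rough LINQ to XML sketch that writes a tiny collection with one facet category. It is not the CreateCxml function from the source code: CxmlSketch, Movie, and Build are hypothetical names, the element and attribute names follow my understanding of the CXML schema, and the sketch assumes item i maps to image #i in the .dzc file.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class CxmlSketch
{
    // CXML metadata namespace (assumed from the published collection schema).
    static readonly XNamespace Ns = "http://schemas.microsoft.com/collection/metadata/2009";

    // Hypothetical, trimmed-down movie record for illustration only.
    public sealed class Movie
    {
        public string Name { get; set; }
        public string Href { get; set; }
        public string[] Genres { get; set; }
    }

    // One <Item> with a single "Genre" facet holding one <String> per genre.
    static XElement Item(Movie m, int index)
    {
        var genres = m.Genres.Select(g => new XElement(Ns + "String", new XAttribute("Value", g)));
        return new XElement(Ns + "Item",
            new XAttribute("Id", index),
            new XAttribute("Img", "#" + index), // assumes image index == item index
            new XAttribute("Name", m.Name),
            new XAttribute("Href", m.Href),
            new XElement(Ns + "Facets",
                new XElement(Ns + "Facet", new XAttribute("Name", "Genre"), genres)));
    }

    public static XDocument Build(string collectionName, string dzcFile, IList<Movie> movies)
    {
        var facetCategories = new XElement(Ns + "FacetCategories",
            new XElement(Ns + "FacetCategory",
                new XAttribute("Name", "Genre"),
                new XAttribute("Type", "String")));
        var items = new XElement(Ns + "Items",
            new XAttribute("ImgBase", dzcFile),
            movies.Select((m, i) => Item(m, i)));
        return new XDocument(
            new XElement(Ns + "Collection",
                new XAttribute("Name", collectionName),
                new XAttribute("SchemaVersion", "1.0"),
                facetCategories,
                items));
    }
}
```

Calling Build(...).Save(path) would write the file; a real collection would carry more facet categories (rating, cast, directors, and so on) built the same way.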

Step 2: Storing the Pivot collection in blob storage

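Once the .cxml file has been created, the Worker Role needs to upload it to blob storage. There isn't much to say about this code; just keep a few things in mind when writing it: first, upload in parallel to speed things up; second, set the correct content type on each blob; third, set a cache control header when using the CDN. Also, upload the other files (such as the box art images) first and the .cxml file last. If the .cxml goes up first, users will see some images fail to load in the browser.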

private void UploadDirectoryRecursive(string path, CloudBlobContainer container)
{
    string cxmlPath = null;
    // upload with 16 threads
    Parallel.ForEach(EnumerateDirectoryRecursive(path),
        new ParallelOptions { MaxDegreeOfParallelism = 16 },
        (file) =>
    {
        // save collection-#####.cxml for last
        if (Path.GetFileName(file).StartsWith("collection-") && Path.GetExtension(file) == ".cxml")
        {
            cxmlPath = file;
        }
        else
        {
            // upload each file, using the relative path as a blob name
            UploadFile(file, container.GetBlobReference(Path.GetFullPath(file).Substring(path.Length)));
        }
    });
    // finish by uploading the cxml file itself
    if (cxmlPath != null)
    {
        UploadFile(cxmlPath, container.GetBlobReference(Path.GetFullPath(cxmlPath).Substring(path.Length)));
    }
}

private IEnumerable<string> EnumerateDirectoryRecursive(string root)
{
    foreach (var file in Directory.GetFiles(root))
        yield return file;
    foreach (var subdir in Directory.GetDirectories(root))
        foreach (var file in EnumerateDirectoryRecursive(subdir))
            yield return file;
}

private void UploadFile(string filename, CloudBlob blob)
{
    var extension = Path.GetExtension(filename).ToLower();
    if (extension == ".cxml")
    {
        // cache the CXML on the client for 30 minutes
        blob.Properties.CacheControl = "max-age=1800";
    }
    else
    {
        // cache other files (such as images) on the client for 2 hours
        blob.Properties.CacheControl = "max-age=7200";
    }
    switch (extension)
    {
        case ".xml":
        case ".cxml":
        case ".dzc":
            blob.Properties.ContentType = "application/xml";
            break;
        case ".jpg":
            blob.Properties.ContentType = "image/jpeg";
            break;
    }
    blob.UploadFile(filename);
}
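
The three rules from this step (the .cxml goes last, the content type follows the extension, the cache lifetime depends on the file kind) can be pulled out into small pure functions and checked without a storage account. This is only a sketch: UploadPlanSketch and its members are hypothetical names, and the application/octet-stream fallback is my assumption, since the code above sets no content type for other extensions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class UploadPlanSketch
{
    // True for the collection index file, e.g. collection-12345.cxml.
    public static bool IsCollectionCxml(string file)
    {
        return Path.GetFileName(file).StartsWith("collection-", StringComparison.OrdinalIgnoreCase)
            && string.Equals(Path.GetExtension(file), ".cxml", StringComparison.OrdinalIgnoreCase);
    }

    // Everything else first, the .cxml last. OrderBy is a stable sort,
    // so the other files keep their original relative order.
    public static IList<string> OrderForUpload(IEnumerable<string> files)
    {
        return files.OrderBy(f => IsCollectionCxml(f) ? 1 : 0).ToList();
    }

    public static string ContentTypeFor(string file)
    {
        switch (Path.GetExtension(file).ToLowerInvariant())
        {
            case ".xml":
            case ".cxml":
            case ".dzc":
                return "application/xml";
            case ".jpg":
                return "image/jpeg";
            default:
                return "application/octet-stream"; // assumed fallback, not in the original code
        }
    }

    // 30 minutes for the .cxml, 2 hours for everything else.
    public static string CacheControlFor(string file)
    {
        var isCxml = string.Equals(Path.GetExtension(file), ".cxml", StringComparison.OrdinalIgnoreCase);
        return isCxml ? "max-age=1800" : "max-age=7200";
    }
}
```

The shorter lifetime on the .cxml means clients pick up a refreshed collection sooner, while the images, which rarely change, stay cached longer.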

Step 3: Displaying it in the browser

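With the Pivot collection uploaded, most of the work is done. The last step is to write a Web Role that uses PivotViewer to display the data in the browser. Rather than use PivotViewer directly, I created a subclass of it and added some code of my own, so that when a user double-clicks a movie's box art or clicks "View on Netflix" in the browser, they are taken straight to that movie's page on Netflix.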

public class NetflixPivotControl : PivotViewer
{
    public NetflixPivotControl()
    {
        ItemActionExecuted += new EventHandler<ItemActionEventArgs>(NetflixPivotViewer_ItemActionExecuted);
        ItemDoubleClicked += new EventHandler<ItemEventArgs>(NetflixPivotViewer_ItemDoubleClicked);
    }

    private void BrowseTo(string itemId)
    {
        HtmlPage.Window.Navigate(new Uri(GetItem(itemId).Href));
    }

    private void NetflixPivotViewer_ItemDoubleClicked(object sender, ItemEventArgs e)
    {
        BrowseTo(e.ItemId);
    }

    private void NetflixPivotViewer_ItemActionExecuted(object sender, ItemActionEventArgs e)
    {
        BrowseTo(e.ItemId);
    }

    protected override List<CustomAction> GetCustomActionsForItem(string itemId)
    {
        var list = new List<CustomAction>();
        list.Add(new CustomAction("View on Netflix", null, "View this movie at Netflix", "view"));
        return list;
    }
}

My Web Role is written in ASP.NET MVC and consists of a single page that hosts a Silverlight control (the Silverlight application itself, a .xap file, is also stored in blob storage).

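Each .cxml in blob storage carries a timestamp. To avoid picking up a stale .cxml from the CDN cache, the code in the Web Role uses that timestamp to fetch the most recent .cxml: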

private Uri GetBlobOrCdnUri(CloudBlob blob, string cdnHost)
{
    // always use HTTP to avoid Silverlight cross-protocol issues
    var ub = new UriBuilder(blob.Uri)
    {
        Scheme = "http",
        Port = 80
    };
    if (!string.IsNullOrEmpty(cdnHost))
    {
        ub.Host = cdnHost;
    }
    return ub.Uri;
}

public ActionResult Index()
{
    var blobs = CloudStorageAccount.Parse(RoleEnvironment.GetConfigurationSettingValue("DataConnectionString"))
        .CreateCloudBlobClient();
    var cdnHost = RoleEnvironment.GetConfigurationSettingValue("CdnHost");
    var controlBlob = blobs.GetBlobReference("control/NetflixPivotViewer.xap");
    var collectionBlob = blobs.ListBlobsWithPrefix("collection/collection-").OfType<CloudBlob>()
        .Where(b => b.Uri.AbsoluteUri.EndsWith(".cxml")).First();
    ViewData["xapUrl"] = GetBlobOrCdnUri(controlBlob, cdnHost).AbsoluteUri;
    ViewData["collectionUrl"] = GetBlobOrCdnUri(collectionBlob, cdnHost).AbsoluteUri;
    return View();
}
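
If several timestamped .cxml files can coexist in the container, one way to make "most recent" explicit is to sort the names. The sketch below is an assumption-laden illustration, not the sample's code: LatestCollectionSketch and PickLatest are hypothetical names, and it assumes the timestamp suffix is fixed-width so that ordinal string order matches chronological order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LatestCollectionSketch
{
    // Returns the .cxml name that sorts last, or null if there is none.
    // Assumes fixed-width timestamp suffixes (hypothetical naming scheme).
    public static string PickLatest(IEnumerable<string> blobNames)
    {
        return blobNames
            .Where(n => n.EndsWith(".cxml", StringComparison.OrdinalIgnoreCase))
            .OrderByDescending(n => n, StringComparer.Ordinal)
            .FirstOrDefault();
    }
}
```

Because every run gets a new file name, the URL handed to the Silverlight control changes with each refresh, so a CDN edge node cannot serve an older copy under the new name.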

Download the source code

Most of the important code is already shown above. If you need the complete source (it opens in Visual Studio 2010), you can download it from http://cdn.blog.smarx.com/files/NetflixPivot_source_updated3.zip.

To run the sample, you will also need:

If you don't see anything after starting it up, don't worry. Generating the Pivot collection takes a while; expect to wait at least an hour.

Posted on 2011-08-22 08:14 by 拨云见日