Deep dive into Netflix's recommender system (How is Netflix's recommender system implemented?)
Reference: https://towardsdatascience.com/deep-dive-into-netflixs-recommender-system-341806ae3b48
Netflix has a subscription-based model. Simply put, the more members (the term used by Netflix, synonymous with users/subscribers) Netflix has, the higher its revenue. Revenue can be seen as a function of three things:
- A high acquisition rate of new members
- A low member churn (cancellation) rate
- The rate at which former members rejoin
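Netflix's recommender system is very important: 80% of what members watch comes through recommendations (in other words, very few people search for a title directly and then watch it).
So what does the Netflix recommender system actually look like?
Each row is a category. The first row is the category the member likes most, and preference decreases going down the page. Within each row, titles are ordered from left to right by decreasing recommendation strength.
Why use "rows"?
There are two angles. For the member, the coherence and similarity within a row make it easier to decide whether the category is worth a look. For the company, this layout is also simple and clear: it makes it easy to see whether a member is interested, and how much.
Which algorithms are used?
Netflix naturally does not make its algorithms fully public, so what follows is a set of assumptions about them: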
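PVR (personalised video ranking): this filters down the catalog by a certain criterion (comedy, drama…), combined with side features including user features and popularity.
Top N video ranker: similar to PVR, but it covers all categories (this is the "Top Picks for You" row).
Trending now ranker: this algorithm captures temporal trends which Netflix deduces to be strong predictors. These are short-term trends that can range from a few minutes to a few days (for example, holiday trends or trends triggered by a sudden event).
Continue watching ranker: as the name suggests, this ranks titles the member has started but not finished.
There is one more algorithm, based on the member's own viewing timeline. This kind of data is called contextual sequence data, and it feeds the video-video similarity ranker, which uses it to make recommendations for the member.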
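To make the video-video similarity idea concrete, here is a minimal sketch. It is not Netflix's actual method: it assumes that similarity can be approximated by how often two titles appear in the same member's watch history, and it ignores the order of the sequence for simplicity. The function names `video_similarity` and `similar_to` and the sample histories are made up for illustration.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def video_similarity(histories):
    """Cosine-style similarity between titles based on co-watch counts.

    histories: list of per-member watch sequences (lists of title ids).
    Assumption: two titles are similar if the same members watched both.
    """
    watch_count = Counter()  # how many members watched each title
    pair_count = Counter()   # how many members watched both titles of a pair
    for seq in histories:
        titles = sorted(set(seq))  # de-duplicate; order is ignored here
        watch_count.update(titles)
        pair_count.update(combinations(titles, 2))

    sim = {}
    for (a, b), n in pair_count.items():
        score = n / sqrt(watch_count[a] * watch_count[b])
        sim.setdefault(a, {})[b] = score
        sim.setdefault(b, {})[a] = score
    return sim


def similar_to(sim, title, k=3):
    """Return up to k titles most similar to `title` (ties broken by id)."""
    neighbours = sim.get(title, {})
    return sorted(neighbours, key=lambda t: (-neighbours[t], t))[:k]


# Hypothetical watch histories for four members.
histories = [
    ["A", "B", "C"],
    ["A", "B"],
    ["B", "C"],
    ["A", "D"],
]
sim = video_similarity(histories)
print(similar_to(sim, "A"))
```

A real system would also use the order and timing of the sequence (what was watched right before what), which this sketch deliberately leaves out.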
So how are the recommendation rows generated?
Each of the five algorithms above goes through the same row generation process, shown in the image below. For example, if PVR is looking at Romance titles, it finds candidates that fit this genre and, at the same time, comes up with evidence to support the presentation of the row (e.g. Romance movies that the member has previously watched). From my understanding, this evidence selection algorithm is incorporated into (or used together with) every other ranking algorithm listed above to create a more curated ranking of items (see Netflix's model workflow image below).
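The row generation steps (candidate selection, evidence selection, ranking) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Title` class, the `generate_row` function, the per-member `affinity` scores, and the 0.7/0.3 blend of affinity and popularity are all hypothetical and not taken from Netflix.

```python
from dataclasses import dataclass


@dataclass
class Title:
    name: str
    genre: str
    popularity: float  # hypothetical global popularity in [0, 1]


def generate_row(catalog, genre, watched, affinity, k=3):
    """Build one row for a genre: candidates, evidence, then ranking."""
    # 1. Candidate selection: unwatched titles that fit the genre.
    candidates = [t for t in catalog if t.genre == genre and t.name not in watched]

    # 2. Evidence selection: titles of this genre the member already watched,
    #    which can justify showing the row.
    evidence = sorted(t.name for t in catalog if t.genre == genre and t.name in watched)

    # 3. Ranking: blend personal affinity with popularity (weights are arbitrary).
    def score(t):
        return 0.7 * affinity.get(t.name, 0.0) + 0.3 * t.popularity

    ranked = sorted(candidates, key=score, reverse=True)[:k]
    return {"genre": genre, "evidence": evidence, "titles": [t.name for t in ranked]}


# Hypothetical catalog and member data.
catalog = [
    Title("R1", "Romance", 0.9),
    Title("R2", "Romance", 0.4),
    Title("R3", "Romance", 0.6),
    Title("C1", "Comedy", 0.8),
]
row = generate_row(catalog, "Romance", watched={"R1"}, affinity={"R2": 0.9, "R3": 0.2})
print(row)
```

Here the already-watched title becomes the evidence for the row, and the remaining Romance titles are ordered by the blended score.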

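So how is the page generated?
After the algorithms have generated the rows (there may be around 10,000 of them in total, each representing a possible category or some other kind of recommendation), the remaining task is to decide which rows to show and in what order.
See the figure below: the red dashed box marks the part we need to handle.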

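Generally speaking, Netflix uses a template-based approach to solve this problem. There are many things to pay attention to here, such as accuracy, diversity, accessibility, and stability. Other considerations include hardware capabilities.
The template-based approach did the job reasonably well at first, but Netflix later made some local optimizations to deliver a better user experience.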
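How can this row ranking problem be solved, and what approaches are there?
Row-based approach:
This approach orders rows by assigning each row a score. However, it sometimes lacks diversity.
In other words, the member may see several rows with similar recommendations.
Stage-wise approach:
This is an improved version of the previous approach. The improvement is that rows are no longer scored independently of one another: when a later row is chosen, the approach takes into account whether similar rows have already been placed above it.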
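The difference between the two approaches can be sketched in a few lines. This is a simplified illustration, not Netflix's implementation: it assumes each row already has a relevance `score`, measures similarity between rows by the Jaccard overlap of their titles, and uses an arbitrary `penalty` weight. All names and numbers are hypothetical.

```python
def row_based(rows, k):
    """Row-based: score every row independently and keep the top k."""
    return sorted(rows, key=lambda r: r["score"], reverse=True)[:k]


def jaccard(a, b):
    """Jaccard overlap between two collections of titles."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def stage_wise(rows, k, penalty=0.5):
    """Stage-wise: pick rows greedily, penalising overlap with rows already chosen."""
    chosen, remaining = [], list(rows)
    while remaining and len(chosen) < k:
        def adjusted(r):
            overlap = max((jaccard(r["titles"], c["titles"]) for c in chosen), default=0.0)
            return r["score"] - penalty * overlap

        best = max(remaining, key=adjusted)
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical candidate rows.
rows = [
    {"name": "Romantic Comedies", "score": 0.90, "titles": ["a", "b", "c"]},
    {"name": "Romance", "score": 0.85, "titles": ["a", "b", "d"]},
    {"name": "Documentaries", "score": 0.70, "titles": ["x", "y", "z"]},
]
print([r["name"] for r in row_based(rows, 2)])
print([r["name"] for r in stage_wise(rows, 2)])
```

With these numbers the row-based ranking places two heavily overlapping rows next to each other, while the stage-wise version swaps the second one for a less similar row.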
Machine learning approach:
Here the aim is to create a scoring function by training a model on historical information about the homepages Netflix has created for its members, including what members actually saw, how they interacted with it, and what they played.
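Cold start problem:
This is the question of how to initialize a new member: when there is no data at all at the beginning, how do you make recommendations?
Netflix does what most recommendation sites do: it asks a few questions and then recommends based on the answers.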
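What is the significance of Netflix Watching Party?
Previously, Netflix could only make recommendations based on a member's own behavior and on aggregate behavior patterns mined from large-scale data. With NP, Netflix can now also see which other members the current member is interacting with.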
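A/B testing:
Both offline and online testing are used in practice. Offline testing evaluates the quality of the models, but those evaluations cannot guarantee that the results will actually improve the user experience. That is why the development team uses A/B tests to evaluate its algorithms.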
Do bear in mind that A/B testing itself is an art, as there are many variables to consider, including how to select the control and test groups, how to determine whether the result of an A/B test is statistically significant (and whether it improves the user experience as a whole), how large the control and test groups should be, which metrics to use, and many more.
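As one example of the significance question, a two-proportion z-test is a common textbook way to compare a rate between a control and a test group. The sketch below is not taken from Netflix: the metric (the share of members who played something from the homepage), the group sizes, and the function name `two_proportion_z_test` are all assumptions for illustration.

```python
from math import erfc, sqrt


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for the difference between two proportions.

    Returns (z, p_value), where z > 0 means group B has the higher rate.
    """
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Hypothetical experiment: control converts 1000 of 10000, test 1100 of 10000.
z, p = two_proportion_z_test(1000, 10_000, 1100, 10_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value only says the difference is unlikely to be noise. Whether the chosen metric really reflects a better member experience is a separate judgment.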

Summary:
The diagram below shows the detailed architecture of Netflix's system.

