Sampling-bias-corrected neural modeling for large corpus item recommendations

概
符号说明
问题
解决方法

Yi Y., Yang J., Hong L., Cheng D. Z., Heldt L., Kumthekar A., Zhao Z., Wei L. and Chi E. Sampling-bias-corrected neural modeling for large corpus item recommendations. In ACM Conference on Recommender Systems (RecSys), 2019.

概

在推荐系统中, 我们常常需要采样负样本, 一种比较普遍的方式是采用 in-batch 的方式来进行, 但是这种方法存在偏置, 作者将 here 的方法进行一个拓展.

符号说明

\(\{x_i\}_{i=1}^N\), query 属性;
\(\{y_j\}_{j=1}^M\), item 属性;
通过 \(u(\cdot), v(\cdot)\) 将二者映射到一个 \(k\) 维的空间上:
\[u(x), v(x) \in \mathbb{R}^k, \]
如下图所示:

两者间的分数由

\[s(x, y) = \langle u(x), v(x) \rangle \]
表示;
\(\mathcal{T}:= \{(x_i, y_i, r_i)\}_{i=1}^T\), 训练数据集;

问题

我们通常会建模将 item \(y\) 从一堆 items \(\{y_j\}_{j=1}^M\) 择出来的概率:

\[\mathcal{P}(y|x;\theta) = \frac{e^{(s(x, y))}}{\sum_{j\in [M]} e^{s(x, y_j)}}, \]
然后通过如下的损失函数进行训练:

\[L_T(\theta) := -\frac{1}{T} \sum_{i \in [T]} r_i \cdot \log (\mathcal{P}(y_i|x_i;\theta)), \]
当 \(M\) 很大的时候, 其实计算是很复杂的, 因为我们需要计算每一个 \(\mathcal{P}(y_j|x_i; \theta)\);
所以现在的一个普遍的策略就是直接拿当前的 batch 的数据作为负样本:

\[\mathcal{P}_B(y|x;\theta) = \frac{e^{(s(x, y))}}{\sum_{j\in [B]} e^{s(x, y_j)}}, \]
但这会导致流行的 items 会被经常性地惩罚, 这里有一个 bias.

解决方法

我们需要对 score 进行矫正, 利用 here 中的方法定义:

\[s^c(x_i, y_j) = s(x_i, y_j) - \log (p_j), \]
这里用 \(p_j\) 来表示在一个 random batch 中采样到 \(j\) 的概率;
这时我们用如下损失进行训练:

\[L_B(\theta) := -\frac{1}{B} \sum_{i \in [B]} r_i \cdot \log (\mathcal{P}_B^c(y_i|x_i;\theta)), \]
为了估计 \(p_j\), 作者维护两个列表: \(A, B\), 其中 \(A[j]\) 记录了上一次采样到 \(j\) 的时刻, 而 \(B[j]\) 则表示 \(j\) 被采样的频率, 假设在下一个 step \(t'\) 中采样到了 \(j\), 则更新:

\[B[j] \leftarrow (1 - \alpha) \cdot B[j] + \alpha \cdot (t' - A[j]) \\ A[j] \leftarrow t', \]
并用

\[\hat{p}_j = 1 / B[j] \]
来表示采样概率.
可以证明, 当 \(t \rightarrow +\infty\) 的时候, \(\hat{p}_j \rightarrow p_j\).

posted @ 2022-09-19 18:41 馒头and花卷阅读(284) 评论(0) 编辑收藏举报

会员力量，点亮园子希望

刷新页面返回顶部

馒头and花卷

Sampling-bias-corrected neural modeling for large corpus item recommendations

概

符号说明

问题

解决方法

公告