Paper Quick-Read Notes | 2025.07

Wasserstein Dependency Measure for Representation Learning

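arXiv: https://arxiv.org/abs/1903.11780
PDF: https://arxiv.org/pdf/1903.11780
HTML: https://ar5iv.labs.arxiv.org/html/1903.11780
Reference blog: https://blog.csdn.net/MoonOutCloudBack/article/details/149309330
Main content:
- This paper is about learning embeddings through contrastive learning. Concretely, the embeddings of two similar samples (a positive pair) are pulled together, while the embeddings of two dissimilar samples (a negative pair) are pushed apart. Equivalently, one trains a similarity function that scores positive pairs high and negative pairs low.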

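The conventional method, CPC (Contrastive Predictive Coding), uses a similarity function \(f(x,y)\) with the InfoNCE loss, which maximizes a lower bound on the mutual information between x and y: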

\[\mathcal{L}_{\text{CPC}} = -\mathbb{E} \left[ \log \frac{e^{f(x, y_+)}}{e^{f(x, y_+)} + \sum_{j=1}^{K-1} e^{f(x, y_j^-)}} \right] \]
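
To make the loss concrete, here is a minimal sketch of the InfoNCE term for a single anchor x, in plain Python. The function name `info_nce_loss` and the example scores are illustrative assumptions, not code from the paper; the scores stand in for values of \(f(x, y)\).

```python
import math

def info_nce_loss(pos_score, neg_scores):
    """InfoNCE term for one anchor: -log of the softmax weight of the positive.

    pos_score  : f(x, y_+), score of the positive pair
    neg_scores : the K-1 scores f(x, y_j^-) of the negative pairs
    """
    scores = [pos_score] + list(neg_scores)
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(scores)
    log_denominator = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_denominator - pos_score

# Hypothetical scores with K = 4 candidates (1 positive, 3 negatives).
uniform = info_nce_loss(0.0, [0.0, 0.0, 0.0])    # critic cannot tell them apart
confident = info_nce_loss(5.0, [0.0, 0.0, 0.0])  # positive clearly scored higher
```

When all scores are identical the loss equals \(\log K\), and it falls toward zero as the positive score rises above the negatives.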

This paper replaces the KL divergence in the mutual information \(I(x,y)=\text{KL}(p(xy), p(x)p(y))\) with the Wasserstein distance, which gives the Wasserstein dependency measure (WDM):

\[\text{WDM}(X;Y) = W_1(p(xy), p(x)p(y)) \]
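
For very small samples, the WDM of the empirical distributions can be computed exactly by brute force, which may help build intuition. The sketch below is an illustration under assumptions that are not from the paper: scalar x and y, Euclidean distance as the ground metric, and a hypothetical helper `empirical_wdm` that searches all permutations, so it is only feasible for two or three pairs.

```python
import itertools
import math

def empirical_wdm(pairs):
    """W1 between the empirical joint and the product of empirical marginals.

    Both distributions are written as n*n equally weighted points, so the
    optimal transport plan is a one-to-one matching that can be found by
    trying every permutation (tiny n only).
    """
    n = len(pairs)
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    # Product of marginals: every (x_i, y_j) combination, mass 1/n^2 each.
    product_points = [(x, y) for x in xs for y in ys]
    # Joint: each observed pair has mass 1/n, i.e. n copies of mass 1/n^2.
    joint_points = [p for p in pairs for _ in range(n)]
    best_total = min(
        sum(math.dist(a, b) for a, b in zip(joint_points, perm))
        for perm in itertools.permutations(product_points)
    )
    return best_total / (n * n)

dependent = empirical_wdm([(0.0, 0.0), (1.0, 1.0)])   # y always equals x
constant_y = empirical_wdm([(0.0, 5.0), (1.0, 5.0)])  # y carries no information
```

In this toy case the dependent pairs give a value of 0.5, while a constant y gives 0, because the joint and the product of marginals then coincide.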

How it is optimized:

Main loss (similar to CPC):
\[\mathcal{L}_{\text{WPC}} = - \log \frac{e^{f(x, y_+)}}{e^{f(x, y_+)} + \sum_{j=1}^{K-1} e^{f(x, y_j^-)}} \]
Lipschitz constraint: a gradient penalty is used to make f approximately 1-Lipschitz. The 1-Lipschitz constraint means \(|f(a) - f(b)| \le \|a-b\|\), where f outputs a scalar directly and \(\|\cdot \|\) is some distance metric (METRA, for example, uses a temporal distance). The penalty is:
\[\mathcal{L}_{\text{GP}} = \lambda \cdot \mathbb{E}_{\hat{x}, \hat{y}} \left[ (\|\nabla f(\hat{x}, \hat{y})\|_2 - 1)^2 \right] \]
Total loss: \(\mathcal{L}_{\text{Total}} = \mathcal{L}_{\text{WPC}} + \mathcal{L}_{\text{GP}}\).
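
The gradient penalty and the total loss can be sketched numerically as follows. This is an illustration only: it assumes scalar x and y, approximates the gradient with central differences rather than the automatic differentiation a real training loop would use, and takes `lam = 10.0` as an assumed default that the notes above do not specify. The names `grad_norm`, `gradient_penalty`, and `total_loss` are hypothetical.

```python
import math

def grad_norm(f, x, y, eps=1e-5):
    """L2 norm of the gradient of f at (x, y), via central differences."""
    df_dx = (f(x + eps, y) - f(x - eps, y)) / (2 * eps)
    df_dy = (f(x, y + eps) - f(x, y - eps)) / (2 * eps)
    return math.hypot(df_dx, df_dy)

def gradient_penalty(f, points, lam=10.0):
    """lam * mean((||grad f|| - 1)^2) over the sampled (x_hat, y_hat) points."""
    penalties = [(grad_norm(f, x, y) - 1.0) ** 2 for x, y in points]
    return lam * sum(penalties) / len(penalties)

def total_loss(wpc_loss, f, points, lam=10.0):
    """Contrastive loss value plus the gradient penalty."""
    return wpc_loss + gradient_penalty(f, points, lam)

points = [(0.0, 0.0), (1.0, -2.0)]  # hypothetical penalty sample points
# Gradient (0.6, 0.8) has norm 1, so the penalty is close to zero.
unit_slope = gradient_penalty(lambda x, y: 0.6 * x + 0.8 * y, points)
# Gradient (3, 0) has norm 3, so the penalty is about 10 * (3 - 1)^2 = 40.
steep = gradient_penalty(lambda x, y: 3.0 * x, points)
```

The penalty is a soft constraint: it pushes the gradient norm toward 1 at the sampled points rather than guaranteeing the Lipschitz bound everywhere.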

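WDM is handled through its dual form, a mathematical device for actually computing the Wasserstein distance. It says that the WDM equals the largest value of \(\mathbb E_{(x,y)\sim P_\text{data}} f(x,y) - \mathbb E_{(x,y)\sim P_\text{independent}} f(x,y)\) over functions f that satisfy the 1-Lipschitz constraint, where \(P_\text{independent}\) is the product of marginals \(p(x)p(y)\). Maximizing the WDM therefore amounts to finding such an f that maximizes this difference. This mirrors the form of the WPC objective, so WPC directly optimizes the dual-form objective. This is also very similar to METRA.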
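
The dual-form difference can be estimated from samples by averaging f over the observed pairs and over all recombined pairs. The sketch below assumes scalar x and y and Euclidean distance, and uses a hand-picked 1-Lipschitz critic instead of a trained network; `dual_estimate` and `critic` are hypothetical names. Any fixed 1-Lipschitz f gives only a lower bound on the WDM, since the dual form takes the supremum over all such f.

```python
import math

def dual_estimate(f, pairs):
    """Mean of f over joint pairs minus mean of f over all (x_i, y_j) pairs."""
    n = len(pairs)
    joint_term = sum(f(x, y) for x, y in pairs) / n
    # Recombining every x with every y samples the product of marginals.
    product_term = sum(f(x, y2) for x, _ in pairs for _, y2 in pairs) / (n * n)
    return joint_term - product_term

def critic(x, y):
    # |x - y| has gradient norm sqrt(2), so dividing by sqrt(2) makes the
    # function 1-Lipschitz under Euclidean distance on (x, y).
    return -abs(x - y) / math.sqrt(2)

dependent = dual_estimate(critic, [(0.0, 0.0), (1.0, 1.0)])   # y always equals x
constant_y = dual_estimate(critic, [(0.0, 5.0), (1.0, 5.0)])  # y carries no information
```

With this critic the dependent pairs give a positive value, and the constant-y pairs give 0 because the joint and recombined pairs are identical.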

posted @ 2025-07-04 11:01 MoonOut
