<Analysis> The GE2E loss function in speaker verification models

What is GE2E loss

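GE2E loss stands for generalized end-to-end loss function. It focuses on how embeddings differ from one another, and it makes training more efficient than the earlier TE2E (tuple-based end-to-end) loss function. Unlike TE2E, GE2E updates the network in a way that emphasizes the examples that are hardest to verify at each training step.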

Batch format: each batch consists of N×M embeddings and has shape (N, M, e). There are N speakers and M embeddings per speaker, and each embedding has length e.
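\(e_{j,i}\): the i-th embedding of the j-th speaker.
\(c_j\): the centroid (center vector) of the j-th speaker, \(c_j = \frac{1}{M}\sum_{m=1}^{M} e_{j,m}\).
\(S_{ji,k}\): the similarity between \(e_{j,i}\) and \(c_k\). The collection of these values is called the similarity matrix S. It is defined as \(S_{ji,k} = w \cdot \cos(e_{j,i}, c_k) + b\), where w and b are learnable parameters and w is constrained to be positive.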
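The definitions above can be sketched in NumPy, assuming the batch is a NumPy array of shape (N, M, e). The function names are hypothetical. The values w = 10 and b = -5 are fixed here only for illustration, since in GE2E they are learned. This version uses the plain centroid \(c_k\) for every k, without excluding \(e_{j,i}\) from its own speaker's centroid.

```python
import numpy as np

def centroids(embeddings: np.ndarray) -> np.ndarray:
    """c_j = mean of the M embeddings of speaker j; result has shape (N, e)."""
    return embeddings.mean(axis=1)

def cosine(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity along the last axis (inputs are broadcast together)."""
    num = np.sum(a * b, axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    return num / den

def similarity_matrix(embeddings: np.ndarray, w: float = 10.0, b: float = -5.0) -> np.ndarray:
    """S[j, i, k] = w * cos(e_{j,i}, c_k) + b, with shape (N, M, N).

    w and b are illustrative constants here; in GE2E they are trainable (w > 0).
    """
    c = centroids(embeddings)                    # (N, e)
    e = embeddings[:, :, None, :]                # (N, M, 1, e)
    return w * cosine(e, c[None, None, :, :]) + b

rng = np.random.default_rng(0)
batch = rng.normal(size=(3, 4, 8))               # N=3 speakers, M=4 utterances, e=8
S = similarity_matrix(batch)
print(S.shape)                                   # (3, 4, 3)
```

Broadcasting `(N, M, 1, e)` against `(1, 1, N, e)` computes all N·M·N cosines in one vectorized step.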

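When computing the similarity of a positive pair, i.e. \(S_{ji,j}\), it works better to remove \(e_{j,i}\) from the computation of \(c_j\). The resulting leave-one-out centroid is \(c_j^{(-i)} = \frac{1}{M-1}\sum_{m=1, m \neq i}^{M} e_{j,m}\). The similarity then becomes \(S_{ji,k} = w \cdot \cos(e_{j,i}, c_j^{(-i)}) + b\) when \(k = j\), and \(w \cdot \cos(e_{j,i}, c_k) + b\) otherwise. According to the paper, this exclusion makes training stable and helps avoid trivial solutions.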
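The exclusion rule can be sketched together with the softmax form of the GE2E loss from the paper. In that form, \(L(e_{j,i}) = -S_{ji,j} + \log\sum_{k=1}^{N}\exp(S_{ji,k})\), summed over all j and i. The paper also describes a second "contrast" variant, which is not shown here. This is a forward pass in NumPy only, with no gradients, and it assumes M ≥ 2 so that \(c_j^{(-i)}\) is defined. The function names and the fixed w = 10, b = -5 are illustrative assumptions.

```python
import numpy as np

def _cos(a: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Cosine similarity along the last axis, with broadcasting."""
    return np.sum(a * c, axis=-1) / (np.linalg.norm(a, axis=-1) * np.linalg.norm(c, axis=-1))

def ge2e_similarity(embeddings: np.ndarray, w: float = 10.0, b: float = -5.0) -> np.ndarray:
    """S[j, i, k] with shape (N, M, N); uses c_j^{(-i)} for the true speaker (k == j)."""
    n, m, _ = embeddings.shape
    sums = embeddings.sum(axis=1)                          # (N, e)
    full = sums / m                                        # c_k, shape (N, e)
    loo = (sums[:, None, :] - embeddings) / (m - 1)        # c_j^{(-i)}, shape (N, M, e)
    s = _cos(embeddings[:, :, None, :], full[None, None, :, :])  # (N, M, N)
    idx = np.arange(n)
    s[idx, :, idx] = _cos(embeddings, loo)                 # overwrite the k == j entries
    return w * s + b

def ge2e_softmax_loss(embeddings: np.ndarray, w: float = 10.0, b: float = -5.0) -> float:
    """Sum over j, i of -S_{ji,j} + log sum_k exp(S_{ji,k})."""
    s = ge2e_similarity(embeddings, w, b)
    idx = np.arange(s.shape[0])
    positive = s[idx, :, idx]                              # S_{ji,j}, shape (N, M)
    mx = s.max(axis=2, keepdims=True)                      # stable log-sum-exp over k
    lse = np.log(np.exp(s - mx).sum(axis=2)) + mx[..., 0]
    return float(np.sum(lse - positive))

rng = np.random.default_rng(0)
random_batch = rng.normal(size=(3, 4, 8))
# Well-separated speakers: each speaker's embeddings point along its own axis.
separated = np.eye(3, 8)[:, None, :] + 0.01 * rng.normal(size=(3, 4, 8))
print(ge2e_softmax_loss(random_batch), ge2e_softmax_loss(separated))
```

Every per-utterance term is non-negative, because the log-sum-exp is at least as large as any single \(S_{ji,k}\). For well-separated speakers the loss should therefore be close to zero.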
TD-SV & TI-SV: TD-SV is text-dependent speaker verification, and TI-SV is text-independent speaker verification. In TD-SV, the transcript of both enrollment and verification utterances is phonetically constrained. In TI-SV, there are no lexicon constraints on the transcript of the enrollment or verification utterances, which exposes a larger variability of phonemes and utterance durations.

GENERALIZED END-TO-END LOSS FOR SPEAKER VERIFICATION https://arxiv.org/pdf/1710.10467.pdf

posted @ 2020-07-20 09:49 dynmi
