# CTC（Connectionist Temporal Classification）介绍

### CTC解决什么问题

CTC，Connectionist Temporal Classification，用来解决输入序列和输出序列难以一一对应的问题。

### CTC基本概述

$x = (x^1, x^2, ..., x^T) \\x^t = (x_1^t, x_2^t, ..., x_m^t)$

$y = (y^1, y^2, ..., y^T) \\y^t = (y_1^t, y_2^t, ..., y_n^t)$

$\begin{split} B(\pi^1) &= B(--stta-t---e) = state \\ B(\pi^2) &= B(sst-aaa-tee-) = state \\ B(\pi^3) &= B(--sttaa-tee-) = state \\ B(\pi^4) &= B(sst-aa-t---e) = state \end{split}$

$p(l \ | \ x) = \sum_{B(\pi)=l} p(\pi | x)$

$p(\pi | x) = \prod_{t=1}^{T} y_{\pi_t}^t$

$\pi = (--stta-t---e) \\ p(\pi | x) = {y^1}_{-} \cdot {y^2}_{-} \cdot y^3_{s} \cdot y^4_{t} \cdot y^5_{t} \cdot y^6_{a} \cdot {y^7}_{-} \cdot {y^8}_{t} \cdot {y^9}_{-} \cdot {y^{10}}_{-} \cdot {y^{11}}_{-} \cdot y^{12}_{e}$

### CTC中的前向后向算法

$l = state \\ {l}' = -s-t-a-t-e-$

$\frac{\partial \, p(l \ |x)}{\partial \, y_k^t} = \frac{\partial \, \sum_{B(\pi)=l, \pi_t=k} p(\pi | x)}{\partial \, y_k^t}$

4条路径都在t=6时经过了字符a，观察4条路径，可以得到如下式子。

$\begin{split} \pi^1 &= b = b_{1:5} + a_6 + b_{7:12} \\ \pi^2 &= r = r_{1:5} + a_6 + r_{7:12} \\ \pi^3 &= b_{1:5} + a_6 + r_{7:12} \\ \pi^4 &= r_{1:5} + a_6 + b_{7:12} \end{split}$

$\begin{split} p( \pi^1, \pi^2, \pi^3, \pi^4 | x) &= {y^1}_{-} \cdot {y^2}_{-} \cdot {y^3}_{s} \cdot {y^4}_{t} \cdot {y^5}_{t} \cdot {y^6}_{a} \cdot {y^7}_{-} \cdot {y^8}_{t} \cdot {y^9}_{-} \cdot {y^{10}}_{-} \cdot {y^{11}}_{-} \cdot {y^{12}}_{e} \\ &+ {y^1}_{s} \cdot {y^2}_{s} \cdot {y^3}_{t} \cdot {y^4}_{-} \cdot {y^5}_{a} \cdot {y^6}_{a} \cdot {y^7}_{a} \cdot {y^8}_{-} \cdot {y^9}_{t} \cdot {y^{10}}_{e} \cdot {y^{11}}_{e} \cdot {y^{12}}_{-} \\ &+ {y^1}_{-} \cdot {y^2}_{-} \cdot {y^3}_{s} \cdot {y^4}_{t} \cdot {y^5}_{t} \cdot {y^6}_{a} \cdot {y^7}_{a} \cdot {y^8}_{-} \cdot {y^9}_{t} \cdot {y^{10}}_{e} \cdot {y^{11}}_{e} \cdot {y^{12}}_{-} \\ &+ {y^1}_{s} \cdot {y^2}_{s} \cdot {y^3}_{t} \cdot {y^4}_{-} \cdot {y^5}_{a} \cdot {y^6}_{a} \cdot {y^7}_{-} \cdot {y^8}_{t} \cdot {y^9}_{-} \cdot {y^{10}}_{-} \cdot {y^{11}}_{-} \cdot {y^{12}}_{e}\end{split}$

$\begin{split} forward &= p(b_{1:5} + r_{1:5} | x) = {y^1}_{-} \cdot {y^2}_{-} \cdot {y^3}_{s} \cdot {y^4}_{t} \cdot {y^5}_{t} + {y^1}_{s} \cdot {y^2}_{s} \cdot {y^3}_{t} \cdot {y^4}_{-} \cdot {y^5}_{a} \\ backward &= p(b_{7:12} + r_{7:12} | x) = {y^7}_{-} \cdot {y^8}_{t} \cdot {y^9}_{-} \cdot {y^{10}}_{-} \cdot {y^{11}}_{-} \cdot {y^{12}}_{e} + {y^7}_{a} \cdot {y^8}_{-} \cdot {y^9}_{t} \cdot {y^{10}}_{e} \cdot {y^{11}}_{e} \cdot {y^{12}}_{-} \end{split}$

$p( \pi^1, \pi^2, \pi^3, \pi^4 | x) = forward \cdot y_a^t \cdot backward$

$\sum_{B(\pi)=l, \pi_6=a} p(\pi | x) = forward \cdot y_a^t \cdot backward$

$\alpha_t({l}'_k) = \sum_{B(\pi)=l, \ \pi_t = {l}'_k} \ \prod_{{t}'=1}^{t} {y_{\pi_{{t}'}}^{{t}'}}$

t=1时，符号只能是空白符或$$l_1$$，可以得到以下初始条件：

$\alpha_1(-) = {y^1}_{-} \\ \alpha_1(l_1) = y^1_{l_1} \\ \alpha_1(l_t) = 0, \forall t > 1$

$\alpha_6(a) = (\alpha_5(a) + \alpha_5(t) + \alpha_5(-)) \cdot y_a^6$

$\alpha_t({l}'_k) = (\alpha_{t-1}({l}'_k) + \alpha_{t-1}({l}'_{k-1}) + \alpha_{t-1}(-)) \cdot y_{{l}'_k}^t$

$\beta_t({l}'_k) = \sum_{B(\pi)=l, \ \pi_t = {l}'_k} \ \prod_{{t}'=t}^{T} {y_{\pi_{{t}'}}^{{t}'}}$

t=T时，符号只能是空白符或$$l_{|{l}'|-1}$$，可以得到以下初始条件：

$\beta_T(-) = {y^T}_- \\ \beta_T({l}'_{|{l}'|-1}) = y^T_{l_{|{l}'|-1}} \\ \beta_T(l_{|{l}'|-i}) = 0, \forall i > 1$

$\beta_t({l}'_k) = (\beta_{t+1}({l}'_k) + \beta_{t+1}({l}'_{k+1}) + \beta_{t+1}(-)) \cdot y_{{l}'_k}^t$

$\alpha_t({l}'_k) \beta_t({l}'_k) = \sum_{B(\pi)=l, \ \pi_t = {l}'_k} \ y_{{l}'_k}^t \prod_{t=1}^{T} {y_{\pi_t}^t}$

$p(l \ | \ x) = \sum_{B(\pi)=l, \ \pi_t = {l}'_k} p(\pi | x) = \sum_{B(\pi)=l, \ \pi_t = {l}'_k} \ \prod_{t=1}^{T} y_{\pi_t}^t$

$p(l \ | \ x) = \sum_{B(\pi)=l, \ \pi_t = {l}'_k} \ \frac{\alpha_t({l}'_k) \beta_t({l}'_k) }{y_{{l}'_k}^t}$

$\frac{\partial \, p(l \ |x)}{\partial \, y_k^t} = \frac{\partial \ \sum_{B(\pi)=l, \ \pi_t = k} \frac{\alpha_t(k) \beta_t(k) }{y_k^t}}{\partial \, y_k^t}$

### 参考资料

[1] 知乎上的一篇文章：一文读懂CRNN+CTC文字识别

[2] Distill上一篇关于CTC的介绍（作者Hannun Awni）：Sequence Modeling With CTC

posted @ 2018-11-13 19:00  PilgrimHui  阅读(29129)  评论(2编辑  收藏  举报