KMP
-
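KMP determines, in time linear in the combined length of the two strings, whether string \(\large A\) is a substring of string \(\large B\), and finds the position of every occurrence of \(\large A\) in \(\large B\).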
-
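The KMP algorithm has two steps:
① Match string \(\large A\) against itself to compute an array \(\large next\), where \(\large next[i]\) is the greatest length at which "a non-prefix substring of \(\large A\) ending at position \(\large i\)" matches "a prefix of \(\large A\)". That is, \(\large next[i] = \max\{j\}\), where \(\large j < i\) and \(\large A[i - j + 1 \sim i] = A[1 \sim j]\).
In particular, when no such \(\large j\) exists, let \(\large next[i] = 0\).
② Match string \(\large A\) against string \(\large B\) to compute an array \(\large f\), where \(\large f[i]\) is the greatest length at which "a substring of \(\large B\) ending at position \(\large i\)" matches "a prefix of \(\large A\)". That is, \(\large f[i] = \max\{j\}\), where \(\large j \le i\) and \(\large B[i - j + 1 \sim i] = A[1 \sim j]\).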
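To make the two definitions concrete, here is a brute-force sketch that evaluates them literally, trying every candidate \(\large j\) from longest to shortest. It is not part of KMP itself and is far slower than linear time; it only serves as a reference to check the fast version against. The function names `nextByDefinition` and `fByDefinition` are hypothetical, and the strings are shifted by one position so that indices match the 1-based notation used here.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Brute-force reference taken directly from the definitions.
// Helper names are illustrative assumptions, not from the original text.

// next[i] = largest j < i with A[i-j+1..i] == A[1..j], or 0 if none exists.
std::vector<int> nextByDefinition(const std::string& s) {
    int n = static_cast<int>(s.size());
    std::string a = " " + s;                 // a[1..n] holds the characters of A
    std::vector<int> next(n + 1, 0);         // index 0 is unused
    for (int i = 1; i <= n; i++) {
        for (int j = i - 1; j >= 1; j--) {   // longest candidate first
            if (a.compare(i - j + 1, j, a, 1, j) == 0) {
                next[i] = j;
                break;
            }
        }
    }
    return next;
}

// f[i] = largest j <= i with B[i-j+1..i] == A[1..j], or 0 if none exists.
std::vector<int> fByDefinition(const std::string& s, const std::string& t) {
    int n = static_cast<int>(s.size());
    int m = static_cast<int>(t.size());
    std::string a = " " + s, b = " " + t;    // 1-based copies of A and B
    std::vector<int> f(m + 1, 0);            // index 0 is unused
    for (int i = 1; i <= m; i++) {
        for (int j = std::min(i, n); j >= 1; j--) {  // j cannot exceed |A|
            if (b.compare(i - j + 1, j, a, 1, j) == 0) {
                f[i] = j;
                break;
            }
        }
    }
    return f;
}
```

For \(\large A\) = "abab", this gives \(\large next[1 \sim 4] = 0, 0, 1, 2\). With \(\large B\) = "ababab", it gives \(\large f[1 \sim 6] = 1, 2, 3, 4, 3, 4\), so \(\large A\) ends at positions 4 and 6 of \(\large B\).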
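How to compute the \(\large next\) array:
Ⅰ Initialize \(\large next[1] = j = 0\). Assuming \(\large next[1 \sim i - 1]\) has already been computed, we now compute \(\large next[i]\).
Ⅱ Keep trying to extend the match length \(\large j\). If the extension fails (the next character differs), set \(\large j = next[j]\), which is the next shorter prefix of \(\large A\) that still matches the end of the current match. Repeat until \(\large j = 0\), at which point matching must restart from the beginning.
Ⅲ If the extension succeeds, increment the match length \(\large j\). The value of \(\large next[i]\) is then \(\large j\).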
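// Strings are 1-indexed: a[1..n] holds A.
next[1] = 0;
for (int i = 2, j = 0; i <= n; i++)
{
    // Extension failed: fall back to the next shorter candidate.
    while (j > 0 && a[i] != a[j + 1])
        j = next[j];
    // Extension succeeded: the match grows by one character.
    if (a[i] == a[j + 1])
        j++;
    next[i] = j;
}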
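As a rough illustration, the loop above can be wrapped in a self-contained function and tried on a small input. This is only a sketch under a few assumptions: the name `buildNext` is hypothetical, and the input is copied into a string with one leading padding character so that the 1-based indices of the text carry over unchanged.

```cpp
#include <string>
#include <vector>

// Linear-time computation of the next array, 1-based as in the text.
// buildNext is an illustrative name, not from the original post.
std::vector<int> buildNext(const std::string& s) {
    int n = static_cast<int>(s.size());
    std::string a = " " + s;            // a[1..n] holds A; a[0] is padding
    std::vector<int> next(n + 1, 0);    // next[0] is unused, next[1] = 0
    for (int i = 2, j = 0; i <= n; i++) {
        // Fall back through shorter candidates while a[i] cannot extend them.
        while (j > 0 && a[i] != a[j + 1])
            j = next[j];
        // Extend the current match by one character if possible.
        if (a[i] == a[j + 1])
            j++;
        next[i] = j;
    }
    return next;
}
```

For example, `buildNext("ababaca")` yields \(\large next[1 \sim 7] = 0, 0, 1, 2, 3, 0, 1\). At \(\large i = 6\) the character `c` fails to extend the match of length 3, so \(\large j\) falls back to \(\large next[3] = 1\), then to \(\large next[1] = 0\).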
The \(\large f\) array is computed in the same way.
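for (int i = 1, j = 0; i <= m; i++)
{
    // Fall back on a mismatch, and also after a full match (j == n),
    // because a[n + 1] does not exist.
    while (j > 0 && (j == n || b[i] != a[j + 1]))
        j = next[j];
    if (b[i] == a[j + 1])
        j++;
    f[i] = j;
    // Whenever A occurs in B ending at position i, f[i] == n
    // (the occurrence starts at position i - n + 1).
}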
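Putting both steps together, a minimal sketch of a complete search might look like the following. The function name `findOccurrences`, the choice to return 1-based start positions, and the choice to report no matches for an empty pattern are all assumptions made for this example. Instead of storing the whole \(\large f\) array, it records a position whenever the current value of \(\large f[i]\) equals \(\large n\).

```cpp
#include <string>
#include <vector>

// Returns the 1-based start positions of every occurrence of pattern in text.
// findOccurrences is an illustrative name, not from the original post.
std::vector<int> findOccurrences(const std::string& pattern,
                                 const std::string& text) {
    int n = static_cast<int>(pattern.size());
    int m = static_cast<int>(text.size());
    std::vector<int> positions;
    if (n == 0) return positions;       // assumption: empty pattern, no matches

    std::string a = " " + pattern;      // a[1..n] holds A
    std::string b = " " + text;         // b[1..m] holds B

    // Step 1: match A against itself to build next.
    std::vector<int> next(n + 1, 0);
    for (int i = 2, j = 0; i <= n; i++) {
        while (j > 0 && a[i] != a[j + 1])
            j = next[j];
        if (a[i] == a[j + 1])
            j++;
        next[i] = j;
    }

    // Step 2: match A against B; j plays the role of f[i].
    for (int i = 1, j = 0; i <= m; i++) {
        while (j > 0 && (j == n || b[i] != a[j + 1]))
            j = next[j];
        if (b[i] == a[j + 1])
            j++;
        if (j == n)                     // A ends at position i of B
            positions.push_back(i - n + 1);
    }
    return positions;
}
```

For example, `findOccurrences("aba", "ababa")` returns the positions 1 and 3. The two occurrences overlap, which is why the loop falls back to \(\large next[n]\) after a full match rather than resetting \(\large j\) to 0.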
