KMP Algorithm Extra: Computing the next Array

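KMP is much faster than naive pattern matching. For a text of length n and a pattern of length m it runs in O(n + m) time, while the naive method takes O(n·m) in the worst case. The gain has a precondition, though: it only becomes noticeable when the pattern contains a lot of repetition. In practice KMP itself is not used very often, but its core idea, the way the next (prefix) array is computed, is used widely. This note pulls that part out on its own and explains how to compute the prefix array.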

1. Definition of the next (prefix) array

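Whether you are solving problems or deriving an algorithm, always keep the definition in mind; it is the most important thing. For a pattern $p_0 p_1 \cdots p_{m-1}$ (0-indexed), next[j] is defined as

$$
\mathrm{next}[j] =
\begin{cases}
-1, & j = 0,\\
\max\{\, k \mid 0 < k < j,\ p_0 \cdots p_{k-1} = p_{j-k} \cdots p_{j-1} \,\}, & \text{if this set is non-empty},\\
0, & \text{otherwise}.
\end{cases}
$$

In words: next[j] is the length of the longest proper prefix of $p_0 \cdots p_{j-1}$ that is also a suffix of it (the longest common prefix and suffix).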
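As a worked instance of the definition, here are the values for the string ABCDABD, which Section 2 also uses as its example. Each entry can be checked by hand; for instance next[6] = 2 because the longest common prefix/suffix of ABCDAB is AB.

$$
\begin{array}{c|ccccccc}
j & 0 & 1 & 2 & 3 & 4 & 5 & 6\\ \hline
p_j & A & B & C & D & A & B & D\\
\mathrm{next}[j] & -1 & 0 & 0 & 0 & 0 & 1 & 2
\end{array}
$$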

2. Brute-force computation of the next array

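The main idea of this method:

To compute next[j], list all proper prefixes and suffixes of $p_0 p_1 \cdots p_{j-1}$ and compare them starting from the longest, until the longest common prefix/suffix is found. If there is none, next[j] = 0.

The candidate prefix/suffix pairs are chosen as follows:

$$p_0 \cdots p_{k-1} \quad\text{vs.}\quad p_{j-k} \cdots p_{j-1}, \qquad k = j-1,\ j-2,\ \dots,\ 1$$

The brute-force algorithm simply matches these pairs one by one, starting from the longest.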

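Algorithm:

(1) By definition, initialize next[0] = -1.

(2) Traverse the pattern starting from index 1. For each character, when index j is reached, let k = j - 1.

(3) Check whether k is the length of the longest common prefix/suffix of the string in front of position j, i.e. whether $p_0 \cdots p_{k-1} = p_{j-k} \cdots p_{j-1}$ holds.

(4) If it holds, set next[j] = k. Otherwise decrement k and repeat step (3); if k reaches 0, set next[j] = 0.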

Implementation:

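#include <iostream>
#include <cstring>

bool IsPatternMatch(const char *p, int compareNum, int totalNum);
void ViolentGetNext(const char *p, int *next);

int main()
{
    int next[20];                          // big enough for the 16-character test string
    const char *str = "agctagcagctagctg";
    ViolentGetNext(str, next);

    // print the computed next array
    int len = (int)strlen(str);
    for(int i = 0; i < len; i++)
        std::cout << next[i] << ' ';
    std::cout << std::endl;

    return 0;
}

// Brute force: for each j, try k = j-1, j-2, ..., 1 and stop at the first k
// for which the prefix p[0..k-1] equals the suffix p[j-k..j-1].
void ViolentGetNext(const char *p, int *next)
{
    int pLen = (int)strlen(p);
    int k = 0;
    next[0] = -1;

    for(int j = 1; j < pLen; j++)
    {
        k = j - 1;
        while(k > 0)
        {
            if(IsPatternMatch(p, k, j))
                break;
            else
                k--;
        }// while

        next[j] = k;                       // k == 0 if no common prefix/suffix exists
    }// for
}

//param compareNum: the number of characters to compare, i.e. the candidate length k
//param totalNum: the length of the substring p[0..totalNum-1] under examination, i.e. j
//Together the two parameters bound the candidate prefix p[0..k-1] and suffix p[j-k..j-1].
bool IsPatternMatch(const char *p, int compareNum, int totalNum)
{
    int i = 0;
    int j = totalNum - compareNum;         // start index of the suffix

    for(; i < compareNum; i++, j++)
    {
        if(p[i] != p[j])
        {
            return false;
        }
    }

    return true;
}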

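A concrete example: suppose the string is ABCDABD. next[5] looks at $p_0 \cdots p_4$ = ABCDA and is computed as follows:
k=4
ABCD≠BCDA, k=3
ABC≠CDA, k=2
AB≠DA, k=1
A==A, next[5] = k = 1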

3. Recursive computation of the next array

The brute-force method computes each next[j] independently, but next[j+1] can in fact be derived from next[0…j]; the recursive algorithm in this section does exactly that.

Let the pattern be $p_0 p_1 \cdots p_{m-1}$, and suppose next[0…j] has already been computed. How do we compute next[j+1]?

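We reuse the values computed earlier (this is where the algorithm improves on brute force: the next entries are no longer computed independently). If next[j] = k, then by definition the pattern satisfies

$$p_0 p_1 \cdots p_{k-1} = p_{j-k} p_{j-k+1} \cdots p_{j-1}$$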

So the algorithm can be described as follows:

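(1) If k == -1 (only next[0] equals -1), then either j = 0, i.e. we are computing next[1] and the one-character string $p_0$ has no proper prefix/suffix, or the fallback in step (4) has run past the first character. In both cases no common prefix/suffix can be extended, so next[j+1] = 0 (which equals k + 1), and the algorithm ends.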

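(2) If $p_k = p_j$, then next[j+1] = k + 1 and the algorithm ends. Note how k, which is the value of next[j] (a length), is used here as a character index: the common prefix $p_0 \cdots p_{k-1}$ has length k, so $p_k$ is the character right after it.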

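Hint: as analyzed above, the shortcut for computing the next array is not to compute each entry independently but to inherit the symmetry (the matching prefix and suffix) found earlier. Knowing the symmetry behind next[j], we only need to check whether the characters following that prefix and that suffix are equal: $p_k$ and $p_j$ are exactly the characters right after the previous longest prefix and suffix.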

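(3) If $p_k \ne p_j$, the current common prefix/suffix cannot be extended, so look for the next-longest one: a length $k' < k$ with $p_0 \cdots p_{k'-1} = p_{j-k'} \cdots p_{j-1}$, and test again with $k'$ in place of $k$. Where does k' come from? Because the length-k prefix and suffix are equal, the last k' characters of $p_{j-k} \cdots p_{j-1}$ are also the last k' characters of $p_0 \cdots p_{k-1}$:

$$p_0 \cdots p_{k'-1} = p_{k-k'} \cdots p_{k-1} = p_{j-k'} \cdots p_{j-1}$$

The first equality says that k' is a common prefix/suffix length of $p_0 \cdots p_{k-1}$, so the longest candidate is k' = next[k]. Understanding this chain of equalities is the key to seeing where k' comes from.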

(4) Assign k' to k and go back to step (1).

Implementation:

//The recursive method to obtain the next array.
//pLen is the length of the prefix p[0..pLen-1] whose next values are computed.
void RecursionGetNext(const char *p, int pLen, int *next)
{
    if(pLen <= 0)
        return;

    if(pLen == 1)
    {
        next[pLen - 1] = -1;
        return;
    }

    //first compute next[0..pLen-2] for the shorter prefix
    RecursionGetNext(p, pLen - 1, next);

    //pLen is the length of the current string.
    //pLen - 1 is the index of the last character, whose next value is computed in this call.
    //pLen - 2 is the index of the second-to-last character, whose next value has already been computed.
    int k = next[pLen - 2];

    //k == -1 means no prefix matches a suffix and the newly added character cannot extend a match either.
    //k == 0 only means no non-empty prefix matches a suffix; p[k] may still match p[pLen - 2].
    while(k >= 0)
    {
        if(p[pLen - 2] == p[k])
        {
            break;
        }
        else
        {
            k = next[k];
        }
    }//while

    next[pLen - 1] = k + 1;

}//RecursionGetNext()

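4. Iterative computation of the next array (the recursion unrolled)

The recursion can be unrolled into a single loop: j walks forward through the pattern, while k holds the length of the current longest common prefix/suffix of p[0..j-1] and falls back through next[k] on a mismatch.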

#include <cstring>

void GetNext(const char *p, int *next)
{
    int pLen = (int)strlen(p);
    int j = 0;
    int k = -1;
    next[0] = -1;

    while(j < pLen - 1)
    {
        //According to the description of the algorithm, the loop body could be written as:
        //if(k == -1)
        //{
        //    ++j;
        //    ++k;
        //    next[j] = k;
        //}
        //else if(p[j] == p[k])
        //{
        //    ++j;
        //    ++k;
        //    next[j] = k;
        //}
        //but the first two cases can be merged into one:

        //p[j] == p[k] shows that we can inherit the prefix/suffix match found so far and extend it by one character.
        //k == -1 covers two situations: 1. the beginning of the algorithm; 2. no matching prefix and suffix could be extended, and the last character also differs from the first one.
        if(k == -1 || p[j] == p[k])
        {
            ++j;
            ++k;
            next[j] = k;
        }
        else
        {
            //mismatch: fall back to the next-longest common prefix/suffix
            k = next[k];
        }
    }//while
}

5. Reference Link

@wzhang1117's blog: https://www.zybuluo.com/wzhang1117/note/27431

posted @ 2015-03-20 17:30 stemon

