The KMP algorithm is commonly used for string matching. Its core is the next array, which tells the matcher where to fall back whenever a mismatch occurs. Computing it amounts to finding, for every prefix of the pattern, the longest proper prefix that is also a suffix. Take the pattern ABCDABCBA as an example. By hand, we find that the next array is [0,0,0,0,1,2,3,0,1]. Here next[i] is the length of the longest proper prefix of p[0..i] (positions 0 through i, inclusive) that is also a suffix of p[0..i]. The question is how to compute this array programmatically.
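To check a hand-computed next array, one can compute it directly from the definition. The sketch below does this by brute force. The class and method names (`NextBruteForce`, `naiveNext`) are hypothetical. Because it runs in roughly cubic time, it is meant only as a reference for testing, not as the method this post develops.

```java
import java.util.Arrays;

public class NextBruteForce {
    // next[i] = length of the longest proper prefix of p[0..i]
    // that is also a suffix of p[0..i], found by trying every length.
    static int[] naiveNext(String p) {
        int[] next = new int[p.length()];
        for (int i = 0; i < p.length(); i++) {
            String sub = p.substring(0, i + 1);
            // A proper prefix has length at most i; try the longest first.
            for (int len = i; len > 0; len--) {
                if (sub.substring(0, len).equals(sub.substring(sub.length() - len))) {
                    next[i] = len;
                    break;
                }
            }
        }
        return next;
    }

    public static void main(String[] args) {
        // prints [0, 0, 0, 0, 1, 2, 3, 0, 1]
        System.out.println(Arrays.toString(naiveNext("ABCDABCBA")));
    }
}
```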

Basically, we move position i forward from 1 to the end of the pattern (next[0] is always 0). k is the length of the prefix matched so far, which is also the index of the next pattern character to compare with p[i], as shown below.

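```java
int[] next = new int[p.length()];             // next[0] is always 0
for (int i = 1; i < p.length(); i++) {
    int k = next[i - 1];                      // length of the matched prefix ending at i-1
    while (k > 0 && p.charAt(k) != p.charAt(i)) {
        k = next[k - 1];                      // mismatch: fall back to a shorter matched prefix
    }
    // If k > 0, the loop guarantees p[k] == p[i]; if k == 0, compare p[0] with p[i]
    next[i] = (p.charAt(k) == p.charAt(i)) ? k + 1 : 0;
}
```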

The key idea is that next[k-1] is the length of the longest proper prefix of p[0..k-1] that is also a suffix of it. If p[k] does not match p[i], the current match of length k cannot be extended. Setting k = next[k-1] falls back to the next shorter prefix that still matches the characters ending at position i - 1, and we then try to extend that one instead.

For example, if next[k-1] = 2, the first two characters of p[0..k-1] equal its last two characters. Setting k = 2 means p[0..1] is already known to match, so we only need to continue comparing from p[2]. Consider the string BABADBABAB. When i reaches the last B, k = next[8] = 4 (the matched prefix is BABA), and p[4] = D does not match B. Then k = next[k-1] = next[3] looks one position back, at index k - 1 = 3. next[3] = 2 means the prefix BA of BABA is also its suffix, so the first two letters BA are already matched against the two letters just before the last B. We therefore compare the last B with p[2]. Since p[2] = B, the match BA extends to BAB, giving next[9] = 3. The final array is next = [0,0,1,2,0,1,2,3,4,3].
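To follow the BABADBABAB walkthrough step by step, the loop from the code above can be instrumented to print every fallback k = next[k-1]. This is an illustrative sketch only; `NextTrace` and `buildNextTraced` are hypothetical names.

```java
import java.util.Arrays;

public class NextTrace {
    // Same next-array loop, printing each fallback as it happens.
    static int[] buildNextTraced(String p) {
        int[] next = new int[p.length()];
        for (int i = 1; i < p.length(); i++) {
            int k = next[i - 1];
            while (k > 0 && p.charAt(k) != p.charAt(i)) {
                System.out.printf("i=%d: p[%d]=%c != p[%d]=%c, fall back to k = next[%d] = %d%n",
                        i, k, p.charAt(k), i, p.charAt(i), k - 1, next[k - 1]);
                k = next[k - 1];
            }
            next[i] = (p.charAt(k) == p.charAt(i)) ? k + 1 : 0;
        }
        return next;
    }

    public static void main(String[] args) {
        int[] next = buildNextTraced("BABADBABAB");
        System.out.println(Arrays.toString(next));
        // prints:
        // i=4: p[2]=B != p[4]=D, fall back to k = next[1] = 0
        // i=9: p[4]=D != p[9]=B, fall back to k = next[3] = 2
        // [0, 0, 1, 2, 0, 1, 2, 3, 4, 3]
    }
}
```

Only two fallbacks occur for this string: at i = 4, where no matched prefix survives, and at i = 9, the case discussed above.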

If next[k-1] = 0, no proper prefix of p[0..k-1] is also a suffix of it, so k becomes 0 and we must start over by comparing p[i] with the first character p[0].

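For the preprocessing above, note that at the start of each for-loop iteration k is always less than i. Each for-loop iteration increases k by at most 1, so k never exceeds len. Each while-loop iteration decreases k by at least 1, yet k never becomes negative. Hence the while loop runs fewer than len times in total, which amortizes to O(1) per for-loop iteration across all len iterations. The overall complexity is therefore O(m), where m = len.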
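The amortized bound can be checked empirically by counting how often the while-loop body runs. The sketch below is illustrative only; `NextCost` and `countFallbacks` are hypothetical names, and the sample patterns are chosen arbitrarily.

```java
public class NextCost {
    // Builds the next array as above, but also counts how many times
    // the while-loop body (the fallback k = next[k - 1]) executes.
    static int countFallbacks(String p) {
        int[] next = new int[p.length()];
        int fallbacks = 0;
        for (int i = 1; i < p.length(); i++) {
            int k = next[i - 1];
            while (k > 0 && p.charAt(k) != p.charAt(i)) {
                k = next[k - 1];
                fallbacks++;
            }
            next[i] = (p.charAt(k) == p.charAt(i)) ? k + 1 : 0;
        }
        return fallbacks;
    }

    public static void main(String[] args) {
        String[] patterns = {"ABCDABCBA", "BABADBABAB", "AAAAAB"};
        for (String p : patterns) {
            int f = countFallbacks(p);
            // The amortized argument predicts f < p.length() for every pattern.
            System.out.println(p + ": " + f + " fallbacks, m = " + p.length());
        }
    }
}
```

For AAAAAB, a single iteration (the final B) falls back four times, yet the total stays below m = 6. An expensive step can only spend the increases that earlier steps accumulated.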

The matching phase can be analyzed in the same way.

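During matching, each iteration of the while loop decreases k (it never becomes negative). The only other place k changes is the step that extends the match after a successful comparison. That step increases k by exactly 1 and runs at most once per text character, so over the whole scan k increases by at most n in total, where n is the length of the text. Consequently k can decrease at most n times, since it is always a non-negative integer. This means the while loop runs at most n times in total. In amortized terms, each for-loop iteration costs O(1), and the whole scan is O(n). The same argument applies to the preprocessing of the pattern P, giving O(m) for computing the next array.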
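The post does not list the matching loop itself. The following is a minimal sketch of a standard KMP search built on the same next array. It is written so that the amortized argument is visible: k increases only on a successful comparison, and every other change to k is a fallback. The names `KmpSearch`, `buildNext` and `search` are hypothetical. Reporting overlapping matches (by falling back after a full match) and returning nothing for an empty pattern are design choices of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class KmpSearch {
    // Same next-array construction as in the post.
    static int[] buildNext(String p) {
        int[] next = new int[p.length()];
        for (int i = 1; i < p.length(); i++) {
            int k = next[i - 1];
            while (k > 0 && p.charAt(k) != p.charAt(i)) k = next[k - 1];
            next[i] = (p.charAt(k) == p.charAt(i)) ? k + 1 : 0;
        }
        return next;
    }

    // Returns the start index of every (possibly overlapping) occurrence of p in t.
    static List<Integer> search(String t, String p) {
        List<Integer> hits = new ArrayList<>();
        if (p.isEmpty()) return hits;
        int[] next = buildNext(p);
        int k = 0; // number of pattern characters currently matched
        for (int i = 0; i < t.length(); i++) {
            while (k > 0 && p.charAt(k) != t.charAt(i)) {
                k = next[k - 1];          // mismatch: fall back, k decreases
            }
            if (p.charAt(k) == t.charAt(i)) {
                k++;                      // the only place k increases, by exactly 1
            }
            if (k == p.length()) {
                hits.add(i - k + 1);      // full match ends at i
                k = next[k - 1];          // keep going to find overlapping matches
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(search("ABABDABACDABABCABAB", "ABABCABAB")); // [10]
        System.out.println(search("AAAA", "AA"));                       // [0, 1, 2]
    }
}
```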

The corresponding LeetCode problem is 214. Shortest Palindrome.
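One common way to apply the next array to this problem is sketched below. It is not necessarily the author's solution, and it assumes the separator `#` never occurs in s (LeetCode 214 restricts s to lowercase letters). The idea is to build next for s + "#" + reverse(s). Its last value is the length of the longest prefix of s that is also a suffix of reverse(s), which is exactly the longest palindromic prefix of s. The answer then prepends the reversed remainder of s. The class and method names are hypothetical.

```java
public class ShortestPalindrome {
    // Same next-array construction as in the post.
    static int[] buildNext(String p) {
        int[] next = new int[p.length()];
        for (int i = 1; i < p.length(); i++) {
            int k = next[i - 1];
            while (k > 0 && p.charAt(k) != p.charAt(i)) k = next[k - 1];
            next[i] = (p.charAt(k) == p.charAt(i)) ? k + 1 : 0;
        }
        return next;
    }

    static String shortestPalindrome(String s) {
        String rev = new StringBuilder(s).reverse().toString();
        // '#' appears once, so no proper prefix-suffix match can span both halves;
        // the last next value is therefore at most s.length().
        int[] next = buildNext(s + "#" + rev);
        int palPrefix = next[next.length - 1]; // length of the longest palindromic prefix of s
        String rest = s.substring(palPrefix);
        return new StringBuilder(rest).reverse().toString() + s;
    }

    public static void main(String[] args) {
        System.out.println(shortestPalindrome("aacecaaa")); // aaacecaaa
        System.out.println(shortestPalindrome("abcd"));     // dcbabcd
    }
}
```

Without the separator, a string like "aa" would yield a match longer than s itself ("aaaa" has a prefix-suffix match of length 3), which is why the sketch inserts one.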

 

posted on 2016-06-03 01:39  小玩子爱刷题