Longest common subsequence, longest common (contiguous) substring, and minimum edit distance

Longest repeated substring with a suffix array (from an article)

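Note: a local array lives on the stack, which by default is often limited to about 1 MB (the exact limit depends on the OS and compiler settings). Global and static arrays are not subject to that limit and can be much larger.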

Implementation (many variants are possible):

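#include <iostream>
#include <fstream>
#include <stdio.h>    // EOF
#include <stdlib.h>   // qsort
#include <string.h>   // strcmp
using namespace std;

const int MAXN = 10000000;

// qsort comparator: a1 and a2 point to elements of b, i.e. to char* suffix pointers.
int comp(const void *a1, const void *a2) {
    return strcmp(*(char * const *)a1, *(char * const *)a2);
}

// Length of the common prefix of two NUL-terminated strings.
int com_len(const char *a1, const char *a2) {
    int len = 0;
    while (a1[len] != '\0' && a1[len] == a2[len]) len++;
    return len;
}

int main() {
    // static: arrays this large would overflow the stack as local variables.
    static char a[MAXN];    // the text
    static char *b[MAXN];   // b[i] points to the suffix that starts at a[i]
    ifstream in("kjv.txt");
    int l = 0;
    int c;
    // Test the result of get() instead of eof(), so the EOF marker is never stored.
    // Stop one short of MAXN to leave room for the terminating '\0'.
    while (l < MAXN - 1 && (c = in.get()) != EOF) {
        a[l] = (char)c;
        b[l] = &a[l];
        l++;
    }
    a[l] = '\0';
    // std::sort would need a bool(a, b) "less than" comparator;
    // comp has the qsort signature, so qsort is used here.
    qsort(b, l, sizeof(char *), comp);
    cout << "sort" << endl;
    int maxlen = 0;
    int index = 0;
    // After sorting, the longest repeated substring is the longest
    // common prefix of two adjacent suffixes.
    for (int i = 0; i < l - 1; i++) {
        int clen = com_len(b[i], b[i + 1]);
        if (clen > maxlen) {
            maxlen = clen;
            index = i;
        }
    }
    cout << maxlen << endl;
    // Print only the shared prefix, without modifying the text buffer.
    if (maxlen > 0) cout.write(b[index], maxlen);
    cout << endl;
    return 0;
}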

Result:

precious things, the silver, and the gold, and the spices, and the

precious ointment, and all the house of his armour, and all that was

found in his treasures: there was nothing in his house, nor in all his

dominion, that Hezekiah shewed them not.

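The longest common substring of two strings can be computed with dynamic programming:

D[i][j] is the length of the longest common substring that ends at the i-th character of s1 and at the j-th character of s2. In other words, the table is indexed by the character on which each match ends.

D[i][j] = D[i-1][j-1] + 1, if s1[i-1] == s2[j-1]

D[i][j] = 0, otherwise

The answer is the largest D[i][j] over all i and j.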
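The recurrence above can be sketched in C++ as follows. This is a minimal illustration, not code from the original post: the function name `longest_common_substring` and the variables `best` and `best_end` are assumptions introduced here, and the table uses O(n·m) memory for clarity.

```cpp
#include <string>
#include <vector>

// Returns one longest common (contiguous) substring of s1 and s2.
// Hypothetical helper written for this example.
std::string longest_common_substring(const std::string& s1, const std::string& s2) {
    const size_t n = s1.size(), m = s2.size();
    // D[i][j]: length of the longest common substring that ends at
    // s1[i-1] and s2[j-1]; row 0 and column 0 stay 0.
    std::vector<std::vector<int>> D(n + 1, std::vector<int>(m + 1, 0));
    int best = 0;         // longest match length seen so far
    size_t best_end = 0;  // exclusive end position of that match in s1
    for (size_t i = 1; i <= n; ++i) {
        for (size_t j = 1; j <= m; ++j) {
            if (s1[i - 1] == s2[j - 1]) {
                D[i][j] = D[i - 1][j - 1] + 1;
                if (D[i][j] > best) {
                    best = D[i][j];
                    best_end = i;
                }
            }
            // On a mismatch D[i][j] keeps its initial value 0.
        }
    }
    return s1.substr(best_end - best, best);
}
```

For example, `longest_common_substring("abcdefg", "xcdey")` returns `"cde"`. Because each row depends only on the previous one, the table could be reduced to two rows if memory matters.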

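Longest increasing (non-decreasing) subsequence

D[i] is the length of the longest non-decreasing subsequence that ends at the i-th character:

D[i] = max(1, max{ D[j] + 1 : 1 ≤ j ≤ i-1 and s[j-1] ≤ s[i-1] })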

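#include <iostream>
#include <string>
#include <vector>
using namespace std;

// Length of the longest non-decreasing subsequence of input, in O(n^2) time.
// flag[i] is the length of the longest such subsequence that ends at the
// i-th character (1-based).
int longseq(const string &input) {
    int sz = input.size();
    if (sz == 0) return 0;
    int maxlen = 0;
    int index = 0;   // 1-based position where the best subsequence ends
    vector<int> flag(sz + 1, 0);
    for (int i = 1; i <= sz; i++) {
        int value = 1;   // the character on its own
        for (int j = i - 1; j >= 1; j--) {
            if (input[j - 1] <= input[i - 1] && flag[j] + 1 > value) {
                value = flag[j] + 1;
            }
        }
        flag[i] = value;
        if (value > maxlen) {
            maxlen = value;
            index = i;
        }
    }
    cout << index << endl;
    cout << maxlen << endl;
    return maxlen;
}

int main() {
    string s = "12345123679";
    longseq(s);
    return 0;
}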

Maximum contiguous subarray sum

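#include <iostream>
#include <vector>
#include <limits.h>
using namespace std;

// Maximum sum of a non-empty contiguous subarray (Kadane's algorithm).
// Also prints the elements of the best subarray.
// Returns INT_MIN for an empty input.
int longsum(const vector<int> &input) {
    if (input.empty()) return INT_MIN;   // nothing to print or index
    int maxvalue = INT_MIN;
    int sum = 0;
    size_t begin = 0;      // start of the best subarray
    size_t end = 0;        // end (inclusive) of the best subarray
    size_t newbegin = 0;   // start of the current running sum
    for (size_t i = 0; i < input.size(); i++) {
        sum += input[i];
        if (sum > maxvalue) {
            maxvalue = sum;
            end = i;
            begin = newbegin;
        }
        if (sum < 0) {
            // A negative running sum can never help; restart after i.
            sum = 0;
            newbegin = i + 1;
        }
    }
    for (size_t i = begin; i <= end; i++) cout << input[i] << endl;
    return maxvalue;
}

int main() {
    int a[] = {-5, 2, 3, -9, 4};
    vector<int> input(a, a + 5);
    cout << longsum(input) << endl;
    return 0;
}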

http://www.cnblogs.com/grenet/archive/2010/06/03/1750454.html

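Longest common subsequence: the dynamic-programming recurrence

To explain how LCS(A,B) is computed, we first give a few definitions.

  A=a₁a₂…a_N means that A consists of the N characters a₁a₂…a_N, so Len(A)=N

  B=b₁b₂…b_M means that B consists of the M characters b₁b₂…b_M, so Len(B)=M

  Define LCS(i,j)=LCS(a₁a₂…a_i, b₁b₂…b_j), where 0≤i≤N and 0≤j≤M

  Hence:  LCS(N,M)=LCS(A,B)

          LCS(0,0)=0

          LCS(0,j)=0

          LCS(i,0)=0

  For 1≤i≤N and 1≤j≤M we have Formula 1:

  If a_i=b_j, then LCS(i,j)=LCS(i-1,j-1)+1

  If a_i≠b_j, then LCS(i,j)=Max(LCS(i-1,j-1), LCS(i-1,j), LCS(i,j-1)). Since LCS(i-1,j-1) ≤ Max(LCS(i,j-1), LCS(i-1,j)), this simplifies to LCS(i,j)=Max(LCS(i-1,j), LCS(i,j-1)).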

Proof of correctness of Algorithm A (http://www.cnblogs.com/grenet/archive/2011/02/27/1959223.html)

To find L(i,j), let a common subsequence of that length be denoted by S(i,j)=c₁c₂…c_p. If a_i=b_j, we can do no better than to take c_p=a_i and look for c₁…c_{p-1} as a common subsequence of length L(i,j)-1 of the strings A_{1,i-1} and B_{1,j-1}. Thus, in this case, L(i,j)=L(i-1,j-1)+1.

If a_i≠b_j, then c_p is a_i, b_j, or neither (but not both). If c_p is a_i, then a solution C to the problem (A_{1,i}, B_{1,j}) [written P(i,j)] will be a solution to P(i,j-1), since b_j is not used. Similarly, if c_p is b_j, then we can get a solution to P(i,j) by solving P(i-1,j). If c_p is neither, then a solution to either P(i-1,j) or P(i,j-1) will suffice. In determining the length of the solution, it is seen that L(i,j) [corresponding to P(i,j)] will be the maximum of L(i-1,j) and L(i,j-1).
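The recurrence whose correctness is argued above can be sketched in C++ like this. It is an illustrative sketch only: the function name `lcs_length` is an assumption made for this example, and it returns just the length rather than the subsequence itself.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Length of the longest common subsequence of a and b.
// Hypothetical helper written for this example.
int lcs_length(const std::string& a, const std::string& b) {
    const size_t n = a.size(), m = b.size();
    // L[i][j] = LCS length of the first i characters of a and the
    // first j characters of b; L[0][j] and L[i][0] stay 0.
    std::vector<std::vector<int>> L(n + 1, std::vector<int>(m + 1, 0));
    for (size_t i = 1; i <= n; ++i) {
        for (size_t j = 1; j <= m; ++j) {
            if (a[i - 1] == b[j - 1]) {
                L[i][j] = L[i - 1][j - 1] + 1;                // a_i = b_j
            } else {
                L[i][j] = std::max(L[i - 1][j], L[i][j - 1]); // a_i != b_j
            }
        }
    }
    return L[n][m];
}
```

For example, `lcs_length("ABCBDAB", "BDCABA")` returns 4 (one such subsequence is "BCBA"). The mismatch branch uses the two-term maximum, since L(i-1,j-1) can never exceed either of the other two entries.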

Edit distance:

  A=a₁a₂…a_N means that A consists of the N characters a₁a₂…a_N, so Len(A)=N

  B=b₁b₂…b_M means that B consists of the M characters b₁b₂…b_M, so Len(B)=M

  Define LD(i,j)=LD(a₁a₂…a_i, b₁b₂…b_j), where 0≤i≤N and 0≤j≤M

  Hence:  LD(N,M)=LD(A,B)

          LD(0,0)=0

          LD(0,j)=j

          LD(i,0)=i

  For 1≤i≤N and 1≤j≤M we have Formula 1:

  If a_i=b_j, then LD(i,j)=LD(i-1,j-1)

  If a_i≠b_j, then LD(i,j)=Min(LD(i-1,j-1), LD(i-1,j), LD(i,j-1))+1
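A minimal C++ sketch of this edit-distance recurrence follows. The function name `edit_distance` is an assumption introduced for illustration, and `std::min` with an initializer list requires C++11 or later.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Minimum number of single-character insertions, deletions, and
// substitutions needed to turn a into b.
// Hypothetical helper written for this example.
int edit_distance(const std::string& a, const std::string& b) {
    const size_t n = a.size(), m = b.size();
    std::vector<std::vector<int>> LD(n + 1, std::vector<int>(m + 1, 0));
    // Boundary cases: LD(i,0) = i and LD(0,j) = j.
    for (size_t i = 0; i <= n; ++i) LD[i][0] = static_cast<int>(i);
    for (size_t j = 0; j <= m; ++j) LD[0][j] = static_cast<int>(j);
    for (size_t i = 1; i <= n; ++i) {
        for (size_t j = 1; j <= m; ++j) {
            if (a[i - 1] == b[j - 1]) {
                LD[i][j] = LD[i - 1][j - 1];          // characters match, no cost
            } else {
                LD[i][j] = std::min({LD[i - 1][j - 1],   // substitute
                                     LD[i - 1][j],       // delete from a
                                     LD[i][j - 1]}) + 1; // insert into a
            }
        }
    }
    return LD[n][m];
}
```

For example, `edit_distance("kitten", "sitting")` returns 3: two substitutions and one insertion.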

http://www.kngine.com/#Know!q=atoi (a search engine for programmers)

Use this figure to work through the derivation.

http://www.cnblogs.com/SwordTao/p/3824980.html

posted @ 2014-08-20 12:32 purejade
