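Longest Common Subsequence, Longest Common Substring, and Minimum Edit Distance
Longest common substring + suffix array (from an article)
Note: a local array lives on the stack, which is typically limited to about 1 MB by default (the exact limit depends on the platform and compiler settings), while global and static variables can be much larger.
Implementation (there are many variants). The program below sorts all suffixes of one text and reports the longest substring that occurs at least twice.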
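#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <fstream>
#include <iostream>
using namespace std;

// qsort comparator: each array element is a char* pointing at one suffix.
int comp(const void *a1, const void *a2) {
    return strcmp(*(char *const *)a1, *(char *const *)a2);
}

// Length of the common prefix of two C strings.
int com_len(const char *a1, const char *a2) {
    int len = 0;
    while (a1[len] != '\0' && a1[len] == a2[len]) {
        len++;
    }
    return len;
}

int main() {
    const int N = 10000000;
    static char a[N];   // the text; static so it is not placed on the stack
    static char *b[N];  // b[i] points at the suffix that starts at a[i]
    ifstream in("kjv.txt");
    int l = 0;
    int c;
    // Test the value returned by get() instead of eof(); otherwise the
    // end-of-file marker itself gets stored as a character.
    // Stop at N-1 characters to avoid overflowing the buffers.
    while (l < N - 1 && (c = in.get()) != EOF) {
        a[l] = (char)c;
        b[l] = &a[l];
        l++;
    }
    a[l] = '\0';
    // sort(b, b+l, comp) does not work here: std::sort expects a bool
    // "less than" predicate, while comp has the qsort signature
    // int (*comp)(const void *, const void *).
    qsort(b, l, sizeof(char *), comp);
    cout << "sort" << endl;
    // After sorting, the longest repeated substring is the longest common
    // prefix of two adjacent suffixes.
    int maxlen = 0;
    int index = 0;
    for (int i = 0; i < l - 1; i++) {
        int clen = com_len(b[i], b[i + 1]);
        if (clen > maxlen) {
            maxlen = clen;
            index = i;
        }
    }
    cout << maxlen << endl;
    // Print only the shared prefix, without writing '\0' into the text buffer.
    cout.write(b[index], maxlen);
    cout << endl;
    return 0;
}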
Result:
precious things, the silver, and the gold, and the spices, and the
precious ointment, and all the house of his armour, and all that was
found in his treasures: there was nothing in his house, nor in all his
dominion, that Hezekiah shewed them not.
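The longest common substring of two strings can be computed with dynamic programming:
D[i][j] is the length of the longest common substring that ends at the i-th character of s1 and at the j-th character of s2. In other words, the table is indexed by prefixes (by the character where the match ends).
D[i][j] = D[i-1][j-1] + 1   if s1[i-1] == s2[j-1]
D[i][j] = 0                 otherwise
The answer is the largest D[i][j] over all i and j.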
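The recurrence above can be sketched in C++ roughly as follows. This is only an illustration: the function name longestCommonSubstring is made up for this example, and it keeps two rows of D instead of the full table, since row i depends only on row i-1.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): returns one longest
// common substring of s1 and s2 using
//   D[i][j] = D[i-1][j-1] + 1 if s1[i-1] == s2[j-1], else 0.
std::string longestCommonSubstring(const std::string& s1, const std::string& s2) {
    const std::size_t n = s1.size(), m = s2.size();
    // prev is row i-1 of D, cur is row i.
    std::vector<int> prev(m + 1, 0), cur(m + 1, 0);
    int best = 0;             // longest match length seen so far
    std::size_t endInS1 = 0;  // one past the last character of that match in s1
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            if (s1[i - 1] == s2[j - 1]) {
                cur[j] = prev[j - 1] + 1;
                if (cur[j] > best) {
                    best = cur[j];
                    endInS1 = i;
                }
            } else {
                cur[j] = 0;
            }
        }
        std::swap(prev, cur);  // the finished row becomes the previous row
    }
    return s1.substr(endInS1 - best, best);
}
```

Calling longestCommonSubstring("abcdef", "zcdemn") should give "cde". Time is O(n*m) and extra space is O(m).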
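Longest increasing subsequence
D[i] = max(D[j] + 1) over all j in [1, i-1] with s[j-1] <= s[i-1]; D[i] = 1 if there is no such j.
The comparison is <=, so equal elements are allowed and the subsequence is non-decreasing rather than strictly increasing.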
#include <iostream>
#include <string>
#include <vector>
using namespace std;

// Length of the longest non-decreasing subsequence of input, in O(n^2).
int longseq(const string &input) {
    int sz = input.size();
    if (sz == 0) return 0;
    int maxlen = 0;
    int index = 0;                // 1-based position where the best subsequence ends
    vector<int> flag(sz + 1, 0);  // flag[i]: best length ending at input[i-1]
    for (int i = 1; i <= sz; i++) {
        int value = 1;  // the element on its own
        for (int j = i - 1; j >= 1; j--) {
            // input[j-1] may precede input[i-1] in a non-decreasing subsequence
            if (input.at(j - 1) <= input.at(i - 1)) {
                if (flag[j] + 1 > value) {
                    value = flag[j] + 1;
                }
            }
        }
        flag[i] = value;
        if (value > maxlen) {
            maxlen = value;
            index = i;
        }
    }
    cout << index << endl;
    cout << maxlen << endl;
    return maxlen;
}

int main() {
    string s = "12345123679";
    longseq(s);
    return 0;
}
Maximum contiguous subsequence sum
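#include <iostream>
#include <vector>
#include <limits.h>
using namespace std;

// Maximum sum of a contiguous run of elements; also prints that run.
int longsum(const vector<int> &input) {
    int maxvalue = INT_MIN;
    int sum = 0;       // sum of the current run
    int begin = 0;     // start of the best run found so far
    int end = -1;      // end of the best run (-1: nothing found, e.g. empty input)
    int newbegin = 0;  // start of the current run
    for (int i = 0; i < (int)input.size(); i++) {
        sum += input[i];
        if (sum > maxvalue) {
            maxvalue = sum;
            end = i;
            begin = newbegin;
        }
        if (sum < 0) {
            // A negative running sum can never help a later run, so restart.
            sum = 0;
            newbegin = i + 1;
        }
    }
    for (int i = begin; i <= end; i++) cout << input[i] << endl;
    return maxvalue;
}

int main() {
    int a[] = {-5, 2, 3, -9, 4};
    vector<int> input(a, a + 5);
    cout << longsum(input) << endl;
    return 0;
}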
http://www.cnblogs.com/grenet/archive/2010/06/03/1750454.html
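Longest common subsequence: the dynamic programming equations.
To explain how LCS(A,B) is computed, a few definitions come first.
A=a1a2……aN means A consists of the N characters a1a2……aN, Len(A)=N
B=b1b2……bM means B consists of the M characters b1b2……bM, Len(B)=M
Define LCS(i,j)=LCS(a1a2……ai,b1b2……bj), where 0≤i≤N, 0≤j≤M
Hence: LCS(N,M)=LCS(A,B)
LCS(0,0)=0
LCS(0,j)=0
LCS(i,0)=0
For 1≤i≤N, 1≤j≤M, Formula 1 holds:
If ai=bj, then LCS(i,j)=LCS(i-1,j-1)+1
If ai≠bj, then LCS(i,j)=Max(LCS(i-1,j-1),LCS(i-1,j),LCS(i,j-1)). Since LCS(i-1,j-1) <= Max(LCS(i,j-1),LCS(i-1,j)), the first term can be dropped.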
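Formula 1 translates almost line by line into a table fill. The sketch below is one way to write it in C++; the name lcsLength is made up for this example, and it uses the simplified form Max(LCS(i-1,j),LCS(i,j-1)) for the ai≠bj case.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): length of the longest
// common subsequence of a and b, filling L[i][j] = LCS(i,j) row by row.
int lcsLength(const std::string& a, const std::string& b) {
    const std::size_t n = a.size(), m = b.size();
    // Row 0 and column 0 stay 0: LCS(0,j) = LCS(i,0) = 0.
    std::vector<std::vector<int>> L(n + 1, std::vector<int>(m + 1, 0));
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            if (a[i - 1] == b[j - 1]) {
                L[i][j] = L[i - 1][j - 1] + 1;  // ai = bj
            } else {
                L[i][j] = std::max(L[i - 1][j], L[i][j - 1]);  // ai != bj
            }
        }
    }
    return L[n][m];  // LCS(N,M) = LCS(A,B)
}
```

For example, lcsLength("ABCBDAB", "BDCABA") should return 4 (one such subsequence is "BCBA").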
Proof of correctness of Algorithm A (http://www.cnblogs.com/grenet/archive/2011/02/27/1959223.html)
To find L(i,j), let a common subsequence of that length be denoted by S(i,j)=c1c2……cp. If ai=bj, we can do no better than by taking cp=ai and looking for c1……cp-1 as a common subsequence of length L(i,j)-1 of the strings A1,i-1 and B1,j-1. Thus, in this case, L(i,j)=L(i-1,j-1)+1.
If ai≠bj, then cp is ai, bj, or neither (but not both). If cp is ai, then a solution C to the problem (A1i,B1j) [written P(i,j)] will be a solution to P(i,j-1), since bj is not used. Similarly, if cp is bj, then we can get a solution to P(i,j) by solving P(i-1,j). If cp is neither, then a solution to either P(i-1,j) or P(i,j-1) will suffice. In determining the length of the solution, it is seen that L(i,j) [corresponding to P(i,j)] will be the maximum of L(i-1,j) and L(i,j-1).
Edit distance:
A=a1a2……aN means A consists of the N characters a1a2……aN, Len(A)=N
B=b1b2……bM means B consists of the M characters b1b2……bM, Len(B)=M
Define LD(i,j)=LD(a1a2……ai,b1b2……bj), where 0≤i≤N, 0≤j≤M
Hence: LD(N,M)=LD(A,B)
LD(0,0)=0
LD(0,j)=j
LD(i,0)=i
For 1≤i≤N, 1≤j≤M, Formula 1 holds:
If ai=bj, then LD(i,j)=LD(i-1,j-1)
If ai≠bj, then LD(i,j)=Min(LD(i-1,j-1),LD(i-1,j),LD(i,j-1))+1
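The edit distance recurrence can be filled in the same way as the LCS table. The following is a minimal sketch in C++; the name editDistance is made up for this example, and it assumes unit cost for insertion, deletion and substitution, as the formula above does.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): edit distance between
// a and b, with D[i][j] = LD(i,j).
int editDistance(const std::string& a, const std::string& b) {
    const std::size_t n = a.size(), m = b.size();
    std::vector<std::vector<int>> D(n + 1, std::vector<int>(m + 1, 0));
    for (std::size_t i = 0; i <= n; ++i) D[i][0] = static_cast<int>(i);  // LD(i,0) = i
    for (std::size_t j = 0; j <= m; ++j) D[0][j] = static_cast<int>(j);  // LD(0,j) = j
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            if (a[i - 1] == b[j - 1]) {
                D[i][j] = D[i - 1][j - 1];  // ai = bj: no extra edit
            } else {
                // substitute, delete from a, or insert into a, each costing 1
                D[i][j] = std::min({D[i - 1][j - 1], D[i - 1][j], D[i][j - 1]}) + 1;
            }
        }
    }
    return D[n][m];  // LD(N,M) = LD(A,B)
}
```

For example, editDistance("kitten", "sitting") should return 3: two substitutions and one insertion.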
http://www.kngine.com/#Know!q=atoi (a search engine for programmers)

Use this figure for the derivation.
http://www.cnblogs.com/SwordTao/p/3824980.html
