一种空间复杂度为$O(\log n)$的线性字符串匹配算法

记号

\(\empty\)为空字符串序列

\(Q\)为所有字符串序列构成的集合

对于\(A\in Q\),记\(|A|\)为其中字符串个数

对于\(\begin{cases}1\le l\le r\le n\\ A=\{a_{1},a_{2},...,a_{n}\}\in Q \end{cases}\),记\(\begin{cases}\sum_{i=l}^{r}a_{i}=a_{l}a_{l+1}...a_{r}\\A_{[l,r]}=\{a_{l},a_{l+1},...,a_{r}\}\end{cases}\)


皮配函数

定义辅助函数\(g:Q\rightarrow Q\),对于\(A=\{a_{1},a_{2},...,a_{n}\}\in Q\)

  • \(A=\empty\)\(a_{1}=a_{2}=...=a_{n}\),则\(g(A)=\empty\)

  • 否则,取最小的\(i\in [1,n]\)满足\(a_{i}\ne a_{1}\),取\(j\in [1,n-i+1]\)满足\(A_{[1,i]}=A_{[j,j+i-1]}\)

    设依次为\(b_{1}<b_{2}<...<b_{k}\),则\(g(A)=\{\sum_{j=b_{1}}^{b_{2}-1}a_{j},\sum_{j=b_{2}}^{b_{3}-1}a_{j},...,\sum_{j=b_{k-1}}^{b_{k}-1}a_{j}\}\)

性质1:\(|g(A)|\le \lfloor\frac{|A|}{2}\rfloor\)

(参考\(g\)中的定义)

  • \(A=\empty\)\(a_{1}=a_{2}=...=a_{n}\)时显然成立

  • 否则,若\(\exists j\in [1,k),b_{j}+i>b_{j+1}\),则\(a_{i}=a_{b_{j}+i-1}=a_{1}\),矛盾

    换言之,有\(\forall j\in [1,k),b_{j}+i\le b_{j+1}\),进而\((k-1)i+1\le b_{k}\le n-i+1\)

    化简得\(k\le \lfloor\frac{n}{i}\rfloor\),显然\(i\ge 2\),即\(|g(A)|=k-1<\lfloor\frac{n}{2}\rfloor\)

定义皮配集合\(\Phi=\{\varnothing\}\cup \{(X,Y,P_{z})\mid X,Y\in Q,P_{z}\in \Phi\}\)

对于\(P\in \Phi\),记\(|P|=\begin{cases}0&P=\varnothing\\|P_{z}|+1&P=(X,Y,P_{z})\end{cases}\)

定义皮配函数\(f:Q\rightarrow \Phi\),对于\(A=\{a_{1},a_{2},...,a_{n}\}\in Q\)

  • \(A=\empty\),则\(f(A)=\varnothing\)
  • 否则,若\(a_{1}=a_{2}=...=a_{n}\),则\(f(A)=(A,A,\varnothing)\)
  • 否则,(参考\(g\)中的定义)\(f(A)=(A_{[1,i]},A_{[b_{k},n]},f(g(A)))\)

性质2:\(|f(A)|\le \begin{cases}0&A=\empty\\ \lfloor\log_{2}|A|\rfloor+1&A\ne \empty\end{cases}\)

  • \(A=\empty\)\(A\)中字符串均相同时显然成立
  • 否则,对\(|A|\)从小到大归纳,当\(g(A)=\empty\)时显然成立
  • 否则,根据归纳假设及性质1,有\(|f(A)|\le \lfloor\log_{2}|g(A)|\rfloor+2\le \lfloor\log_{2}|A|\rfloor+1\)

增量法

已知\(f(A)\),求\(f(A+\{s\})\)

  • \(f(A)=\varnothing\),则\(f(A+\{s\})=(\{s\},\{s\},\varnothing)\)

  • 否则,设\(f(A)=(X,Y,P_{z})\),其中\(Y=\{y_{1},y_{2},...,y_{n}\}\)

    \(X\)中字符串均相同,则\(f(A+\{s\})=(X+\{s\},Y+\{s\},P_{z})\)

  • 否则,若\(X\)不为\(Y+\{s\}\)的后缀,则\(f(A+\{s\})=(X,Y+\{s\},P_{z})\)

  • 否则,\(f(A+\{s\})=(X,X,f(g(A)+\{\sum_{i=1}^{n-|X|+1}y_{i}\}))\)

    后者即"已知\(f(g(A))\),求\(f(g(A)+\{\sum_{i=1}^{n-|X|+1}y_{i}\})\)"的子问题,可以递归处理

\(T(A,s)\)为上述过程的递归次数(包括初始加入)

定义势能函数\(\varphi:Q\rightarrow N\),其中\(\varphi(A)=\begin{cases}0&A=\empty\\ |A|+\varphi(g(A))&A\ne \empty\end{cases}\)

性质3:\(T(A,s)=\varphi(A+\{s\})-\varphi(A)\)

(参考增量法过程中的定义)

  • \(|A|\)从小到大归纳,当不进行递归时显然成立

  • 否则,记\(s'=\sum_{i=1}^{n-|X|+1}y_{i}\),则\(g(A)+\{s'\}=g(A+\{s\})\),根据归纳假设

    \[T(A,s)=\varphi(g(A)+\{s'\})-\varphi(g(A))+1=\varphi(A+\{s\})-\varphi(A) \]

换言之,对于\(A=\{a_{1},a_{2},...,a_{n}\}\in Q\),增量法求\(f(A)\)的总递归次数即

\[\sum_{i=1}^{n}T(A_{[1,i-1]},a_{i})=\sum_{i=1}^{n}\varphi(A_{[1,i]})-\varphi(A_{[1,i-1]})=\varphi(A) \]

性质4:\(\varphi(A)\le 2|A|\)

  • \(|A|\)从小到大归纳,当\(A=\empty\)时显然成立
  • 否则,根据归纳假设及性质1,有\(\varphi(A)=|A|+2|g(A)|\le 2|A|\)

综上,增量法求\(f(A)\)的总递归次数(均摊)为\(O(n)\)


皮配结论

定义\(\Phi\)上的皮配关系\(\xi\subseteq\Phi\times \Phi\),对于\(P_{s},P_{t}\in \Phi\)

  • \(P_{s}=\varnothing\),则\((P_{s},P_{t})\in \xi\)

  • 否则,若\(P_{t}=\varnothing\),则\((P_{s},P_{t})\not\in \xi\)

  • 否则,设\(\begin{cases}P_{s}=(X_{s},Y_{s},P_{zs})\\P_{t}=(X_{t},Y_{t},P_{zt})\end{cases}\)

    \(X_{s}\)中字符串均相同,则\((P_{s},P_{t})\in \xi\iff X_{s}\)\(Y_{t}\)的后缀

  • 否则,\((P_{s},P_{t})\in \xi \iff Y_{s}=Y_{t}\)\((P_{zs},P_{zt})\in \xi\)

皮配结论:\(|S|\le |T|\),则\(S\)\(T\)的后缀\(\iff (f(S),f(S+T))\in \xi\)

\(\begin{cases}S=\{s_{1},s_{2},...,s_{n}\}\\T=\{t_{1},t_{2},...,t_{m}\}\end{cases},\begin{cases}P_{s}=f(S)=(X_{s},Y_{s},P_{zs})\\P_{t}=f(S+T)=(X_{t},Y_{t},P_{zt})\end{cases}\)

  • \(P_{s}=\varnothing\)\(P_{t}=\varnothing\)时显然成立

  • 否则,有\(|Y_{t}|\ge |X_{s}|\)且为\(S+T\)的后缀

    \(X_{s}\)中字符串均相同时,有\(S=X_{s}\),进而显然成立

  • 否则,对\(|S|\)从小到大归纳

    \(S\)\(T\)的后缀,则\(X_{s}=S_{[j,j+|X_{s}|-1]}=(S+T)_{[j+m,j+m+|X_{s}|-1]}\),即出现位置相同

    换言之,需\(Y_{s}=Y_{t}\)\(g(S)\)\(g(S+T)\)的后缀,由归纳假设后者即\((P_{zs},P_{zt})\in \xi\)

    另一方面,若满足上述条件,显然\(S\)\(T\)的后缀


字符串匹配

将给定字符串转换为字符串序列(每个元素长度为\(1\)),设为\(\begin{cases}S=\{s_{1},s_{2},...,s_{n}\}\\T=\{t_{1},t_{2},...,t_{m}\}\end{cases}\)

根据结论,用增量法求出\(f(S)\)\(f(S+T[1,i])\),并根据\(\xi\)的定义判定即可

在实现中,需\(O(1)\)实现增量法的每层递归,具体细节如下——

  • 字符串以哈希的形式存储,即可\(O(1)\)拼接和判断是否相同

  • 对于\(f(A)=(X,Y,P_{z})\),设\(\begin{cases}X=\{x_{1},x_{2},...,x_{n}\}\\Y=\{y_{1},y_{2},...,y_{m}\}\end{cases}\)

    \(X\)满足\(x_{1}=x_{2}=...=x_{n-1}\),仅需维护\(n,x_{1}\)\(x_{n}\)即可

    \(Y\)需判定\(X\)是否为\(Y+\{s\}\)的后缀,并求\(\sum_{i=1}^{m-n+1}y_{i}\)(若成立)

    维护\(Y\)末尾\(s_{1}\)的个数\(c\)(对\(n-1\)\(\min\))和\(\sum_{i=1}^{m}y_{i}\),判定即\(c\ge n-1\)\(s=x_{n}\)

    另外,为了避免减法,可以额外维护\(\sum_{i=1}^{m-c}y_{i}\)

  • 由于\(f(S)\)固定,\(\xi\)的判定可以转换为若干个条件,在增量时维护满足的条件数即可

综上,时间复杂度为\(O(n+m)\),空间复杂度为\(O(\log (n+m))\)

#include<bits/stdc++.h>
using namespace std;
typedef long long ll;
const int N=30,B=47,mod=998244353;
int n,m,t,cnt,ans;
struct String{
	int s,H;
	String(){
		s=1,H=0;
	}
	String(int c){
		s=B,H=c;
	}
	String(int _s,int _H){
		s=_s,H=_H;
	}
	bool operator == (const String &n)const{
		return (s==n.s)&&(H==n.H);
	}
	String operator + (const String &n)const{
		return String((ll)s*n.s%mod,(H+(ll)s*n.H)%mod);
	}
};
struct Seqx{
	bool flag;int c;String a,b;
	Seqx(){}
	Seqx(String &s){
		flag=0,c=1,a=s,b=String();
	}
	void add(String &s){
		if (a==s)c++;
		else flag=1,b=s;
	}
}X[N];
struct Seqy{
	int c;String a,b,a0;
	Seqy(){}
	Seqy(String &s){
		c=1,a=s,b=a0=String();
	}
	bool add(String &s,Seqx &x){
		a=a+s;
		if (!x.flag){
			if (s==x.a)c++;
			else c=0,b=a0=a;
		}
		else{
			if (s==x.a){
				if (c<x.c)c++;
				else b=b+s;
			}
			else{
				if ((s==x.b)&&(c==x.c))return 1;
				c=0,b=a;
			}
		}
		return 0;
	}
}Y[N];
struct Data{
	Seqx x;Seqy y;Data *z;
}*Ps,*Pt;
char Getchar(){
	char c=getchar();
	while ((c<'a')||(c>'z'))c=getchar();
	return c;
}
bool check(int d,Seqy &y){
	return (X[d].flag ? Y[d].a==y.a : X[d].c<=y.c);
}
void add(Data **P,String s,int d,int p){
	if ((p)&&(d>=t))return;
	if ((p)&&(!check(d,(*P)->y)))cnt--;
	if ((*P)==NULL)(*P)=new Data{Seqx(s),Seqy(s),NULL};
	else{
		if ((*P)->y.add(s,(*P)->x)){
			add(&(*P)->z,(*P)->y.b,d+1,p);
			(*P)->y.c=0,(*P)->y.a=(*P)->y.b=(*P)->y.a0;
		}
		if (!(*P)->x.flag)(*P)->x.add(s);
	}
	if ((p)&&(!check(d,(*P)->y)))cnt++;
}
void get(Data *P){
	if (P==NULL)return;
	X[t]=P->x,Y[t++]=P->y,get(P->z);
}
void get(Data **P,int d){
	if (d==t)return;
	(*P)=new Data{X[d],Y[d],NULL},get(&(*P)->z,d+1);
}
int main(){
	scanf("%d%d",&n,&m);
	for(int i=1;i<=n;i++){
		int c=Getchar()-'a'+1;
		add(&Ps,String(B,c),0,0);
	}
	t=cnt=0,get(Ps),get(&Pt,0);
	for(int i=1;i<=m;i++){
		int c=Getchar()-'a'+1;
		add(&Pt,String(B,c),0,1);
		if ((i>=n)&&(!cnt))ans++;
	}
	printf("%d\n",ans);
	return 0;
}
posted @ 2023-02-10 21:56  PYWBKTDA  阅读(272)  评论(0编辑  收藏  举报