Living-Dream 系列笔记第57期

hash function（哈希函数）

将一个规模很大的字符串用特定规则转化为特定数值，

这种特定规则，我们称之为 hash function。
hash value（哈希值）

字符串由哈希函数生成的数值。
hash collision（哈希冲突）

多个字符串得到了相同的 hash value。
算法竞赛中的 hash function

通常将字符串转化为 \(x\) 进制的数。
hash value 过大的处理方法

取模 / 自然溢出。
hash table（哈希表 / 散列表）

字符串与其哈希值一一对应的表。
hash collision 的处理方法

双哈希（算法竞赛常用） / 链表（工程领域常用）
自然溢出

直接让数字溢出，从而达到取模的效果。

一般是 unsigned long long 自然溢出，此时数据溢出时会循环计数，即对 \(2^{64}\) 取模。
预处理字符串每个子串的 hash value

令 \(S\) 为字符串，\(X\) 为转化后的进制数，\(H_i\) 为前 \(i\) 个字符组成的字符串的 hash value，\(P_i\) 表示 \(X^i\)。

\[H_i=H_{i-1} \times X+S_i \]

\[P_i=P_i \times X \]

截取 \(S\) 在区间 \([l,r]\) 的 hash value

\[H_r-H_{l-1} \times P_{r-l+1} \]

删除 \(S\) 第 \(pos\) 个字符后的 hash value

\[H_{l,pos-1} \times P_{r-pos}+H_{pos+1,r} \]

注意 hash 的模数与进制尽量采用大质数，因为这样得出的 hash value 不容易两两间产生公因数，从而减少 hash collision 的发生。

T1

板子，没啥好讲的。

这里采用了三种写法：链表 / 双哈希 / STL map。

链表实现

#include<bits/stdc++.h>
using namespace std;
#define ull unsigned long long

const int N=1e6+5;
const int MOD=999983,BASE=26;
int n,ans=0;
string s;
vector<string> H[N];

int get_hash(string s){
	ull res=0;
	for(int i=0;i<s.size();i++) res=1ull*res*BASE+s[i];
	return res%MOD;
}

bool insert(string s){
	ull val=get_hash(s);
	for(auto i:H[val]) if(i==s) return 0;
	H[val].push_back(s); return 1;
}

int main(){
	cin>>n;
	while(n--){
		cin>>s;
		if(insert(s)) ans++;
	}
	cout<<ans;
	return 0;
}

双哈希实现

#include<bits/stdc++.h>
using namespace std;
#define ull unsigned long long

const int N=1e6+5;
int n,ans;
string s;
bool vis1[N],vis2[N];

ull get_hash(string s,int MOD,int BASE){
	ull res=0;
	for(int i=0;i<s.size();i++) res=res*BASE+s[i];
	return res%MOD;
}

int main(){
	cin>>n;
	while(n--){
		cin>>s;
		ull val1=get_hash(s,999983,26);
		ull val2=get_hash(s,333331,31);
		if((!vis1[val1])||(!vis2[val2]))
			ans++,vis1[val1]=vis2[val2]=1;
	}
	cout<<ans;
	return 0;
}

map 实现

#include<bits/stdc++.h>
#define int long long
using namespace std;

int n;
map<string,int> m;

signed main(){
    cin>>n;
    for(int i=1;i<=n;i++){
    	string s; cin>>s;
    	m[s]++;
	}
	cout<<m.size();
	return 0;
}

T2

https://www.luogu.com.cn/article/mtopu73t

作业 T1

枚举删除位置算出 hash value 并与字典中单词的 hash value 比对即可。

code

#include<bits/stdc++.h>
using namespace std;
#define ull unsigned long long

const int N=1e6+5;
const int BASE=13331;
string x,y;
int n,m;
ull hash1,H[N],P[N];
int tot,ans[N];

void init_hash(){
	P[0]=1;
	for(int i=1;i<=n;i++) P[i]=P[i-1]*BASE;
	for(int i=1;i<=n;i++) H[i]=H[i-1]*BASE+x[i];
	for(int i=1;i<=m;i++) hash1=hash1*BASE+y[i];
}
ull get_hash(int x,int y){
	if(y<x) return 0;
	return H[y]-H[x-1]*P[y-x+1];
}

int main(){
	cin>>x>>y;
	n=x.size(),m=y.size();
	x='#'+x,y='#'+y;
	init_hash();
	for(int i=1;i<=n;i++){
		ull pre_val=get_hash(1,i-1);
		ull last_val=get_hash(i+1,n);
		if(pre_val*P[n-i]+last_val==hash1) ans[++tot]=i;
	}
	cout<<tot<<'\n';
	for(int i=1;i<=tot;i++) cout<<ans[i]<<' ';
	return 0;
}

作业 T2

我们仍然枚举删除的位置 \(pos\)，并计算出删除后的 hash value \(val\)。

进行一个分讨：

若 \(pos < \frac{n+1}{2}\)（即 \(pos\) 在字符串的左半边），

由于 \(S\) 的长度为 \(\frac{n-1}{2}\)，所以 \(S\) 在左半边不完整，

只能在右半边的区间 \([\frac{n+3}{2},n]\)（\(\frac{n+3}{2}=n-\frac{n-1}{2}+1\)）取到。

若 \(pos = \frac{n+1}{2}\)（即 \(pos\) 在字符串的正中间），则 \(S\) 取左 / 右半边均可。

若 \(pos < \frac{n+1}{2}\)（即 \(pos\) 在字符串的左半边），

由于 \(S\) 的长度为 \(\frac{n-1}{2}\)，所以 \(S\) 在右半边不完整，

只能在左半边的区间 \([1,\frac{n-1}{2}]\) 取到。

我们由此可以得出 \(S\) 及其 hash value \(valS\)，

而两个 \(S\) 拼接在一起的 hash value 即为 \(valS \times P_{\frac{n-1}{2}}+valS\)，

因此我们比较上式与 \(val\)，若它们相等则找到了一个合法的 \(S\)。记录当前的 \(pos\)（为了循环结束后求答案，不能使用 substr，因为它的时间复杂度为 \(O(n)\)），累加答案个数，并标记 \(S\) 的 hash value（为了避免重复答案重复统计）。

循环结束后，按答案个数分别输出即可。

code

#include<bits/stdc++.h>
using namespace std;
#define ull unsigned long long

const int N=2e6+5;
const int BASE=999983;
int n,tot,mark;
ull H[N],P[N];
string u,ans;
map<ull,int> vis;

void init_hash(){
	P[0]=1;
	for(int i=1;i<=n;i++) P[i]=P[i-1]*BASE;
	for(int i=1;i<=n;i++) H[i]=H[i-1]*BASE+u[i];
}
ull get_hash(int l,int r){
	return H[r]-H[l-1]*P[r-l+1];
}

int main(){
    ios::sync_with_stdio(0);
    cin.tie(nullptr),cout.tie(nullptr);
	cin>>n>>u,u='#'+u;
	init_hash();
	for(int i=1;i<=n;i++){
		ull val=get_hash(1,i-1)*P[n-i]+get_hash(i+1,n);
		ull half_val=(i>(n+1)/2?get_hash(1,(n-1)/2):get_hash((n+3)/2,n));
		if(val==half_val*(P[(n-1)/2]+1)){
			if(vis[half_val]) continue;
			vis[half_val]=1,tot++,mark=i;
			if(tot>1) puts("NOT UNIQUE"),exit(0);
		}
	}
	if(tot>0) cout<<(mark>(n+1)/2?u.substr(1,(n-1)/2):u.substr((n+3)/2));
	else puts("NOT POSSIBLE");
	return 0;
}

posted @ 2024-05-18 23:12 _KidA 阅读(20) 评论(0) 收藏举报

刷新页面返回顶部

XOF.

Making all his nowhere blogs for nobody.

Living-Dream 系列笔记第57期

T1

T2

作业 T1

作业 T2

公告

XOF.

Making all his nowhere blogs for nobody.

Living-Dream 系列笔记 第57期

T1

T2

作业 T1

作业 T2

公告

Living-Dream 系列笔记第57期