String Hashing - CodeForces 1200E Compress Words

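I. Problem Summary

Given \(n\) strings, merge them in order from left to right.
Merge rule: let ans be the string built so far and s be the next string to append.
Find the largest length \(L\) such that the last \(L\) characters of ans equal the first \(L\) characters of s.
Then drop the first \(L\) characters of s and append the remainder to ans.
Output the final string after all merges.

Constraints: the total length of all strings is at most \(10^6\), so an algorithm that is linear or linearithmic in the total length is required.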
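The merge rule can be pinned down with a brute-force reference. The sketch below is not the intended solution (it is quadratic in the worst case) and the names merge_naive and compress_naive are illustrative, not taken from the original code. It is only meant for checking small inputs by hand.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Brute-force merge: try every overlap length from the largest down.
// Worst case is quadratic, so this is a reference for small inputs only.
std::string merge_naive(const std::string& ans, const std::string& s) {
    std::size_t maxLen = std::min(ans.size(), s.size());
    for (std::size_t L = maxLen; L > 0; --L) {
        // Compare the last L characters of ans with the first L characters of s.
        if (ans.compare(ans.size() - L, L, s, 0, L) == 0)
            return ans + s.substr(L);
    }
    return ans + s;  // no overlap at all
}

// Merge a whole list of words from left to right.
std::string compress_naive(const std::vector<std::string>& words) {
    std::string ans;
    for (const std::string& w : words) ans = merge_naive(ans, w);
    return ans;
}
```

For instance, merging "sample" with "please" finds the overlap "ple" (\(L = 3\)) and yields "samplease".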

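II. Approach

1. Core problem

Each merge asks for the longest overlap between two strings:

A: a suffix of the current string ans
B: a prefix of the new string s
We need the largest \(L \le \min(|ans|, |s|)\) such that the two length-\(L\) pieces are identical.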

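2. Algorithm choice: double polynomial hashing

Comparing the pieces character by character for every candidate length is quadratic in the worst case and exceeds the time limit, so string hashing is used to test equality quickly. Two independent (base, modulus) pairs are used, and two pieces are treated as equal only if both hashes agree:

For the main string ans, maintain a prefix-hash array plus an array of powers, so the hash of any suffix can be read in \(O(1)\);
For the new string s, use a rolling hash that extends the prefix hash one character at a time.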
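A small sketch of why a second hash helps. The helpers poly_hash and double_hash are hypothetical names, and the bases 31 and 131 are chosen here only to make a collision visible. Under base 31 the strings "AP" and "B1" hash to the same value even before the modulus matters, because \(65 \times 31 + 80 = 66 \times 31 + 49 = 2095\); under base 131 they differ.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Polynomial hash of a whole string under one (base, mod) pair.
std::uint64_t poly_hash(const std::string& s, std::uint64_t base, std::uint64_t mod) {
    std::uint64_t h = 0;
    for (unsigned char c : s) h = (h * base + c) % mod;
    return h;
}

// Double hash: two strings count as equal only if both components agree.
std::pair<std::uint64_t, std::uint64_t> double_hash(const std::string& s) {
    return {poly_hash(s, 31, 1000000007ULL), poly_hash(s, 131, 1000000009ULL)};
}
```

Comparing the pairs returned by double_hash rejects "AP" versus "B1" even though the first components match. A false positive would need both components to collide at once.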

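3. Hash formulas

Two indexing conventions are used:

Original strings: 0-indexed (native C++ string)
Prefix-hash arrays pre1/pre2: 1-indexed (the usual convention for hashing, with pre[0]=0)

Power array: \(pow[i] = base^i \bmod MOD\), precomputed once globally
Prefix hash:
\[pre[i] = (pre[i-1] \times base + s[i-1]) \bmod MOD \]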
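To see what this recurrence computes, it can be unrolled into a closed form using pre[0]=0. This only restates the formula above; the summation index \(k\) is introduced here for illustration:

\[pre[i] = \Big(\sum_{k=0}^{i-1} s[k] \times base^{\,i-1-k}\Big) \bmod MOD \]

In other words, pre[i] is the first \(i\) characters read as an \(i\)-digit number in base \(base\), reduced modulo MOD. For example, \(pre[2] = (s[0] \times base + s[1]) \bmod MOD\).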
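Hash of the original string's range \([l,r]\) (0-indexed, inclusive):
\[hash = \big(pre[r+1] - pre[l] \times pow[r-l+1] \big) \bmod MOD \]
Multiplying pre[l] by pow[r-l+1] shifts the first \(l\) characters into the same positions they occupy inside pre[r+1], so the subtraction leaves only the contribution of s[l..r]. In code the difference can be negative, so add MOD before the final reduction to keep the result non-negative.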
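The formulas above can be sketched for a single fixed string and a single modulus as follows. The struct name PrefixHash and its constants are assumptions made for illustration; the full solution uses two (base, modulus) pairs and global arrays instead.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Single-modulus prefix hash over a fixed string (illustrative helper).
struct PrefixHash {
    static constexpr long long BASE = 131;
    static constexpr long long MOD = 1000000007LL;
    std::vector<long long> pre;  // pre[i]: hash of the first i characters
    std::vector<long long> pw;   // pw[i]: BASE^i mod MOD

    explicit PrefixHash(const std::string& s)
        : pre(s.size() + 1, 0), pw(s.size() + 1, 1) {
        for (std::size_t i = 1; i <= s.size(); ++i) {
            pre[i] = (pre[i - 1] * BASE + static_cast<unsigned char>(s[i - 1])) % MOD;
            pw[i] = pw[i - 1] * BASE % MOD;
        }
    }

    // Hash of s[l..r], 0-indexed and inclusive; requires l <= r < s.size().
    long long get(int l, int r) const {
        // The raw difference may be negative, so add MOD before the final reduction.
        return ((pre[r + 1] - pre[l] * pw[r - l + 1]) % MOD + MOD) % MOD;
    }
};
```

With PrefixHash h("abcabc"), the calls h.get(0, 2) and h.get(3, 5) return the same value because both ranges spell "abc". The hash of a range depends only on its characters, not on where it sits in the string.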

4. Code

#include <iostream>
#include <string>
#include <algorithm>
#define LL long long
using namespace std;

const int N = 1e6 + 10;
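// Two different (base, modulus) pairs: double hashing to guard against collisions.
// Both bases exceed the largest character code ('z' = 122); with a smaller base
// such as 31, short strings like "AP" and "B1" collide before the modulus applies.
const LL base1 = 137;
const LL MOD1 = 1e9 + 7;
const LL base2 = 131;
const LL MOD2 = 1e9 + 9;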

LL pow1[N], pow2[N];   // power arrays: base^i mod MOD
LL pre1[N], pre2[N];   // double prefix-hash arrays of the main string
string ans;            // the merged answer string

// Precompute the power arrays globally; runs only once
void init() {
    pow1[0] = pow2[0] = 1;
    for (int i = 1; i < N; i++) {
        pow1[i] = pow1[i-1] * base1 % MOD1;
        pow2[i] = pow2[i-1] * base2 % MOD2;
    }
}

// Combined hash of ans[l..r] (0-indexed, inclusive)
LL get_hash(int l, int r) {
    // first hash
    LL h1 = (pre1[r+1] - pre1[l] * pow1[r-l+1] % MOD1 + MOD1) % MOD1;
    // second hash
    LL h2 = (pre2[r+1] - pre2[l] * pow2[r-l+1] % MOD2 + MOD2) % MOD2;
    // pack both into one LL to simplify comparison (the value stays below 2^63)
    return h1 * MOD1 + h2;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(0);

    int n;
    cin >> n >> ans;
    init();
    int len = ans.size();

    // Build the prefix hash of the first string (1-indexed)
    for (int i = 1; i <= len; i++) {
        pre1[i] = (pre1[i-1] * base1 + ans[i-1]) % MOD1;
        pre2[i] = (pre2[i-1] * base2 + ans[i-1]) % MOD2;
    }

    // Process the remaining n-1 strings
    for (int i = 2; i <= n; i++) {
        string s;
        cin >> s;
        int Ls = s.size();
        int maxLen = min(len, Ls);
        int mxL = 0;  // longest matching suffix/prefix length found so far

        LL sh1 = 0, sh2 = 0;
        // try every candidate overlap length
        for (int j = 1; j <= maxLen; j++) {
            // roll the hash of the length-j prefix of s
            sh1 = (sh1 * base1 + s[j-1]) % MOD1;
            sh2 = (sh2 * base2 + s[j-1]) % MOD2;
            LL sh = sh1 * MOD1 + sh2;
            // hash of the length-j suffix of ans
            LL ah = get_hash(len - j, len - 1);

            if (sh == ah)
                mxL = j; // j only increases, so the last hit is the longest match
        }

        // Merge: drop the first mxL characters of s and append the rest to ans
        ans += s.substr(mxL);

        // Extend the main string's prefix-hash arrays to match
        for (int j = mxL; j < Ls; j++) {
            len++;
            pre1[len] = (pre1[len-1] * base1 + s[j]) % MOD1;
            pre2[len] = (pre2[len-1] * base2 + s[j]) % MOD2;
        }
    }

    cout << ans << endl;
    return 0;
}

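5. Complexity analysis

Let \(T\) be the total length of all strings (\(\boldsymbol{T \le 10^6}\)).

Power-array precomputation: \(O(N)\) with \(N = 10^6 + 10\), done once;
Hash initialization, string appends and prefix-hash updates: \(O(T)\) overall;
Longest-match enumeration: for each new string s the loop runs \(\min(|ans|, |s|) \le |s|\) iterations, each doing \(O(1)\) work, so the total over all strings is \(O(T)\).

Overall time complexity: \(\boldsymbol{O(T)}\), which is fast enough for the problem's limits.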

posted @ 2026-06-11 11:37 alice_ss

