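8-3 Longest Common Subsequence
Longest Common Subsequence (LCS)
The longest common subsequence (LCS) problem is a classic in computer science: given two sequences, find the longest subsequence that appears in both.
A subsequence is what remains of a sequence after deleting zero or more elements while the remaining elements keep their original relative order. The elements need not be contiguous in the original sequence, but their relative order must be preserved.
A substring, by contrast, must occupy contiguous positions in the original sequence.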
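The two definitions can be checked mechanically. The following is a minimal sketch; the helper names `is_subsequence` and `is_substring` are illustrative and do not come from the original text.

```python
def is_subsequence(s: str, t: str) -> bool:
    """Return True if s can be obtained from t by deleting zero or more characters."""
    it = iter(t)
    # `ch in it` consumes the iterator up to the first match,
    # so later characters of s can only match later positions of t.
    return all(ch in it for ch in s)


def is_substring(s: str, t: str) -> bool:
    """Return True if s occurs in t at contiguous positions."""
    return s in t


X = "ABCDE"
for candidate in ["ACE", "ACD", "ABC", "CDE", "AEC"]:
    print(candidate, is_subsequence(candidate, X), is_substring(candidate, X))
```

Every substring is also a subsequence, but not the other way around: "ACE" passes the first check and fails the second, while "AEC" fails both because it breaks the relative order.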
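An example of the difference:
String X = "ABCDE"
Subsequence: "ACE" -- delete B and D; A→C→E keep their relative order
Subsequence: "ACD" -- delete B and E
Substring: "ABC" -- contiguous
Substring: "CDE" -- contiguous
"ACE" is not a substring, because A, C, and E are not contiguous in X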
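Classic LCS examples:
- X = "ABCDGH", Y = "AEDFHR": LCS = "ADH", length 3
- X = "AGGTAB", Y = "GXTXAYB": LCS = "GTAB", length 4
LCS is a textbook dynamic programming (DP) problem. The sections below start from brute-force recursion and work step by step toward an efficient dynamic programming solution.
Recursion (brute-force search)
The recursive solution rests on the optimal substructure of LCS:
- If the two sequences end with the same character, that character is part of the LCS. The problem reduces to the LCS of the two sequences with their last characters removed, plus 1.
- If the last characters differ, the LCS length is the larger of two cases: the LCS after dropping the last character of X, or the LCS after dropping the last character of Y.
Recurrence: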
LCS(X[0..m-1], Y[0..n-1]):
    if m == 0 or n == 0:
        return 0
    if X[m-1] == Y[n-1]:
        return 1 + LCS(X[0..m-2], Y[0..n-2])
    else:
        return max(LCS(X[0..m-2], Y[0..n-1]), LCS(X[0..m-1], Y[0..n-2]))
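Partial expansion of the recursion tree for "ABCDGH" and "AEDFHR":
LCS("ABCDGH", "AEDFHR") -- H != R
├── LCS("ABCDG", "AEDFHR") -- G != R
│   ├── LCS("ABCD", "AEDFHR") -- D != R
│   │   ├── LCS("ABC", "AEDFHR") -- ...
│   │   └── LCS("ABCD", "AEDFH") -- ...
│   └── LCS("ABCDG", "AEDFH") -- G != H, ...
└── LCS("ABCDGH", "AEDFH") -- H == H!
    └── 1 + LCS("ABCDG", "AEDF") → ...
The recursion takes O(2^(m+n)) time in the worst case. That is exponential: the same subproblems are solved over and over, so the approach is very inefficient.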
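The amount of repeated work can be made visible by counting how often each subproblem (m, n) is visited. This is a small instrumented sketch of the same recursion; the `calls` counter and the name `lcs_counting` are additions for illustration only.

```python
from collections import Counter


def lcs_counting(X, Y, m, n, calls):
    """Plain recursive LCS length that also records every (m, n) it visits."""
    calls[(m, n)] += 1
    if m == 0 or n == 0:
        return 0
    if X[m - 1] == Y[n - 1]:
        return 1 + lcs_counting(X, Y, m - 1, n - 1, calls)
    return max(lcs_counting(X, Y, m - 1, n, calls),
               lcs_counting(X, Y, m, n - 1, calls))


X, Y = "ABCDGH", "AEDFHR"
calls = Counter()
length = lcs_counting(X, Y, len(X), len(Y), calls)
print("LCS length:", length)
print("total calls:", sum(calls.values()))
print("distinct subproblems:", len(calls))
```

There can be at most (m+1)(n+1) distinct subproblems, so whenever the total call count exceeds the number of distinct keys, some subproblem was solved more than once.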
C++ implementation
#include <iostream>
#include <string>
#include <algorithm>
int lcs(const std::string& X, const std::string& Y, int m, int n)
{
// base case: either string is empty
if (m == 0 || n == 0)
{
return 0;
}
// last chars match
if (X[m - 1] == Y[n - 1])
{
return 1 + lcs(X, Y, m - 1, n - 1);
}
// last chars differ: take max of two sub-problems
return std::max(lcs(X, Y, m - 1, n), lcs(X, Y, m, n - 1));
}
int main()
{
std::string X = "ABCDGH";
std::string Y = "AEDFHR";
int result = lcs(X, Y, X.length(), Y.length());
std::cout << "X = \"" << X << "\"\n";
std::cout << "Y = \"" << Y << "\"\n";
std::cout << "LCS length = " << result << "\n";
X = "AGGTAB";
Y = "GXTXAYB";
result = lcs(X, Y, X.length(), Y.length());
std::cout << "X = \"" << X << "\"\n";
std::cout << "Y = \"" << Y << "\"\n";
std::cout << "LCS length = " << result << "\n";
return 0;
}
Running this program prints:
X = "ABCDGH"
Y = "AEDFHR"
LCS length = 3
X = "AGGTAB"
Y = "GXTXAYB"
LCS length = 4
Python implementation
def lcs(X, Y, m, n):
    # base case: either string is empty
    if m == 0 or n == 0:
        return 0
    # last chars match
    if X[m - 1] == Y[n - 1]:
        return 1 + lcs(X, Y, m - 1, n - 1)
    # last chars differ: take max of two sub-problems
    return max(lcs(X, Y, m - 1, n), lcs(X, Y, m, n - 1))

X = "ABCDGH"
Y = "AEDFHR"
print(f'X = "{X}"')
print(f'Y = "{Y}"')
print(f"LCS length = {lcs(X, Y, len(X), len(Y))}")
X = "AGGTAB"
Y = "GXTXAYB"
print(f'X = "{X}"')
print(f'Y = "{Y}"')
print(f"LCS length = {lcs(X, Y, len(X), len(Y))}")
Running this program prints:
X = "ABCDGH"
Y = "AEDFHR"
LCS length = 3
X = "AGGTAB"
Y = "GXTXAYB"
LCS length = 4
Go implementation
package main
import (
"fmt"
)
func lcs(X, Y string, m, n int) int {
// base case: either string is empty
if m == 0 || n == 0 {
return 0
}
// last chars match
if X[m-1] == Y[n-1] {
return 1 + lcs(X, Y, m-1, n-1)
}
// last chars differ: take max of two sub-problems
left := lcs(X, Y, m-1, n)
right := lcs(X, Y, m, n-1)
if left > right {
return left
}
return right
}
func main() {
X := "ABCDGH"
Y := "AEDFHR"
fmt.Printf("X = \"%s\"\n", X)
fmt.Printf("Y = \"%s\"\n", Y)
fmt.Printf("LCS length = %d\n", lcs(X, Y, len(X), len(Y)))
X = "AGGTAB"
Y = "GXTXAYB"
fmt.Printf("X = \"%s\"\n", X)
fmt.Printf("Y = \"%s\"\n", Y)
fmt.Printf("LCS length = %d\n", lcs(X, Y, len(X), len(Y)))
}
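Running this program prints:
X = "ABCDGH"
Y = "AEDFHR"
LCS length = 3
X = "AGGTAB"
Y = "GXTXAYB"
LCS length = 4
Recursion is intuitive, but its O(2^(m+n)) worst-case running time comes from massive recomputation, so the running time grows exponentially for longer strings. Before Go 1.21 the language had no built-in max for integers, which is why the Go version compares the two values by hand.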
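One way to remove the recomputation without changing the shape of the recursion is to cache each (m, n) result. The sketch below uses `functools.lru_cache` from the Python standard library; the wrapper name `lcs_memo` is an assumption made for this example.

```python
from functools import lru_cache


def lcs_memo(X: str, Y: str) -> int:
    """Top-down LCS length: the same recursion, with each (m, n) solved once."""

    @lru_cache(maxsize=None)
    def solve(m: int, n: int) -> int:
        if m == 0 or n == 0:
            return 0
        if X[m - 1] == Y[n - 1]:
            return 1 + solve(m - 1, n - 1)
        return max(solve(m - 1, n), solve(m, n - 1))

    return solve(len(X), len(Y))


print(lcs_memo("ABCDGH", "AEDFHR"))   # 3
print(lcs_memo("AGGTAB", "GXTXAYB"))  # 4
```

With the cache in place there are at most (m+1)(n+1) distinct calls, so the running time is O(mn). The recursion depth can still reach m + n, which may hit Python's recursion limit on long inputs; a bottom-up table avoids that.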
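Dynamic Programming: Computing the LCS Length
Dynamic programming (DP) stores the solutions of subproblems in a table so that nothing is computed twice, which brings the exponential cost down to polynomial.
Define dp[i][j] as the LCS length of X[0..i-1] and Y[0..j-1]. The state transition is:
if i == 0 or j == 0:
    dp[i][j] = 0
elif X[i-1] == Y[j-1]:
    dp[i][j] = dp[i-1][j-1] + 1
else:
    dp[i][j] = max(dp[i-1][j], dp[i][j-1])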
Filling the dp table step by step for X = "ABCDGH", Y = "AEDFHR" (m=6, n=6):
   ""  A  E  D  F  H  R
""  0  0  0  0  0  0  0
A   0  1  1  1  1  1  1
B   0  1  1  1  1  1  1
C   0  1  1  1  1  1  1
D   0  1  1  2  2  2  2
G   0  1  1  2  2  2  2
H   0  1  1  2  2  3  3
The final entry dp[6][6] = 3 is the LCS length (the subsequence "ADH").
Time complexity is O(mn) and space complexity is O(mn).
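To reproduce a table like the one above for other inputs, the fill loop can return the whole table instead of only its last cell. This is a sketch; `lcs_table` and `print_table` are hypothetical helper names, and the column alignment assumes single-digit entries.

```python
def lcs_table(X, Y):
    """Return the full (m+1) x (n+1) table of LCS lengths for prefixes of X and Y."""
    m, n = len(X), len(Y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp


def print_table(X, Y, dp):
    """Print the table with X down the side and Y across the top."""
    print(" " * 5 + "  ".join(Y))
    for i, row in enumerate(dp):
        label = X[i - 1] if i > 0 else " "
        print(label + " " + "  ".join(str(v) for v in row))


dp = lcs_table("ABCDGH", "AEDFHR")
print_table("ABCDGH", "AEDFHR", dp)
print("LCS length:", dp[-1][-1])
```

Row 0 and column 0 stand for the empty prefix, which is why they stay at zero.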
C++ implementation
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
int lcsLength(const std::string& X, const std::string& Y)
{
int m = X.length();
int n = Y.length();
// dp[i][j] = LCS length of X[0..i-1] and Y[0..j-1]
std::vector<std::vector<int>> dp(m + 1, std::vector<int>(n + 1, 0));
for (int i = 1; i <= m; i++)
{
for (int j = 1; j <= n; j++)
{
if (X[i - 1] == Y[j - 1])
{
dp[i][j] = dp[i - 1][j - 1] + 1;
}
else
{
dp[i][j] = std::max(dp[i - 1][j], dp[i][j - 1]);
}
}
}
return dp[m][n];
}
int main()
{
std::string X = "ABCDGH";
std::string Y = "AEDFHR";
std::cout << "X = \"" << X << "\", Y = \"" << Y << "\"\n";
std::cout << "LCS length = " << lcsLength(X, Y) << "\n";
X = "AGGTAB";
Y = "GXTXAYB";
std::cout << "X = \"" << X << "\", Y = \"" << Y << "\"\n";
std::cout << "LCS length = " << lcsLength(X, Y) << "\n";
return 0;
}
Running this program prints:
X = "ABCDGH", Y = "AEDFHR"
LCS length = 3
X = "AGGTAB", Y = "GXTXAYB"
LCS length = 4
C implementation
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int max(int a, int b)
{
return (a > b) ? a : b;
}
int lcsLength(const char* X, const char* Y)
{
int m = strlen(X);
int n = strlen(Y);
// allocate dp table: (m+1) x (n+1)
int** dp = (int**)malloc((m + 1) * sizeof(int*));
for (int i = 0; i <= m; i++)
{
dp[i] = (int*)calloc(n + 1, sizeof(int));
}
for (int i = 1; i <= m; i++)
{
for (int j = 1; j <= n; j++)
{
if (X[i - 1] == Y[j - 1])
{
dp[i][j] = dp[i - 1][j - 1] + 1;
}
else
{
dp[i][j] = max(dp[i - 1][j], dp[i][j - 1]);
}
}
}
int result = dp[m][n];
// free memory
for (int i = 0; i <= m; i++)
{
free(dp[i]);
}
free(dp);
return result;
}
int main()
{
const char* X = "ABCDGH";
const char* Y = "AEDFHR";
printf("X = \"%s\", Y = \"%s\"\n", X, Y);
printf("LCS length = %d\n", lcsLength(X, Y));
X = "AGGTAB";
Y = "GXTXAYB";
printf("X = \"%s\", Y = \"%s\"\n", X, Y);
printf("LCS length = %d\n", lcsLength(X, Y));
return 0;
}
Running this program prints:
X = "ABCDGH", Y = "AEDFHR"
LCS length = 3
X = "AGGTAB", Y = "GXTXAYB"
LCS length = 4
Python implementation
def lcs_length(X, Y):
    m, n = len(X), len(Y)
    # dp[i][j] = LCS length of X[0..i-1] and Y[0..j-1]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

X = "ABCDGH"
Y = "AEDFHR"
print(f'X = "{X}", Y = "{Y}"')
print(f"LCS length = {lcs_length(X, Y)}")
X = "AGGTAB"
Y = "GXTXAYB"
print(f'X = "{X}", Y = "{Y}"')
print(f"LCS length = {lcs_length(X, Y)}")
Running this program prints:
X = "ABCDGH", Y = "AEDFHR"
LCS length = 3
X = "AGGTAB", Y = "GXTXAYB"
LCS length = 4
Go implementation
package main
import "fmt"
func lcsLength(X, Y string) int {
m, n := len(X), len(Y)
// dp[i][j] = LCS length of X[0..i-1] and Y[0..j-1]
dp := make([][]int, m+1)
for i := range dp {
dp[i] = make([]int, n+1)
}
for i := 1; i <= m; i++ {
for j := 1; j <= n; j++ {
if X[i-1] == Y[j-1] {
dp[i][j] = dp[i-1][j-1] + 1
} else if dp[i-1][j] > dp[i][j-1] {
dp[i][j] = dp[i-1][j]
} else {
dp[i][j] = dp[i][j-1]
}
}
}
return dp[m][n]
}
func main() {
X := "ABCDGH"
Y := "AEDFHR"
fmt.Printf("X = \"%s\", Y = \"%s\"\n", X, Y)
fmt.Printf("LCS length = %d\n", lcsLength(X, Y))
X = "AGGTAB"
Y = "GXTXAYB"
fmt.Printf("X = \"%s\", Y = \"%s\"\n", X, Y)
fmt.Printf("LCS length = %d\n", lcsLength(X, Y))
}
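Running this program prints:
X = "ABCDGH", Y = "AEDFHR"
LCS length = 3
X = "AGGTAB", Y = "GXTXAYB"
LCS length = 4
The dynamic programming solution records the answer to every subproblem in the two-dimensional dp table. C has to allocate and free the memory by hand (malloc/free), C++ lets vector manage it, Python uses a list comprehension, and Go builds a two-dimensional slice with make. The core logic is identical in all four: a double loop fills the dp table, and dp[m][n] holds the result.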
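Dynamic Programming: Reconstructing the LCS
Once the lengths are computed, backtracking through the dp table recovers an actual LCS string.
Backtracking rules: start at dp[m][n] and walk toward the top-left corner.
- If X[i-1] == Y[j-1]: the character belongs to the LCS; add it to the result and move to dp[i-1][j-1]
- Else if dp[i-1][j] >= dp[i][j-1]: move up to dp[i-1][j]
- Otherwise: move left to dp[i][j-1]
For X = "AGGTAB", Y = "GXTXAYB", the dp table and the backtracking path (cells on the path are marked with *):
   ""   G   X   T   X   A   Y   B
""  0   0   0   0   0   0   0   0
A   0   0   0   0   0   1   1   1
G   0   1*  1*  1   1   1   1   1
G   0   1   1*  1   1   1   1   1
T   0   1   1   2*  2*  2   2   2
A   0   1   1   2   2   3*  3*  3
B   0   1   1   2   2   3   3   4*
Backtracking path (starting from dp[6][7] = 4):
dp[6][7] (B matches, go diagonal) → dp[5][6] (go left) → dp[5][5] (A matches, go diagonal) → dp[4][4] (go left) → dp[4][3] (T matches, go diagonal) → dp[3][2] (tie, go up) → dp[2][2] (go left) → dp[2][1] (G matches, go diagonal) → dp[1][0], stop
Characters collected (in reverse order): B, A, T, G → reversed gives "GTAB"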
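The walk through the table can also be recorded programmatically. The sketch below applies the three backtracking rules as stated (moving up on a tie) and returns the visited cells alongside the recovered string; the function name `lcs_backtrack_path` is illustrative.

```python
def lcs_backtrack_path(X, Y):
    """Return (lcs_string, path) where path lists the dp cells visited while backtracking."""
    m, n = len(X), len(Y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    chars, path = [], []
    i, j = m, n
    while i > 0 and j > 0:
        path.append((i, j))
        if X[i - 1] == Y[j - 1]:
            chars.append(X[i - 1])   # matched character: part of the LCS
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1                   # move up (also on a tie)
        else:
            j -= 1                   # move left
    return "".join(reversed(chars)), path


lcs_str, path = lcs_backtrack_path("AGGTAB", "GXTXAYB")
print(lcs_str)
print(path)
```

Breaking ties to the left instead of up would also produce a valid LCS, but possibly a different one when several exist.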
C++ implementation
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
std::string lcsString(const std::string& X, const std::string& Y)
{
int m = X.length();
int n = Y.length();
std::vector<std::vector<int>> dp(m + 1, std::vector<int>(n + 1, 0));
// fill dp table
for (int i = 1; i <= m; i++)
{
for (int j = 1; j <= n; j++)
{
if (X[i - 1] == Y[j - 1])
{
dp[i][j] = dp[i - 1][j - 1] + 1;
}
else
{
dp[i][j] = std::max(dp[i - 1][j], dp[i][j - 1]);
}
}
}
// backtrack to reconstruct LCS string
std::string result;
int i = m, j = n;
while (i > 0 && j > 0)
{
if (X[i - 1] == Y[j - 1])
{
result.push_back(X[i - 1]);
i--;
j--;
}
else if (dp[i - 1][j] >= dp[i][j - 1]) // on a tie, move up
{
i--;
}
else
{
j--;
}
}
std::reverse(result.begin(), result.end());
return result;
}
int main()
{
std::string X = "AGGTAB";
std::string Y = "GXTXAYB";
std::string lcs = lcsString(X, Y);
std::cout << "X = \"" << X << "\", Y = \"" << Y << "\"\n";
std::cout << "LCS = \"" << lcs << "\", length = " << lcs.length() << "\n";
X = "ABCDGH";
Y = "AEDFHR";
lcs = lcsString(X, Y);
std::cout << "X = \"" << X << "\", Y = \"" << Y << "\"\n";
std::cout << "LCS = \"" << lcs << "\", length = " << lcs.length() << "\n";
return 0;
}
Running this program prints:
X = "AGGTAB", Y = "GXTXAYB"
LCS = "GTAB", length = 4
X = "ABCDGH", Y = "AEDFHR"
LCS = "ADH", length = 3
C implementation
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int max(int a, int b)
{
return (a > b) ? a : b;
}
char* lcsString(const char* X, const char* Y)
{
int m = strlen(X);
int n = strlen(Y);
// allocate dp table
int** dp = (int**)malloc((m + 1) * sizeof(int*));
for (int i = 0; i <= m; i++)
{
dp[i] = (int*)calloc(n + 1, sizeof(int));
}
// fill dp table
for (int i = 1; i <= m; i++)
{
for (int j = 1; j <= n; j++)
{
if (X[i - 1] == Y[j - 1])
{
dp[i][j] = dp[i - 1][j - 1] + 1;
}
else
{
dp[i][j] = max(dp[i - 1][j], dp[i][j - 1]);
}
}
}
// backtrack to reconstruct LCS string
int lcsLen = dp[m][n];
char* result = (char*)malloc((lcsLen + 1) * sizeof(char));
result[lcsLen] = '\0';
int i = m, j = n;
int idx = lcsLen - 1;
while (i > 0 && j > 0)
{
if (X[i - 1] == Y[j - 1])
{
result[idx--] = X[i - 1];
i--;
j--;
}
else if (dp[i - 1][j] >= dp[i][j - 1]) // on a tie, move up
{
i--;
}
else
{
j--;
}
}
// free dp table
for (int i = 0; i <= m; i++)
{
free(dp[i]);
}
free(dp);
return result;
}
int main()
{
const char* X = "AGGTAB";
const char* Y = "GXTXAYB";
char* lcs = lcsString(X, Y);
printf("X = \"%s\", Y = \"%s\"\n", X, Y);
printf("LCS = \"%s\", length = %d\n", lcs, (int)strlen(lcs));
free(lcs);
X = "ABCDGH";
Y = "AEDFHR";
lcs = lcsString(X, Y);
printf("X = \"%s\", Y = \"%s\"\n", X, Y);
printf("LCS = \"%s\", length = %d\n", lcs, (int)strlen(lcs));
free(lcs);
return 0;
}
Running this program prints:
X = "AGGTAB", Y = "GXTXAYB"
LCS = "GTAB", length = 4
X = "ABCDGH", Y = "AEDFHR"
LCS = "ADH", length = 3
Python implementation
def lcs_string(X, Y):
    m, n = len(X), len(Y)
    # fill dp table
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # backtrack to reconstruct LCS string
    result = []
    i, j = m, n
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:
            result.append(X[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:  # on a tie, move up
            i -= 1
        else:
            j -= 1
    return ''.join(reversed(result))

X = "AGGTAB"
Y = "GXTXAYB"
lcs = lcs_string(X, Y)
print(f'X = "{X}", Y = "{Y}"')
print(f'LCS = "{lcs}", length = {len(lcs)}')
X = "ABCDGH"
Y = "AEDFHR"
lcs = lcs_string(X, Y)
print(f'X = "{X}", Y = "{Y}"')
print(f'LCS = "{lcs}", length = {len(lcs)}')
Running this program prints:
X = "AGGTAB", Y = "GXTXAYB"
LCS = "GTAB", length = 4
X = "ABCDGH", Y = "AEDFHR"
LCS = "ADH", length = 3
Go implementation
package main
import "fmt"
func lcsString(X, Y string) string {
m, n := len(X), len(Y)
// fill dp table
dp := make([][]int, m+1)
for i := range dp {
dp[i] = make([]int, n+1)
}
for i := 1; i <= m; i++ {
for j := 1; j <= n; j++ {
if X[i-1] == Y[j-1] {
dp[i][j] = dp[i-1][j-1] + 1
} else if dp[i-1][j] > dp[i][j-1] {
dp[i][j] = dp[i-1][j]
} else {
dp[i][j] = dp[i][j-1]
}
}
}
// backtrack to reconstruct LCS string
result := make([]byte, 0, dp[m][n])
i, j := m, n
for i > 0 && j > 0 {
if X[i-1] == Y[j-1] {
result = append(result, X[i-1])
i--
j--
} else if dp[i-1][j] >= dp[i][j-1] { // on a tie, move up
i--
} else {
j--
}
}
// reverse result
for left, right := 0, len(result)-1; left < right; left, right = left+1, right-1 {
result[left], result[right] = result[right], result[left]
}
return string(result)
}
func main() {
X := "AGGTAB"
Y := "GXTXAYB"
lcs := lcsString(X, Y)
fmt.Printf("X = \"%s\", Y = \"%s\"\n", X, Y)
fmt.Printf("LCS = \"%s\", length = %d\n", lcs, len(lcs))
X = "ABCDGH"
Y = "AEDFHR"
lcs = lcsString(X, Y)
fmt.Printf("X = \"%s\", Y = \"%s\"\n", X, Y)
fmt.Printf("LCS = \"%s\", length = %d\n", lcs, len(lcs))
}
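Running this program prints:
X = "AGGTAB", Y = "GXTXAYB"
LCS = "GTAB", length = 4
X = "ABCDGH", Y = "AEDFHR"
LCS = "ADH", length = 3
The key to backtracking is to start from dp[m][n] and retrace, in reverse, the path along which the table was filled. When the two characters match, the character is part of the LCS; when they differ, move toward the neighbor with the larger dp value. On a tie either direction leads to a valid LCS, and the rules above move up. Characters are collected from back to front, so the string has to be reversed at the end. The C implementation avoids the reversal by preallocating an array of exactly the right size and filling it from the back.
Space-Optimized Dynamic Programming
The transition shows that dp[i][j] depends only on dp[i-1][j-1], dp[i-1][j], and dp[i][j-1], so the current row depends only on the previous row. Keeping just two rows (the previous and the current one) is therefore enough, and when the shorter sequence indexes the columns the space complexity drops from O(mn) to O(min(m,n)).
To get that bound, always make the shorter sequence the inner-loop dimension, so the two rows are as short as possible.
C++ implementation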
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
int lcsLengthOptimized(const std::string& X, const std::string& Y)
{
// use the shorter string as S2 (the column dimension) for less space usage
const std::string& S1 = (X.length() >= Y.length()) ? X : Y;
const std::string& S2 = (X.length() >= Y.length()) ? Y : X;
int m = S1.length();
int n = S2.length();
std::vector<int> prev(n + 1, 0);
std::vector<int> curr(n + 1, 0);
for (int i = 1; i <= m; i++)
{
for (int j = 1; j <= n; j++)
{
if (S1[i - 1] == S2[j - 1])
{
curr[j] = prev[j - 1] + 1;
}
else
{
curr[j] = std::max(prev[j], curr[j - 1]);
}
}
// swap prev and curr for next iteration
std::swap(prev, curr);
// reset curr to zeros
std::fill(curr.begin(), curr.end(), 0);
}
return prev[n];
}
int main()
{
std::string X = "ABCDGH";
std::string Y = "AEDFHR";
std::cout << "X = \"" << X << "\", Y = \"" << Y << "\"\n";
std::cout << "LCS length = " << lcsLengthOptimized(X, Y) << "\n";
X = "AGGTAB";
Y = "GXTXAYB";
std::cout << "X = \"" << X << "\", Y = \"" << Y << "\"\n";
std::cout << "LCS length = " << lcsLengthOptimized(X, Y) << "\n";
return 0;
}
Running this program prints:
X = "ABCDGH", Y = "AEDFHR"
LCS length = 3
X = "AGGTAB", Y = "GXTXAYB"
LCS length = 4
C implementation
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int max(int a, int b)
{
return (a > b) ? a : b;
}
int lcsLengthOptimized(const char* X, const char* Y)
{
int m = strlen(X);
int n = strlen(Y);
// ensure len2 (the column dimension) is the smaller length
const char* S1 = X;
const char* S2 = Y;
int len1 = m, len2 = n;
if (m < n)
{
S1 = Y; S2 = X;
len1 = n; len2 = m;
}
int* prev = (int*)calloc(len2 + 1, sizeof(int));
int* curr = (int*)calloc(len2 + 1, sizeof(int));
for (int i = 1; i <= len1; i++)
{
for (int j = 1; j <= len2; j++)
{
if (S1[i - 1] == S2[j - 1])
{
curr[j] = prev[j - 1] + 1;
}
else
{
curr[j] = max(prev[j], curr[j - 1]);
}
}
// swap prev and curr
int* tmp = prev;
prev = curr;
curr = tmp;
memset(curr, 0, (len2 + 1) * sizeof(int));
}
int result = prev[len2];
free(prev);
free(curr);
return result;
}
int main()
{
const char* X = "ABCDGH";
const char* Y = "AEDFHR";
printf("X = \"%s\", Y = \"%s\"\n", X, Y);
printf("LCS length = %d\n", lcsLengthOptimized(X, Y));
X = "AGGTAB";
Y = "GXTXAYB";
printf("X = \"%s\", Y = \"%s\"\n", X, Y);
printf("LCS length = %d\n", lcsLengthOptimized(X, Y));
return 0;
}
Running this program prints:
X = "ABCDGH", Y = "AEDFHR"
LCS length = 3
X = "AGGTAB", Y = "GXTXAYB"
LCS length = 4
Python implementation
def lcs_length_optimized(X, Y):
    # ensure Y is the shorter string for less space usage
    if len(X) < len(Y):
        X, Y = Y, X
    m, n = len(X), len(Y)
    prev = [0] * (n + 1)
    curr = [0] * (n + 1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev, curr = curr, [0] * (n + 1)
    return prev[n]

X = "ABCDGH"
Y = "AEDFHR"
print(f'X = "{X}", Y = "{Y}"')
print(f"LCS length = {lcs_length_optimized(X, Y)}")
X = "AGGTAB"
Y = "GXTXAYB"
print(f'X = "{X}", Y = "{Y}"')
print(f"LCS length = {lcs_length_optimized(X, Y)}")
Running this program prints:
X = "ABCDGH", Y = "AEDFHR"
LCS length = 3
X = "AGGTAB", Y = "GXTXAYB"
LCS length = 4
Go implementation
package main
import "fmt"
func lcsLengthOptimized(X, Y string) int {
// ensure S2 is the shorter string for less space usage
S1, S2 := X, Y
if len(X) < len(Y) {
S1, S2 = Y, X
}
m, n := len(S1), len(S2)
prev := make([]int, n+1)
curr := make([]int, n+1)
for i := 1; i <= m; i++ {
for j := 1; j <= n; j++ {
if S1[i-1] == S2[j-1] {
curr[j] = prev[j-1] + 1
} else if prev[j] > curr[j-1] {
curr[j] = prev[j]
} else {
curr[j] = curr[j-1]
}
}
prev, curr = curr, prev
for j := range curr {
curr[j] = 0
}
}
return prev[n]
}
func main() {
X := "ABCDGH"
Y := "AEDFHR"
fmt.Printf("X = \"%s\", Y = \"%s\"\n", X, Y)
fmt.Printf("LCS length = %d\n", lcsLengthOptimized(X, Y))
X = "AGGTAB"
Y = "GXTXAYB"
fmt.Printf("X = \"%s\", Y = \"%s\"\n", X, Y)
fmt.Printf("LCS length = %d\n", lcsLengthOptimized(X, Y))
}
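Running this program prints:
X = "ABCDGH", Y = "AEDFHR"
LCS length = 3
X = "AGGTAB", Y = "GXTXAYB"
LCS length = 4
The core idea of the space optimization is to replace the full two-dimensional dp table with two one-dimensional arrays, prev (the previous row) and curr (the current row). After each row is finished, the two arrays are swapped and curr is reset. This works when only the length is needed, for which O(min(m,n)) space is enough; the backtracking reconstruction shown earlier still needs the full dp table.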
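The two rows can be squeezed into one if the single overwritten value that is still needed, dp[i-1][j-1], is carried in a scalar. This is a variation on the two-array approach rather than the method used in the listings above; the names `lcs_length_one_row`, `row`, `diag`, and `up` are chosen for this sketch.

```python
def lcs_length_one_row(X, Y):
    """LCS length using a single row of size min(m, n) + 1."""
    if len(X) < len(Y):
        X, Y = Y, X                  # make Y the shorter string
    n = len(Y)
    row = [0] * (n + 1)              # row[j] holds dp[i-1][j] until overwritten with dp[i][j]
    for x in X:
        diag = 0                     # dp[i-1][j-1]; starts as dp[i-1][0] = 0
        for j in range(1, n + 1):
            up = row[j]              # dp[i-1][j], saved before it is overwritten
            if x == Y[j - 1]:
                row[j] = diag + 1
            else:
                row[j] = max(up, row[j - 1])   # row[j-1] is already dp[i][j-1]
            diag = up                # becomes dp[i-1][j-1] for the next column
    return row[n]


print(lcs_length_one_row("ABCDGH", "AEDFHR"))   # 3
print(lcs_length_one_row("AGGTAB", "GXTXAYB"))  # 4
```

The asymptotic space is the same O(min(m,n)); the gain is a constant factor and no per-row reset.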
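Complete Implementation
The programs below combine the LCS length computation and the sequence reconstruction, and check the result on several test cases.
C++ implementation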
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <utility> // std::pair
// compute both LCS length and reconstruct the LCS string
std::pair<int, std::string> lcs(const std::string& X, const std::string& Y)
{
int m = X.length();
int n = Y.length();
std::vector<std::vector<int>> dp(m + 1, std::vector<int>(n + 1, 0));
// fill dp table
for (int i = 1; i <= m; i++)
{
for (int j = 1; j <= n; j++)
{
if (X[i - 1] == Y[j - 1])
{
dp[i][j] = dp[i - 1][j - 1] + 1;
}
else
{
dp[i][j] = std::max(dp[i - 1][j], dp[i][j - 1]);
}
}
}
// backtrack to reconstruct LCS
std::string lcsStr;
int i = m, j = n;
while (i > 0 && j > 0)
{
if (X[i - 1] == Y[j - 1])
{
lcsStr.push_back(X[i - 1]);
i--;
j--;
}
else if (dp[i - 1][j] >= dp[i][j - 1]) // on a tie, move up
{
i--;
}
else
{
j--;
}
}
std::reverse(lcsStr.begin(), lcsStr.end());
return {dp[m][n], lcsStr};
}
int main()
{
std::pair<std::string, std::string> tests[] = {
{"ABCDGH", "AEDFHR"},
{"AGGTAB", "GXTXAYB"},
{"ABCBDAB", "BDCABA"},
{"", "ABC"},
{"ABC", "ABC"}
};
for (auto& [X, Y] : tests)
{
auto [length, lcsStr] = lcs(X, Y);
std::cout << "X = \"" << X << "\", Y = \"" << Y << "\"\n";
std::cout << "LCS = \"" << lcsStr << "\", length = " << length << "\n";
std::cout << "---\n";
}
return 0;
}
Running this program prints:
X = "ABCDGH", Y = "AEDFHR"
LCS = "ADH", length = 3
---
X = "AGGTAB", Y = "GXTXAYB"
LCS = "GTAB", length = 4
---
X = "ABCBDAB", Y = "BDCABA"
LCS = "BCBA", length = 4
---
X = "", Y = "ABC"
LCS = "", length = 0
---
X = "ABC", Y = "ABC"
LCS = "ABC", length = 3
---
C implementation
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int max(int a, int b)
{
return (a > b) ? a : b;
}
typedef struct {
int length;
char* str;
} LcsResult;
LcsResult lcs(const char* X, const char* Y)
{
int m = strlen(X);
int n = strlen(Y);
LcsResult res;
// allocate dp table
int** dp = (int**)malloc((m + 1) * sizeof(int*));
for (int i = 0; i <= m; i++)
{
dp[i] = (int*)calloc(n + 1, sizeof(int));
}
// fill dp table
for (int i = 1; i <= m; i++)
{
for (int j = 1; j <= n; j++)
{
if (X[i - 1] == Y[j - 1])
{
dp[i][j] = dp[i - 1][j - 1] + 1;
}
else
{
dp[i][j] = max(dp[i - 1][j], dp[i][j - 1]);
}
}
}
// backtrack to reconstruct LCS
int lcsLen = dp[m][n];
res.str = (char*)malloc((lcsLen + 1) * sizeof(char));
res.str[lcsLen] = '\0';
res.length = lcsLen;
int i = m, j = n, idx = lcsLen - 1;
while (i > 0 && j > 0)
{
if (X[i - 1] == Y[j - 1])
{
res.str[idx--] = X[i - 1];
i--;
j--;
}
else if (dp[i - 1][j] >= dp[i][j - 1]) // on a tie, move up
{
i--;
}
else
{
j--;
}
}
// free dp table
for (int i = 0; i <= m; i++)
{
free(dp[i]);
}
free(dp);
return res;
}
int main()
{
const char* testX[] = {"ABCDGH", "AGGTAB", "ABCBDAB", "", "ABC"};
const char* testY[] = {"AEDFHR", "GXTXAYB", "BDCABA", "ABC", "ABC"};
int numTests = 5;
for (int t = 0; t < numTests; t++)
{
LcsResult res = lcs(testX[t], testY[t]);
printf("X = \"%s\", Y = \"%s\"\n", testX[t], testY[t]);
printf("LCS = \"%s\", length = %d\n", res.str, res.length);
printf("---\n");
free(res.str);
}
return 0;
}
Running this program prints:
X = "ABCDGH", Y = "AEDFHR"
LCS = "ADH", length = 3
---
X = "AGGTAB", Y = "GXTXAYB"
LCS = "GTAB", length = 4
---
X = "ABCBDAB", Y = "BDCABA"
LCS = "BCBA", length = 4
---
X = "", Y = "ABC"
LCS = "", length = 0
---
X = "ABC", Y = "ABC"
LCS = "ABC", length = 3
---
Python implementation
def lcs(X, Y):
    m, n = len(X), len(Y)
    # fill dp table
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # backtrack to reconstruct LCS
    result = []
    i, j = m, n
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:
            result.append(X[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:  # on a tie, move up
            i -= 1
        else:
            j -= 1
    lcs_str = ''.join(reversed(result))
    return dp[m][n], lcs_str

tests = [
    ("ABCDGH", "AEDFHR"),
    ("AGGTAB", "GXTXAYB"),
    ("ABCBDAB", "BDCABA"),
    ("", "ABC"),
    ("ABC", "ABC"),
]
for X, Y in tests:
    length, lcs_str = lcs(X, Y)
    print(f'X = "{X}", Y = "{Y}"')
    print(f'LCS = "{lcs_str}", length = {length}')
    print("---")
Running this program prints:
X = "ABCDGH", Y = "AEDFHR"
LCS = "ADH", length = 3
---
X = "AGGTAB", Y = "GXTXAYB"
LCS = "GTAB", length = 4
---
X = "ABCBDAB", Y = "BDCABA"
LCS = "BCBA", length = 4
---
X = "", Y = "ABC"
LCS = "", length = 0
---
X = "ABC", Y = "ABC"
LCS = "ABC", length = 3
---
Go implementation
package main
import "fmt"
func lcs(X, Y string) (int, string) {
m, n := len(X), len(Y)
// fill dp table
dp := make([][]int, m+1)
for i := range dp {
dp[i] = make([]int, n+1)
}
for i := 1; i <= m; i++ {
for j := 1; j <= n; j++ {
if X[i-1] == Y[j-1] {
dp[i][j] = dp[i-1][j-1] + 1
} else if dp[i-1][j] > dp[i][j-1] {
dp[i][j] = dp[i-1][j]
} else {
dp[i][j] = dp[i][j-1]
}
}
}
// backtrack to reconstruct LCS
result := make([]byte, 0, dp[m][n])
i, j := m, n
for i > 0 && j > 0 {
if X[i-1] == Y[j-1] {
result = append(result, X[i-1])
i--
j--
} else if dp[i-1][j] >= dp[i][j-1] { // on a tie, move up
i--
} else {
j--
}
}
// reverse result
for left, right := 0, len(result)-1; left < right; left, right = left+1, right-1 {
result[left], result[right] = result[right], result[left]
}
return dp[m][n], string(result)
}
func main() {
tests := []struct{ X, Y string }{
{"ABCDGH", "AEDFHR"},
{"AGGTAB", "GXTXAYB"},
{"ABCBDAB", "BDCABA"},
{"", "ABC"},
{"ABC", "ABC"},
}
for _, t := range tests {
length, lcsStr := lcs(t.X, t.Y)
fmt.Printf("X = \"%s\", Y = \"%s\"\n", t.X, t.Y)
fmt.Printf("LCS = \"%s\", length = %d\n", lcsStr, length)
fmt.Println("---")
}
}
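Running this program prints:
X = "ABCDGH", Y = "AEDFHR"
LCS = "ADH", length = 3
---
X = "AGGTAB", Y = "GXTXAYB"
LCS = "GTAB", length = 4
---
X = "ABCBDAB", Y = "BDCABA"
LCS = "BCBA", length = 4
---
X = "", Y = "ABC"
LCS = "", length = 0
---
X = "ABC", Y = "ABC"
LCS = "ABC", length = 3
---
The complete implementation runs five test cases: the two typical cases ("ABCDGH"/"AEDFHR" and "AGGTAB"/"GXTXAYB"), a case with several solutions ("ABCBDAB"/"BDCABA" has LCS length 4, and both "BCBA" and "BDAB" are valid answers; which one is returned depends on how ties are broken during backtracking), the empty-string boundary case, and two identical strings. C returns the length and the string together in an LcsResult struct, C++ uses std::pair, Python uses a tuple, and Go uses multiple return values.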
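To see every solution of a multi-solution case rather than just one, the backtracking can follow both directions whenever the two neighbors tie. The following is a sketch under that idea; `all_lcs` is a hypothetical helper, and because the number of distinct LCS strings can grow exponentially it is only meant for short inputs.

```python
from functools import lru_cache


def all_lcs(X, Y):
    """Return the sorted list of all distinct longest common subsequences of X and Y."""
    m, n = len(X), len(Y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    @lru_cache(maxsize=None)
    def collect(i, j):
        # All LCS strings of X[:i] and Y[:j].
        if i == 0 or j == 0:
            return frozenset({""})
        if X[i - 1] == Y[j - 1]:
            return frozenset(s + X[i - 1] for s in collect(i - 1, j - 1))
        found = set()
        if dp[i - 1][j] >= dp[i][j - 1]:
            found |= collect(i - 1, j)    # going up keeps the optimal length
        if dp[i][j - 1] >= dp[i - 1][j]:
            found |= collect(i, j - 1)    # going left keeps the optimal length
        return frozenset(found)

    return sorted(collect(m, n))


print(all_lcs("ABCBDAB", "BDCABA"))  # ['BCAB', 'BCBA', 'BDAB']
```

For "ABCBDAB" and "BDCABA" this yields three strings of length 4, so a single-path backtrack returns whichever one its tie-breaking rule reaches.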
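Properties of LCS
Complexity comparison
| Method | Time complexity | Space complexity | Reconstructs the sequence? |
|---|---|---|---|
| Recursion (brute-force search) | O(2^(m+n)) | O(m+n) (recursion stack) | No |
| Dynamic programming (length only) | O(mn) | O(mn) | No |
| Dynamic programming (with reconstruction) | O(mn) | O(mn) | Yes |
| Space-optimized dynamic programming | O(mn) | O(min(m,n)) | No |
Related problems
LCS is a foundational string problem, and many related problems can be solved with the same ideas:
| Related problem | Notes |
|---|---|
| Longest common substring | The common part must be contiguous in both strings. In the transition, a mismatch sets dp[i][j] = 0 (instead of taking the max), and the answer is the maximum value anywhere in the dp table |
| Shortest common supersequence (SCS) | Find the shortest sequence that contains both sequences as subsequences. SCS length = m + n - LCS length |
| Edit distance | The minimum number of operations needed to turn one string into the other. Related to LCS, but the transition differs because it accounts for three operations: insertion, deletion, and substitution |
| Longest increasing subsequence (LIS) | Reducible to LCS: sort the sequence (dropping duplicates if the subsequence must be strictly increasing) and take the LCS with the original |
| Longest palindromic subsequence | Its length equals the LCS length of the string and its reverse |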
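Three of the rows above translate almost directly into code. The sketch below is illustrative only: the function names are assumptions, and the example strings "ACDGHR" and "BBABCBCAB" are chosen for this sketch and do not appear elsewhere in the text.

```python
def lcs_length(X, Y):
    """Standard O(mn) LCS length."""
    m, n = len(X), len(Y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]


def longest_common_substring_length(X, Y):
    """Same table shape, but a mismatch leaves the cell at 0 and the answer is the table maximum."""
    m, n = len(X), len(Y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
                best = max(best, dp[i][j])
    return best


def scs_length(X, Y):
    """Shortest common supersequence length: m + n - LCS length."""
    return len(X) + len(Y) - lcs_length(X, Y)


def lps_length(S):
    """Longest palindromic subsequence length: LCS length of S and its reverse."""
    return lcs_length(S, S[::-1])


print(longest_common_substring_length("ABCDGH", "ACDGHR"))  # 4 ("CDGH")
print(scs_length("AGGTAB", "GXTXAYB"))                      # 9
print(lps_length("BBABCBCAB"))                              # 7
```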
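Practical applications
LCS is used in a wide range of practical settings:
- Version control tools (such as git diff): file versions are compared with LCS-style algorithms to find added, deleted, and modified lines
- Bioinformatics: alignment of DNA and protein sequences (sequence alignment), where LCS-style comparison measures the similarity of gene sequences across species
- Plagiarism detection: documents are split into sequences of sentences or paragraphs and compared with LCS to measure similarity
- Spell correction: LCS scores the similarity between the user's input and dictionary words, so the most likely correct spelling can be suggested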
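As an illustration of the diff use case, the same table and backtracking work on lists of lines instead of characters. This is a teaching sketch, not how git works internally (git's default diff uses the Myers algorithm, which solves an equivalent problem more efficiently); the name `diff_lines` and the output format are assumptions made here.

```python
def diff_lines(old, new):
    """Return a line diff: '  ' for common lines, '- ' for deletions, '+ ' for additions."""
    m, n = len(old), len(new)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if old[i - 1] == new[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    ops = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and old[i - 1] == new[j - 1]:
            ops.append("  " + old[i - 1])    # line is part of the LCS: unchanged
            i -= 1
            j -= 1
        elif j > 0 and (i == 0 or dp[i][j - 1] >= dp[i - 1][j]):
            ops.append("+ " + new[j - 1])    # line only in the new version
            j -= 1
        else:
            ops.append("- " + old[i - 1])    # line only in the old version
            i -= 1
    ops.reverse()                            # collected back to front
    return ops


old = ["a", "b", "c", "d"]
new = ["a", "c", "d", "e"]
for line in diff_lines(old, new):
    print(line)
```

Lines on the LCS are reported as unchanged; everything else in the old version is a deletion and everything else in the new version is an addition.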
