8-3 Longest Common Subsequence

Longest Common Subsequence (LCS)

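The Longest Common Subsequence (LCS) problem is a classic in computer science: given two sequences, find the longest subsequence that appears in both.

A subsequence is what remains after deleting zero or more elements from the original sequence while keeping the remaining elements in their original relative order. The elements of a subsequence need not be contiguous in the original sequence, but their relative order must not change.

A substring, by contrast, requires the elements to appear contiguously in the original sequence.

An example of the difference: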

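String X = "ABCDE"

Subsequence: "ACE"  -- delete B and D, keeping the relative order A→C→E
Subsequence: "ACD"  -- delete B and E
Substring:   "ABC"  -- contiguous
Substring:   "CDE"  -- contiguous
"ACE" is not a substring, because A, C and E are not contiguous in the original string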

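The distinction can also be checked mechanically. The following is a minimal Python sketch; the helper names is_subsequence and is_substring are hypothetical and chosen only for illustration.

```python
def is_subsequence(s, t):
    """Return True if s can be obtained from t by deleting zero or more characters."""
    it = iter(t)
    # `ch in it` advances the iterator until it finds ch,
    # so characters must be found in their original order.
    return all(ch in it for ch in s)


def is_substring(s, t):
    """Return True if s occurs in t as one contiguous block."""
    return s in t


X = "ABCDE"
print(is_subsequence("ACE", X), is_substring("ACE", X))  # True False
print(is_subsequence("CDE", X), is_substring("CDE", X))  # True True
print(is_subsequence("AEC", X))                          # False (order violated)
```

Every substring is also a subsequence, but not the other way around.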
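Classic LCS examples:

  • X = "ABCDGH", Y = "AEDFHR", LCS = "ADH", length = 3
  • X = "AGGTAB", Y = "GXTXAYB", LCS = "GTAB", length = 4

LCS is a typical dynamic programming (DP) problem. The sections below start from brute-force recursion and derive an efficient dynamic programming solution step by step.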


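Recursive approach (brute-force search)

The recursive approach relies on the optimal substructure of LCS:

  • If the last characters of the two sequences are equal: that character can be taken as the last character of an LCS. The problem reduces to finding the LCS of the two sequences with their last characters removed, then adding 1.
  • If the last characters differ: the LCS length is the larger of two cases: the LCS after dropping the last character of X, or the LCS after dropping the last character of Y.

Recurrence: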

LCS(X[0..m-1], Y[0..n-1]):
  if m == 0 or n == 0:
    return 0
  if X[m-1] == Y[n-1]:
    return 1 + LCS(X[0..m-2], Y[0..n-2])
  else:
    return max(LCS(X[0..m-2], Y[0..n-1]), LCS(X[0..m-1], Y[0..n-2]))

Taking "ABCDGH" and "AEDFHR" as an example, part of the recursion tree unfolds as follows:

LCS("ABCDGH", "AEDFHR")    -- H != R
├── LCS("ABCDG", "AEDFHR") -- G != R
│   ├── LCS("ABCD", "AEDFHR") -- D != R → ...
│   └── LCS("ABCDG", "AEDFH") -- G != H → ...
└── LCS("ABCDGH", "AEDFH") -- H == H
    └── 1 + LCS("ABCDG", "AEDF") → ...

The recursive approach takes O(2^(m+n)) time in the worst case. It is exponential because the same subproblems are solved over and over, so it is very inefficient.

C++ implementation

#include <iostream>
#include <string>
#include <algorithm>

int lcs(const std::string& X, const std::string& Y, int m, int n)
{
    // base case: either string is empty
    if (m == 0 || n == 0)
    {
        return 0;
    }
    // last chars match
    if (X[m - 1] == Y[n - 1])
    {
        return 1 + lcs(X, Y, m - 1, n - 1);
    }
    // last chars differ: take max of two sub-problems
    return std::max(lcs(X, Y, m - 1, n), lcs(X, Y, m, n - 1));
}

int main()
{
    std::string X = "ABCDGH";
    std::string Y = "AEDFHR";
    int result = lcs(X, Y, X.length(), Y.length());
    std::cout << "X = \"" << X << "\"\n";
    std::cout << "Y = \"" << Y << "\"\n";
    std::cout << "LCS length = " << result << "\n";

    X = "AGGTAB";
    Y = "GXTXAYB";
    result = lcs(X, Y, X.length(), Y.length());
    std::cout << "X = \"" << X << "\"\n";
    std::cout << "Y = \"" << Y << "\"\n";
    std::cout << "LCS length = " << result << "\n";
    return 0;
}

Running this program prints:

X = "ABCDGH"
Y = "AEDFHR"
LCS length = 3
X = "AGGTAB"
Y = "GXTXAYB"
LCS length = 4

Python implementation

def lcs(X, Y, m, n):
    # base case: either string is empty
    if m == 0 or n == 0:
        return 0
    # last chars match
    if X[m - 1] == Y[n - 1]:
        return 1 + lcs(X, Y, m - 1, n - 1)
    # last chars differ: take max of two sub-problems
    return max(lcs(X, Y, m - 1, n), lcs(X, Y, m, n - 1))

X = "ABCDGH"
Y = "AEDFHR"
print(f'X = "{X}"')
print(f'Y = "{Y}"')
print(f"LCS length = {lcs(X, Y, len(X), len(Y))}")

X = "AGGTAB"
Y = "GXTXAYB"
print(f'X = "{X}"')
print(f'Y = "{Y}"')
print(f"LCS length = {lcs(X, Y, len(X), len(Y))}")

Running this program prints:

X = "ABCDGH"
Y = "AEDFHR"
LCS length = 3
X = "AGGTAB"
Y = "GXTXAYB"
LCS length = 4

Go implementation

package main

import (
	"fmt"
)

func lcs(X, Y string, m, n int) int {
	// base case: either string is empty
	if m == 0 || n == 0 {
		return 0
	}
	// last chars match
	if X[m-1] == Y[n-1] {
		return 1 + lcs(X, Y, m-1, n-1)
	}
	// last chars differ: take max of two sub-problems
	left := lcs(X, Y, m-1, n)
	right := lcs(X, Y, m, n-1)
	if left > right {
		return left
	}
	return right
}

func main() {
	X := "ABCDGH"
	Y := "AEDFHR"
	fmt.Printf("X = \"%s\"\n", X)
	fmt.Printf("Y = \"%s\"\n", Y)
	fmt.Printf("LCS length = %d\n", lcs(X, Y, len(X), len(Y)))

	X = "AGGTAB"
	Y = "GXTXAYB"
	fmt.Printf("X = \"%s\"\n", X)
	fmt.Printf("Y = \"%s\"\n", Y)
	fmt.Printf("LCS length = %d\n", lcs(X, Y, len(X), len(Y)))
}

Running this program prints:

X = "ABCDGH"
Y = "AEDFHR"
LCS length = 3
X = "AGGTAB"
Y = "GXTXAYB"
LCS length = 4

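The recursive approach is intuitive, but its O(2^(m+n)) worst-case running time comes from massive recomputation, and the running time grows exponentially for longer strings. Go had no built-in max function for integers before Go 1.21, so the Go version compares the two values by hand.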
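One way to see the repeated work, and to remove it without changing the recursion, is to cache each (m, n) result. The following is a minimal Python sketch using functools.lru_cache from the standard library; lcs_memo and count_naive_calls are hypothetical names introduced here for illustration.

```python
from functools import lru_cache


def lcs_memo(X, Y):
    """Top-down LCS length: the same recursion, but each (m, n) is solved once."""

    @lru_cache(maxsize=None)
    def solve(m, n):
        if m == 0 or n == 0:
            return 0
        if X[m - 1] == Y[n - 1]:
            return 1 + solve(m - 1, n - 1)
        return max(solve(m - 1, n), solve(m, n - 1))

    return solve(len(X), len(Y))


def count_naive_calls(X, Y):
    """Count the calls made by the plain recursion (for comparison only)."""
    calls = 0

    def solve(m, n):
        nonlocal calls
        calls += 1
        if m == 0 or n == 0:
            return 0
        if X[m - 1] == Y[n - 1]:
            return 1 + solve(m - 1, n - 1)
        return max(solve(m - 1, n), solve(m, n - 1))

    solve(len(X), len(Y))
    return calls


print(lcs_memo("ABCDGH", "AEDFHR"))   # 3
print(lcs_memo("AGGTAB", "GXTXAYB"))  # 4
# With no matching characters the plain recursion branches at every step.
print(count_naive_calls("AAAAAAAA", "BBBBBBBB"))
```

For two 8-character strings there are only 81 distinct (m, n) states, yet the plain recursion makes tens of thousands of calls on the last input. The memoized version is still recursive, so very long inputs can hit Python's recursion limit.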


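Dynamic programming: computing the LCS length

Dynamic programming (DP) stores the solutions of subproblems in a table so that nothing is recomputed, which brings the exponential complexity down to polynomial.

Let dp[i][j] be the LCS length of X[0..i-1] and Y[0..j-1]. The state transition is: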

if i == 0 or j == 0:
    dp[i][j] = 0
elif X[i-1] == Y[j-1]:
    dp[i][j] = dp[i-1][j-1] + 1
else:
    dp[i][j] = max(dp[i-1][j], dp[i][j-1])

Taking X = "ABCDGH" and Y = "AEDFHR" as an example, the dp table is filled in step by step (m=6, n=6; rows follow X, columns follow Y):

        ""  A  E  D  F  H  R
    ""   0  0  0  0  0  0  0
     A   0  1  1  1  1  1  1
     B   0  1  1  1  1  1  1
     C   0  1  1  1  1  1  1
     D   0  1  1  2  2  2  2
     G   0  1  1  2  2  2  2
     H   0  1  1  2  2  3  3

The final value is dp[6][6] = 3, so the LCS length is 3 (the subsequence "ADH").

Time complexity is O(mn) and space complexity is O(mn).

C++ implementation

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>

int lcsLength(const std::string& X, const std::string& Y)
{
    int m = X.length();
    int n = Y.length();
    // dp[i][j] = LCS length of X[0..i-1] and Y[0..j-1]
    std::vector<std::vector<int>> dp(m + 1, std::vector<int>(n + 1, 0));

    for (int i = 1; i <= m; i++)
    {
        for (int j = 1; j <= n; j++)
        {
            if (X[i - 1] == Y[j - 1])
            {
                dp[i][j] = dp[i - 1][j - 1] + 1;
            }
            else
            {
                dp[i][j] = std::max(dp[i - 1][j], dp[i][j - 1]);
            }
        }
    }
    return dp[m][n];
}

int main()
{
    std::string X = "ABCDGH";
    std::string Y = "AEDFHR";
    std::cout << "X = \"" << X << "\", Y = \"" << Y << "\"\n";
    std::cout << "LCS length = " << lcsLength(X, Y) << "\n";

    X = "AGGTAB";
    Y = "GXTXAYB";
    std::cout << "X = \"" << X << "\", Y = \"" << Y << "\"\n";
    std::cout << "LCS length = " << lcsLength(X, Y) << "\n";
    return 0;
}

Running this program prints:

X = "ABCDGH", Y = "AEDFHR"
LCS length = 3
X = "AGGTAB", Y = "GXTXAYB"
LCS length = 4

C implementation

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int max(int a, int b)
{
    return (a > b) ? a : b;
}

int lcsLength(const char* X, const char* Y)
{
    int m = strlen(X);
    int n = strlen(Y);

    // allocate dp table: (m+1) x (n+1)
    int** dp = (int**)malloc((m + 1) * sizeof(int*));
    for (int i = 0; i <= m; i++)
    {
        dp[i] = (int*)calloc(n + 1, sizeof(int));
    }

    for (int i = 1; i <= m; i++)
    {
        for (int j = 1; j <= n; j++)
        {
            if (X[i - 1] == Y[j - 1])
            {
                dp[i][j] = dp[i - 1][j - 1] + 1;
            }
            else
            {
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1]);
            }
        }
    }

    int result = dp[m][n];

    // free memory
    for (int i = 0; i <= m; i++)
    {
        free(dp[i]);
    }
    free(dp);

    return result;
}

int main()
{
    const char* X = "ABCDGH";
    const char* Y = "AEDFHR";
    printf("X = \"%s\", Y = \"%s\"\n", X, Y);
    printf("LCS length = %d\n", lcsLength(X, Y));

    X = "AGGTAB";
    Y = "GXTXAYB";
    printf("X = \"%s\", Y = \"%s\"\n", X, Y);
    printf("LCS length = %d\n", lcsLength(X, Y));
    return 0;
}

Running this program prints:

X = "ABCDGH", Y = "AEDFHR"
LCS length = 3
X = "AGGTAB", Y = "GXTXAYB"
LCS length = 4

Python implementation

def lcs_length(X, Y):
    m, n = len(X), len(Y)
    # dp[i][j] = LCS length of X[0..i-1] and Y[0..j-1]
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    return dp[m][n]

X = "ABCDGH"
Y = "AEDFHR"
print(f'X = "{X}", Y = "{Y}"')
print(f"LCS length = {lcs_length(X, Y)}")

X = "AGGTAB"
Y = "GXTXAYB"
print(f'X = "{X}", Y = "{Y}"')
print(f"LCS length = {lcs_length(X, Y)}")

Running this program prints:

X = "ABCDGH", Y = "AEDFHR"
LCS length = 3
X = "AGGTAB", Y = "GXTXAYB"
LCS length = 4

Go implementation

package main

import "fmt"

func lcsLength(X, Y string) int {
	m, n := len(X), len(Y)
	// dp[i][j] = LCS length of X[0..i-1] and Y[0..j-1]
	dp := make([][]int, m+1)
	for i := range dp {
		dp[i] = make([]int, n+1)
	}

	for i := 1; i <= m; i++ {
		for j := 1; j <= n; j++ {
			if X[i-1] == Y[j-1] {
				dp[i][j] = dp[i-1][j-1] + 1
			} else if dp[i-1][j] > dp[i][j-1] {
				dp[i][j] = dp[i-1][j]
			} else {
				dp[i][j] = dp[i][j-1]
			}
		}
	}
	return dp[m][n]
}

func main() {
	X := "ABCDGH"
	Y := "AEDFHR"
	fmt.Printf("X = \"%s\", Y = \"%s\"\n", X, Y)
	fmt.Printf("LCS length = %d\n", lcsLength(X, Y))

	X = "AGGTAB"
	Y = "GXTXAYB"
	fmt.Printf("X = \"%s\", Y = \"%s\"\n", X, Y)
	fmt.Printf("LCS length = %d\n", lcsLength(X, Y))
}

Running this program prints:

X = "ABCDGH", Y = "AEDFHR"
LCS length = 3
X = "AGGTAB", Y = "GXTXAYB"
LCS length = 4

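The dynamic programming approach records the solutions of all subproblems in a two-dimensional dp table. C allocates and frees the memory by hand (malloc/free), C++ relies on vector to manage it automatically, Python builds the table with a list comprehension, and Go creates a two-dimensional slice with make. The core logic is identical in all four implementations: a double loop fills the dp table, and dp[m][n] is the result.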
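To inspect the whole table rather than only dp[m][n], a small helper can return and print it. The following is a minimal Python sketch; lcs_table and format_table are hypothetical helpers, not part of the programs above, and "-" stands for the empty prefix.

```python
def lcs_table(X, Y):
    """Return the full (m+1) x (n+1) dp table of LCS lengths."""
    m, n = len(X), len(Y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp


def format_table(X, Y):
    """Render the dp table with X as row labels and Y as column labels."""
    dp = lcs_table(X, Y)
    lines = ["    " + " ".join(f"{c:>2}" for c in "-" + Y)]
    for label, row in zip("-" + X, dp):
        lines.append(f"{label:>2}  " + " ".join(f"{v:>2}" for v in row))
    return "\n".join(lines)


print(format_table("ABCDGH", "AEDFHR"))
```

The last row printed for this input is 0 1 1 2 2 3 3, matching the worked example for "ABCDGH" and "AEDFHR".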


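Dynamic programming: reconstructing the LCS

Once the LCS length has been computed, backtracking through the dp table reconstructs an actual LCS string.

Backtracking rules: start at dp[m][n] and walk toward the top-left corner, checking the following in order at each cell and stopping when i or j reaches 0.

  • If X[i-1] == Y[j-1]: the character belongs to the LCS; add it to the result and move to dp[i-1][j-1]
  • If dp[i-1][j] >= dp[i][j-1]: move up to dp[i-1][j]
  • Otherwise: move left to dp[i][j-1]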

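Taking X = "AGGTAB" and Y = "GXTXAYB" as an example, here is the dp table with the cells visited during backtracking shown in brackets (ties move up, as in the rule dp[i-1][j] >= dp[i][j-1]):

       ""   G   X   T   X   A   Y   B
    ""  0   0   0   0   0   0   0   0
     A  0   0   0   0   0   1   1   1
     G  0  [1] [1]  1   1   1   1   1
     G  0   1  [1]  1   1   1   1   1
     T  0   1   1  [2] [2]  2   2   2
     A  0   1   1   2   2  [3] [3]  3
     B  0   1   1   2   2   3   3  [4]

Backtracking path (starting from dp[6][7] = 4):
dp[6][7]: B == B, take B, move diagonally to dp[5][6]
dp[5][6]: A != Y, dp[4][6] = 2 < dp[5][5] = 3, move left to dp[5][5]
dp[5][5]: A == A, take A, move diagonally to dp[4][4]
dp[4][4]: T != X, dp[3][4] = 1 < dp[4][3] = 2, move left to dp[4][3]
dp[4][3]: T == T, take T, move diagonally to dp[3][2]
dp[3][2]: G != X, dp[2][2] = 1 >= dp[3][1] = 1, move up to dp[2][2]
dp[2][2]: G != X, dp[1][2] = 0 < dp[2][1] = 1, move left to dp[2][1]
dp[2][1]: G == G, take G, move diagonally to dp[1][0] and stop
Characters collected (in reverse order): B, A, T, G → reversed: "GTAB"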
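The walk through the table can be reproduced by recording every visited cell. The following is a minimal Python sketch that applies the three backtracking rules in order, moving up on ties (dp[i-1][j] >= dp[i][j-1]); the function name lcs_backtrack_path is a hypothetical helper for illustration.

```python
def lcs_backtrack_path(X, Y):
    """Return (lcs_string, visited_cells), moving up when the two neighbors tie."""
    m, n = len(X), len(Y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    chars, path = [], []
    i, j = m, n
    while i > 0 and j > 0:
        path.append((i, j))
        if X[i - 1] == Y[j - 1]:
            chars.append(X[i - 1])      # match: take the character, go diagonally
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1                      # tie or larger value above: go up
        else:
            j -= 1                      # larger value to the left: go left
    return "".join(reversed(chars)), path


lcs, path = lcs_backtrack_path("AGGTAB", "GXTXAYB")
print(lcs)   # GTAB
print(path)  # [(6, 7), (5, 6), (5, 5), (4, 4), (4, 3), (3, 2), (2, 2), (2, 1)]
```

Breaking ties the other way (moving left instead of up) visits different cells and can return a different LCS of the same length when several exist.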

C++ implementation

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>

std::string lcsString(const std::string& X, const std::string& Y)
{
    int m = X.length();
    int n = Y.length();
    std::vector<std::vector<int>> dp(m + 1, std::vector<int>(n + 1, 0));

    // fill dp table
    for (int i = 1; i <= m; i++)
    {
        for (int j = 1; j <= n; j++)
        {
            if (X[i - 1] == Y[j - 1])
            {
                dp[i][j] = dp[i - 1][j - 1] + 1;
            }
            else
            {
                dp[i][j] = std::max(dp[i - 1][j], dp[i][j - 1]);
            }
        }
    }

    // backtrack to reconstruct LCS string
    std::string result;
    int i = m, j = n;
    while (i > 0 && j > 0)
    {
        if (X[i - 1] == Y[j - 1])
        {
            result.push_back(X[i - 1]);
            i--;
            j--;
        }
        else if (dp[i - 1][j] >= dp[i][j - 1])
        {
            i--;
        }
        else
        {
            j--;
        }
    }
    std::reverse(result.begin(), result.end());
    return result;
}

int main()
{
    std::string X = "AGGTAB";
    std::string Y = "GXTXAYB";
    std::string lcs = lcsString(X, Y);
    std::cout << "X = \"" << X << "\", Y = \"" << Y << "\"\n";
    std::cout << "LCS = \"" << lcs << "\", length = " << lcs.length() << "\n";

    X = "ABCDGH";
    Y = "AEDFHR";
    lcs = lcsString(X, Y);
    std::cout << "X = \"" << X << "\", Y = \"" << Y << "\"\n";
    std::cout << "LCS = \"" << lcs << "\", length = " << lcs.length() << "\n";
    return 0;
}

Running this program prints:

X = "AGGTAB", Y = "GXTXAYB"
LCS = "GTAB", length = 4
X = "ABCDGH", Y = "AEDFHR"
LCS = "ADH", length = 3

C implementation

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int max(int a, int b)
{
    return (a > b) ? a : b;
}

char* lcsString(const char* X, const char* Y)
{
    int m = strlen(X);
    int n = strlen(Y);

    // allocate dp table
    int** dp = (int**)malloc((m + 1) * sizeof(int*));
    for (int i = 0; i <= m; i++)
    {
        dp[i] = (int*)calloc(n + 1, sizeof(int));
    }

    // fill dp table
    for (int i = 1; i <= m; i++)
    {
        for (int j = 1; j <= n; j++)
        {
            if (X[i - 1] == Y[j - 1])
            {
                dp[i][j] = dp[i - 1][j - 1] + 1;
            }
            else
            {
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1]);
            }
        }
    }

    // backtrack to reconstruct LCS string
    int lcsLen = dp[m][n];
    char* result = (char*)malloc((lcsLen + 1) * sizeof(char));
    result[lcsLen] = '\0';

    int i = m, j = n;
    int idx = lcsLen - 1;
    while (i > 0 && j > 0)
    {
        if (X[i - 1] == Y[j - 1])
        {
            result[idx--] = X[i - 1];
            i--;
            j--;
        }
        else if (dp[i - 1][j] >= dp[i][j - 1])
        {
            i--;
        }
        else
        {
            j--;
        }
    }

    // free dp table
    for (int i = 0; i <= m; i++)
    {
        free(dp[i]);
    }
    free(dp);

    return result;
}

int main()
{
    const char* X = "AGGTAB";
    const char* Y = "GXTXAYB";
    char* lcs = lcsString(X, Y);
    printf("X = \"%s\", Y = \"%s\"\n", X, Y);
    printf("LCS = \"%s\", length = %d\n", lcs, (int)strlen(lcs));
    free(lcs);

    X = "ABCDGH";
    Y = "AEDFHR";
    lcs = lcsString(X, Y);
    printf("X = \"%s\", Y = \"%s\"\n", X, Y);
    printf("LCS = \"%s\", length = %d\n", lcs, (int)strlen(lcs));
    free(lcs);
    return 0;
}

Running this program prints:

X = "AGGTAB", Y = "GXTXAYB"
LCS = "GTAB", length = 4
X = "ABCDGH", Y = "AEDFHR"
LCS = "ADH", length = 3

Python implementation

def lcs_string(X, Y):
    m, n = len(X), len(Y)
    # fill dp table
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # backtrack to reconstruct LCS string
    result = []
    i, j = m, n
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:
            result.append(X[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1

    return ''.join(reversed(result))

X = "AGGTAB"
Y = "GXTXAYB"
lcs = lcs_string(X, Y)
print(f'X = "{X}", Y = "{Y}"')
print(f'LCS = "{lcs}", length = {len(lcs)}')

X = "ABCDGH"
Y = "AEDFHR"
lcs = lcs_string(X, Y)
print(f'X = "{X}", Y = "{Y}"')
print(f'LCS = "{lcs}", length = {len(lcs)}')

Running this program prints:

X = "AGGTAB", Y = "GXTXAYB"
LCS = "GTAB", length = 4
X = "ABCDGH", Y = "AEDFHR"
LCS = "ADH", length = 3

Go implementation

package main

import "fmt"

func lcsString(X, Y string) string {
	m, n := len(X), len(Y)
	// fill dp table
	dp := make([][]int, m+1)
	for i := range dp {
		dp[i] = make([]int, n+1)
	}
	for i := 1; i <= m; i++ {
		for j := 1; j <= n; j++ {
			if X[i-1] == Y[j-1] {
				dp[i][j] = dp[i-1][j-1] + 1
			} else if dp[i-1][j] > dp[i][j-1] {
				dp[i][j] = dp[i-1][j]
			} else {
				dp[i][j] = dp[i][j-1]
			}
		}
	}

	// backtrack to reconstruct LCS string
	result := make([]byte, 0, dp[m][n])
	i, j := m, n
	for i > 0 && j > 0 {
		if X[i-1] == Y[j-1] {
			result = append(result, X[i-1])
			i--
			j--
		} else if dp[i-1][j] >= dp[i][j-1] {
			i--
		} else {
			j--
		}
	}

	// reverse result
	for left, right := 0, len(result)-1; left < right; left, right = left+1, right-1 {
		result[left], result[right] = result[right], result[left]
	}
	return string(result)
}

func main() {
	X := "AGGTAB"
	Y := "GXTXAYB"
	lcs := lcsString(X, Y)
	fmt.Printf("X = \"%s\", Y = \"%s\"\n", X, Y)
	fmt.Printf("LCS = \"%s\", length = %d\n", lcs, len(lcs))

	X = "ABCDGH"
	Y = "AEDFHR"
	lcs = lcsString(X, Y)
	fmt.Printf("X = \"%s\", Y = \"%s\"\n", X, Y)
	fmt.Printf("LCS = \"%s\", length = %d\n", lcs, len(lcs))
}

Running this program prints:

X = "AGGTAB", Y = "GXTXAYB"
LCS = "GTAB", length = 4
X = "ABCDGH", Y = "AEDFHR"
LCS = "ADH", length = 3

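The key to backtracking is to start at dp[m][n] and walk the fill path backwards, following where each value came from. When the two characters match, the character is taken into the LCS; when they differ, move toward the neighbor with the larger dp value. Because the characters are collected from back to front, the string has to be reversed at the end. The C implementation avoids the reversal by preallocating an array of exactly the right size and filling it from the back.


Space-optimized dynamic programming

Looking at the state transition, dp[i][j] depends only on dp[i-1][j-1], dp[i-1][j] and dp[i][j-1]; in other words, the current row depends only on the previous row and on earlier entries of the current row. It is therefore enough to keep two rows (previous and current), which reduces the space complexity from O(mn) to O(min(m,n)).

To keep the footprint as small as possible, always use the shorter sequence for the inner loop dimension, so that the rows are as short as possible.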

C++ implementation

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>

int lcsLengthOptimized(const std::string& X, const std::string& Y)
{
    // S1 is the longer string, S2 the shorter one; rows have length |S2| + 1
    const std::string& S1 = (X.length() >= Y.length()) ? X : Y;
    const std::string& S2 = (X.length() >= Y.length()) ? Y : X;
    int m = S1.length();
    int n = S2.length();

    std::vector<int> prev(n + 1, 0);
    std::vector<int> curr(n + 1, 0);

    for (int i = 1; i <= m; i++)
    {
        for (int j = 1; j <= n; j++)
        {
            if (S1[i - 1] == S2[j - 1])
            {
                curr[j] = prev[j - 1] + 1;
            }
            else
            {
                curr[j] = std::max(prev[j], curr[j - 1]);
            }
        }
        // swap prev and curr for next iteration
        std::swap(prev, curr);
        // reset curr to zeros
        std::fill(curr.begin(), curr.end(), 0);
    }
    return prev[n];
}

int main()
{
    std::string X = "ABCDGH";
    std::string Y = "AEDFHR";
    std::cout << "X = \"" << X << "\", Y = \"" << Y << "\"\n";
    std::cout << "LCS length = " << lcsLengthOptimized(X, Y) << "\n";

    X = "AGGTAB";
    Y = "GXTXAYB";
    std::cout << "X = \"" << X << "\", Y = \"" << Y << "\"\n";
    std::cout << "LCS length = " << lcsLengthOptimized(X, Y) << "\n";
    return 0;
}

Running this program prints:

X = "ABCDGH", Y = "AEDFHR"
LCS length = 3
X = "AGGTAB", Y = "GXTXAYB"
LCS length = 4

C implementation

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int max(int a, int b)
{
    return (a > b) ? a : b;
}

int lcsLengthOptimized(const char* X, const char* Y)
{
    int m = strlen(X);
    int n = strlen(Y);

    // ensure len2 (the length of S2) is the smaller dimension
    const char* S1 = X;
    const char* S2 = Y;
    int len1 = m, len2 = n;
    if (m < n)
    {
        S1 = Y; S2 = X;
        len1 = n; len2 = m;
    }

    int* prev = (int*)calloc(len2 + 1, sizeof(int));
    int* curr = (int*)calloc(len2 + 1, sizeof(int));

    for (int i = 1; i <= len1; i++)
    {
        for (int j = 1; j <= len2; j++)
        {
            if (S1[i - 1] == S2[j - 1])
            {
                curr[j] = prev[j - 1] + 1;
            }
            else
            {
                curr[j] = max(prev[j], curr[j - 1]);
            }
        }
        // swap prev and curr
        int* tmp = prev;
        prev = curr;
        curr = tmp;
        memset(curr, 0, (len2 + 1) * sizeof(int));
    }

    int result = prev[len2];
    free(prev);
    free(curr);
    return result;
}

int main()
{
    const char* X = "ABCDGH";
    const char* Y = "AEDFHR";
    printf("X = \"%s\", Y = \"%s\"\n", X, Y);
    printf("LCS length = %d\n", lcsLengthOptimized(X, Y));

    X = "AGGTAB";
    Y = "GXTXAYB";
    printf("X = \"%s\", Y = \"%s\"\n", X, Y);
    printf("LCS length = %d\n", lcsLengthOptimized(X, Y));
    return 0;
}

Running this program prints:

X = "ABCDGH", Y = "AEDFHR"
LCS length = 3
X = "AGGTAB", Y = "GXTXAYB"
LCS length = 4

Python implementation

def lcs_length_optimized(X, Y):
    # ensure Y is the shorter string for less space usage
    if len(X) < len(Y):
        X, Y = Y, X
    m, n = len(X), len(Y)

    prev = [0] * (n + 1)
    curr = [0] * (n + 1)

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev, curr = curr, [0] * (n + 1)

    return prev[n]

X = "ABCDGH"
Y = "AEDFHR"
print(f'X = "{X}", Y = "{Y}"')
print(f"LCS length = {lcs_length_optimized(X, Y)}")

X = "AGGTAB"
Y = "GXTXAYB"
print(f'X = "{X}", Y = "{Y}"')
print(f"LCS length = {lcs_length_optimized(X, Y)}")

Running this program prints:

X = "ABCDGH", Y = "AEDFHR"
LCS length = 3
X = "AGGTAB", Y = "GXTXAYB"
LCS length = 4

Go implementation

package main

import "fmt"

func lcsLengthOptimized(X, Y string) int {
	// ensure S2 is the shorter string for less space usage
	S1, S2 := X, Y
	if len(X) < len(Y) {
		S1, S2 = Y, X
	}
	m, n := len(S1), len(S2)

	prev := make([]int, n+1)
	curr := make([]int, n+1)

	for i := 1; i <= m; i++ {
		for j := 1; j <= n; j++ {
			if S1[i-1] == S2[j-1] {
				curr[j] = prev[j-1] + 1
			} else if prev[j] > curr[j-1] {
				curr[j] = prev[j]
			} else {
				curr[j] = curr[j-1]
			}
		}
		prev, curr = curr, prev
		for j := range curr {
			curr[j] = 0
		}
	}
	return prev[n]
}

func main() {
	X := "ABCDGH"
	Y := "AEDFHR"
	fmt.Printf("X = \"%s\", Y = \"%s\"\n", X, Y)
	fmt.Printf("LCS length = %d\n", lcsLengthOptimized(X, Y))

	X = "AGGTAB"
	Y = "GXTXAYB"
	fmt.Printf("X = \"%s\", Y = \"%s\"\n", X, Y)
	fmt.Printf("LCS length = %d\n", lcsLengthOptimized(X, Y))
}

Running this program prints:

X = "ABCDGH", Y = "AEDFHR"
LCS length = 3
X = "AGGTAB", Y = "GXTXAYB"
LCS length = 4

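The core idea of the space optimization is to replace the full two-dimensional dp table with two one-dimensional arrays, prev (previous row) and curr (current row). After each row is computed, the two arrays are swapped and curr is reset. O(min(m,n)) space is enough because only the length is computed and no sequence is reconstructed. This method applies when only the length is needed; reconstructing the LCS string by backtracking, as in the previous section, still requires the full dp table (linear-space reconstruction is possible with Hirschberg's algorithm, which is not covered here).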
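The two rows can be folded into one if the overwritten diagonal value is kept in a temporary variable. The following is a minimal Python sketch of that variant; it is an alternative to the two-array programs above rather than a restatement of them, and lcs_length_one_row is a hypothetical name.

```python
def lcs_length_one_row(X, Y):
    """LCS length with a single row of size min(m, n) + 1."""
    if len(X) < len(Y):
        X, Y = Y, X                 # keep the row as short as possible
    n = len(Y)
    row = [0] * (n + 1)             # row[j] holds dp[i-1][j] until overwritten
    for x in X:
        diag = 0                    # dp[i-1][j-1]; starts as dp[i-1][0] = 0
        for j in range(1, n + 1):
            up = row[j]             # dp[i-1][j], saved before it is overwritten
            if x == Y[j - 1]:
                row[j] = diag + 1
            else:
                row[j] = max(up, row[j - 1])   # row[j-1] is already dp[i][j-1]
            diag = up               # becomes dp[i-1][j-1] for the next column
    return row[n]


print(lcs_length_one_row("ABCDGH", "AEDFHR"))   # 3
print(lcs_length_one_row("AGGTAB", "GXTXAYB"))  # 4
```

This halves the memory again and avoids resetting a second array, at the cost of slightly less obvious code.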


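Complete implementation

The complete programs below both compute the LCS length and reconstruct the sequence, and are checked against several test cases.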

C++ implementation

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <utility>

// compute both LCS length and reconstruct the LCS string
std::pair<int, std::string> lcs(const std::string& X, const std::string& Y)
{
    int m = X.length();
    int n = Y.length();
    std::vector<std::vector<int>> dp(m + 1, std::vector<int>(n + 1, 0));

    // fill dp table
    for (int i = 1; i <= m; i++)
    {
        for (int j = 1; j <= n; j++)
        {
            if (X[i - 1] == Y[j - 1])
            {
                dp[i][j] = dp[i - 1][j - 1] + 1;
            }
            else
            {
                dp[i][j] = std::max(dp[i - 1][j], dp[i][j - 1]);
            }
        }
    }

    // backtrack to reconstruct LCS
    std::string lcsStr;
    int i = m, j = n;
    while (i > 0 && j > 0)
    {
        if (X[i - 1] == Y[j - 1])
        {
            lcsStr.push_back(X[i - 1]);
            i--;
            j--;
        }
        else if (dp[i - 1][j] >= dp[i][j - 1])
        {
            i--;
        }
        else
        {
            j--;
        }
    }
    std::reverse(lcsStr.begin(), lcsStr.end());
    return {dp[m][n], lcsStr};
}

int main()
{
    std::pair<std::string, std::string> tests[] = {
        {"ABCDGH", "AEDFHR"},
        {"AGGTAB", "GXTXAYB"},
        {"ABCBDAB", "BDCABA"},
        {"", "ABC"},
        {"ABC", "ABC"}
    };

    for (auto& [X, Y] : tests)
    {
        auto [length, lcsStr] = lcs(X, Y);
        std::cout << "X = \"" << X << "\", Y = \"" << Y << "\"\n";
        std::cout << "LCS = \"" << lcsStr << "\", length = " << length << "\n";
        std::cout << "---\n";
    }
    return 0;
}

Running this program prints:

X = "ABCDGH", Y = "AEDFHR"
LCS = "ADH", length = 3
---
X = "AGGTAB", Y = "GXTXAYB"
LCS = "GTAB", length = 4
---
X = "ABCBDAB", Y = "BDCABA"
LCS = "BCBA", length = 4
---
X = "", Y = "ABC"
LCS = "", length = 0
---
X = "ABC", Y = "ABC"
LCS = "ABC", length = 3
---

C implementation

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int max(int a, int b)
{
    return (a > b) ? a : b;
}

typedef struct {
    int length;
    char* str;
} LcsResult;

LcsResult lcs(const char* X, const char* Y)
{
    int m = strlen(X);
    int n = strlen(Y);
    LcsResult res;

    // allocate dp table
    int** dp = (int**)malloc((m + 1) * sizeof(int*));
    for (int i = 0; i <= m; i++)
    {
        dp[i] = (int*)calloc(n + 1, sizeof(int));
    }

    // fill dp table
    for (int i = 1; i <= m; i++)
    {
        for (int j = 1; j <= n; j++)
        {
            if (X[i - 1] == Y[j - 1])
            {
                dp[i][j] = dp[i - 1][j - 1] + 1;
            }
            else
            {
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1]);
            }
        }
    }

    // backtrack to reconstruct LCS
    int lcsLen = dp[m][n];
    res.str = (char*)malloc((lcsLen + 1) * sizeof(char));
    res.str[lcsLen] = '\0';
    res.length = lcsLen;

    int i = m, j = n, idx = lcsLen - 1;
    while (i > 0 && j > 0)
    {
        if (X[i - 1] == Y[j - 1])
        {
            res.str[idx--] = X[i - 1];
            i--;
            j--;
        }
        else if (dp[i - 1][j] >= dp[i][j - 1])
        {
            i--;
        }
        else
        {
            j--;
        }
    }

    // free dp table
    for (int i = 0; i <= m; i++)
    {
        free(dp[i]);
    }
    free(dp);

    return res;
}

int main()
{
    const char* testX[] = {"ABCDGH", "AGGTAB", "ABCBDAB", "", "ABC"};
    const char* testY[] = {"AEDFHR", "GXTXAYB", "BDCABA", "ABC", "ABC"};
    int numTests = 5;

    for (int t = 0; t < numTests; t++)
    {
        LcsResult res = lcs(testX[t], testY[t]);
        printf("X = \"%s\", Y = \"%s\"\n", testX[t], testY[t]);
        printf("LCS = \"%s\", length = %d\n", res.str, res.length);
        printf("---\n");
        free(res.str);
    }
    return 0;
}

Running this program prints:

X = "ABCDGH", Y = "AEDFHR"
LCS = "ADH", length = 3
---
X = "AGGTAB", Y = "GXTXAYB"
LCS = "GTAB", length = 4
---
X = "ABCBDAB", Y = "BDCABA"
LCS = "BCBA", length = 4
---
X = "", Y = "ABC"
LCS = "", length = 0
---
X = "ABC", Y = "ABC"
LCS = "ABC", length = 3
---

Python implementation

def lcs(X, Y):
    m, n = len(X), len(Y)
    # fill dp table
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # backtrack to reconstruct LCS
    result = []
    i, j = m, n
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:
            result.append(X[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1

    lcs_str = ''.join(reversed(result))
    return dp[m][n], lcs_str

tests = [
    ("ABCDGH", "AEDFHR"),
    ("AGGTAB", "GXTXAYB"),
    ("ABCBDAB", "BDCABA"),
    ("", "ABC"),
    ("ABC", "ABC"),
]

for X, Y in tests:
    length, lcs_str = lcs(X, Y)
    print(f'X = "{X}", Y = "{Y}"')
    print(f'LCS = "{lcs_str}", length = {length}')
    print("---")

Running this program prints:

X = "ABCDGH", Y = "AEDFHR"
LCS = "ADH", length = 3
---
X = "AGGTAB", Y = "GXTXAYB"
LCS = "GTAB", length = 4
---
X = "ABCBDAB", Y = "BDCABA"
LCS = "BCBA", length = 4
---
X = "", Y = "ABC"
LCS = "", length = 0
---
X = "ABC", Y = "ABC"
LCS = "ABC", length = 3
---

Go implementation

package main

import "fmt"

func lcs(X, Y string) (int, string) {
	m, n := len(X), len(Y)
	// fill dp table
	dp := make([][]int, m+1)
	for i := range dp {
		dp[i] = make([]int, n+1)
	}
	for i := 1; i <= m; i++ {
		for j := 1; j <= n; j++ {
			if X[i-1] == Y[j-1] {
				dp[i][j] = dp[i-1][j-1] + 1
			} else if dp[i-1][j] > dp[i][j-1] {
				dp[i][j] = dp[i-1][j]
			} else {
				dp[i][j] = dp[i][j-1]
			}
		}
	}

	// backtrack to reconstruct LCS
	result := make([]byte, 0, dp[m][n])
	i, j := m, n
	for i > 0 && j > 0 {
		if X[i-1] == Y[j-1] {
			result = append(result, X[i-1])
			i--
			j--
		} else if dp[i-1][j] >= dp[i][j-1] {
			i--
		} else {
			j--
		}
	}

	// reverse result
	for left, right := 0, len(result)-1; left < right; left, right = left+1, right-1 {
		result[left], result[right] = result[right], result[left]
	}
	return dp[m][n], string(result)
}

func main() {
	tests := []struct{ X, Y string }{
		{"ABCDGH", "AEDFHR"},
		{"AGGTAB", "GXTXAYB"},
		{"ABCBDAB", "BDCABA"},
		{"", "ABC"},
		{"ABC", "ABC"},
	}

	for _, t := range tests {
		length, lcsStr := lcs(t.X, t.Y)
		fmt.Printf("X = \"%s\", Y = \"%s\"\n", t.X, t.Y)
		fmt.Printf("LCS = \"%s\", length = %d\n", lcsStr, length)
		fmt.Println("---")
	}
}

Running this program prints:

X = "ABCDGH", Y = "AEDFHR"
LCS = "ADH", length = 3
---
X = "AGGTAB", Y = "GXTXAYB"
LCS = "GTAB", length = 4
---
X = "ABCBDAB", Y = "BDCABA"
LCS = "BCBA", length = 4
---
X = "", Y = "ABC"
LCS = "", length = 0
---
X = "ABC", Y = "ABC"
LCS = "ABC", length = 3
---

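The complete implementation includes five test cases: the typical cases ("ABCDGH"/"AEDFHR" and "AGGTAB"/"GXTXAYB"), a case with multiple solutions ("ABCBDAB"/"BDCABA" has LCS length 4, with several solutions such as "BCBA" and "BDAB"), the empty-string boundary case, and the case where the two strings are identical. Which of the solutions is returned depends on how ties are broken during backtracking. C returns the length and the string together in an LcsResult struct; C++ uses std::pair (the structured bindings in main require C++17); Python uses a tuple; Go uses multiple return values.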
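To see every solution of the multiple-solution case, the backtracking can follow both neighbors whenever they tie. The following is a minimal Python sketch, assuming only distinct strings are wanted; all_lcs is a hypothetical helper, and since the number of distinct LCS strings can grow exponentially it is meant for small inputs only.

```python
from functools import lru_cache


def all_lcs(X, Y):
    """Return all distinct longest common subsequences, sorted."""
    m, n = len(X), len(Y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    @lru_cache(maxsize=None)
    def collect(i, j):
        if i == 0 or j == 0:
            return frozenset({""})
        if X[i - 1] == Y[j - 1]:
            # On a match every LCS of the two prefixes ends with this character.
            return frozenset(s + X[i - 1] for s in collect(i - 1, j - 1))
        found = set()
        if dp[i - 1][j] == dp[i][j]:    # an optimal path goes up
            found |= collect(i - 1, j)
        if dp[i][j - 1] == dp[i][j]:    # an optimal path goes left
            found |= collect(i, j - 1)
        return frozenset(found)

    return sorted(collect(m, n))


print(all_lcs("ABCBDAB", "BDCABA"))  # ['BCAB', 'BCBA', 'BDAB']
print(all_lcs("ABCDGH", "AEDFHR"))   # ['ADH']
```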


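Properties of LCS

Complexity comparison

Method                                 Time         Space                     Reconstructs the sequence?
Recursion (brute-force search)         O(2^(m+n))   O(m+n) (recursion stack)  No
Dynamic programming (length only)      O(mn)        O(mn)                     No
Dynamic programming (reconstruction)   O(mn)        O(mn)                     Yes
Space-optimized dynamic programming    O(mn)        O(min(m,n))               No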

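Related problems

LCS is a foundational problem in string algorithms, and many related problems can be solved with the same line of thinking:

Related problem                         Description
Longest Common Substring                The common part must be contiguous in both strings. In the transition, a mismatch sets dp[i][j] = 0 (instead of taking the max), and the answer is the largest value in the dp table
Shortest Common Supersequence (SCS)     Find the shortest sequence that contains both sequences as subsequences. SCS length = m + n - LCS length
Edit Distance                           The minimum number of operations needed to turn one string into another. Related to LCS, but the transition differs: insertion, deletion and substitution must all be considered
Longest Increasing Subsequence (LIS)    Can be reduced to LCS: sort the sequence (dropping duplicates when the subsequence must be strictly increasing) and take the LCS of the sorted sequence and the original
Longest Palindromic Subsequence         Take the LCS of the string and its reverse; its length is the length of the longest palindromic subsequence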
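Several rows of the table translate directly into code. The following is a minimal Python sketch of the reductions; all function names are hypothetical, and the LIS reduction assumes a strictly increasing subsequence is wanted.

```python
def lcs_length(X, Y):
    """LCS length for any two indexable sequences (strings or lists)."""
    prev = [0] * (len(Y) + 1)
    for x in X:
        curr = [0] * (len(Y) + 1)
        for j in range(1, len(Y) + 1):
            if x == Y[j - 1]:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[len(Y)]


def longest_common_substring(X, Y):
    """Here dp[i][j] is the longest common suffix length; a mismatch resets it to 0."""
    prev = [0] * (len(Y) + 1)
    best_len, best_end = 0, 0
    for i in range(1, len(X) + 1):
        curr = [0] * (len(Y) + 1)
        for j in range(1, len(Y) + 1):
            if X[i - 1] == Y[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return X[best_end - best_len:best_end]


def scs_length(X, Y):
    """Shortest common supersequence length = m + n - LCS length."""
    return len(X) + len(Y) - lcs_length(X, Y)


def lps_length(s):
    """Longest palindromic subsequence length via LCS with the reversed string."""
    return lcs_length(s, s[::-1])


def lis_length_via_lcs(seq):
    """Strictly increasing LIS length via LCS with the sorted, de-duplicated sequence."""
    return lcs_length(seq, sorted(set(seq)))


print(longest_common_substring("ABCDGH", "ACDGHR"))        # CDGH
print(scs_length("AGGTAB", "GXTXAYB"))                     # 9
print(lps_length("character"))                             # 5 ("carac")
print(lis_length_via_lcs([10, 9, 2, 5, 3, 7, 101, 18]))    # 4
```

The LIS reduction runs in O(n^2) time, so it is shown for the connection to LCS rather than as the preferred way to compute LIS.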

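Practical applications

LCS has a wide range of practical uses:

  • Version control tools (such as git diff): versions of a file are compared line by line with LCS-style techniques to find added, deleted and modified lines
  • Bioinformatics: in sequence alignment of DNA and protein sequences, LCS is one way to measure the similarity between gene sequences of different species
  • Plagiarism detection: documents are split into sequences of sentences or paragraphs, and LCS is used to compare their similarity
  • Spell correction: LCS measures the similarity between the user's input and the words in a dictionary, so that the most likely correct spelling can be suggested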
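The version-control use case can be illustrated with a line-based diff built on the same dp table. The following is a minimal Python sketch, not how git diff is implemented (git defaults to the Myers algorithm); diff_lines and its output prefixes are assumptions made for illustration.

```python
def diff_lines(old, new):
    """Return a line diff: '  ' for common lines, '- ' for deleted, '+ ' for added."""
    m, n = len(old), len(new)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if old[i - 1] == new[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    out = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and old[i - 1] == new[j - 1]:
            out.append("  " + old[i - 1])   # line is part of the LCS
            i -= 1
            j -= 1
        elif j > 0 and (i == 0 or dp[i][j - 1] >= dp[i - 1][j]):
            out.append("+ " + new[j - 1])   # line exists only in the new version
            j -= 1
        else:
            out.append("- " + old[i - 1])   # line exists only in the old version
            i -= 1
    out.reverse()                           # lines were collected back to front
    return out


for line in diff_lines(["a", "b", "c"], ["a", "c", "d"]):
    print(line)
# prints:
#   a
# - b
#   c
# + d
```

Lines that belong to the LCS are kept; everything else in the old version is a deletion and everything else in the new version is an addition.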
posted @ 2026-04-17 03:08 游翔