[Daily Problem] [LeetCode 583] Delete Operation for Two Strings 2021.9.25

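Problem description

Given two words word1 and word2, find the minimum number of steps required to make word1 and word2 the same, where each step deletes one character from either string.

Example:

Input: "sea", "eat"
Output: 2
Explanation: The first step turns "sea" into "ea", and the second step turns "eat" into "ea".

Constraints:

The length of each given word does not exceed 500.
The given words contain only lowercase letters.

Source: LeetCode (力扣)
Link: https://leetcode-cn.com/problems/delete-operation-for-two-strings
Copyright belongs to LeetCode (领扣网络). For commercial reprints, contact the official site for authorization; for non-commercial reprints, cite the source.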

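Approach

String problems are not easy to reason about, and the sample test cases are often too simple. So you need to come up with some unusual cases yourself and analyze them.
The problem says a step can delete one character from either string, so the operation is: delete one character from one of the strings at each step until the two strings are identical.

🤔 Possible methods:

1. A Trie?
2. Backtracking over all possible matchings?
3. A strategy that tries to match as many characters as possible?
4. Dynamic programming. But what would the subproblem be?

The problem does not ask for the resulting string, so nothing actually needs to be "deleted". In other words, the task is to choose as many matching characters as possible, in the same order in both strings.
The answer is then total number of characters - matched characters * 2. For "sea" and "eat", the matched characters are "ea", so the answer is 6 - 2 * 2 = 2.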
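
To come up with the "unusual cases" mentioned above and check them by hand, a brute-force reference can help. The sketch below is not part of the original solution: `subsequences` and `min_steps_bruteforce` are hypothetical helper names, and the approach is exponential, so it is only meant for very short words.

```python
from itertools import combinations


def subsequences(word):
    """Return every subsequence of word as a set (exponential; tiny inputs only)."""
    return {
        "".join(word[i] for i in idx)
        for r in range(len(word) + 1)
        for idx in combinations(range(len(word)), r)
    }


def min_steps_bruteforce(word1, word2):
    """Total characters minus twice the longest string both words can be reduced to."""
    common = subsequences(word1) & subsequences(word2)
    longest = max(len(s) for s in common)  # "" is always common, so max() is safe
    return len(word1) + len(word2) - 2 * longest


print(min_steps_bruteforce("sea", "eat"))  # 2
```

Any faster idea can then be compared against this function on short hand-made inputs before submitting.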

Let's try option 3 first (match as many characters as possible).

The rough idea is as follows:

# Pseudocode
class Solution:
    def minDistance(self, word1: str, word2: str) -> int:
        def test(word1,word2) -> int:
            total = len(word1) + len(word2)
            same_char_size = 0
            last_found_index = 0
            for i in range(0,len(word1)):
                for j in range(last_found_index,len(word2)):
                    if word1[i] == word2[j]:
                        same_char_size += 1
                        last_found_index = j + 1
                        break
            return (total - same_char_size*2)
        return (min(test(word1,word2),test(word2,word1)))

Attempt 1

With this approach, the test case:
"intention"
"execution"
is reported as a wrong answer.
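
One way to see why this kind of greedy matching is unreliable is a small input where taking the earliest match for the first character uses up the whole other string. The sketch below re-implements the pseudocode above as `greedy_steps` (a name chosen here) and runs it on "zabcy" / "yabcz", an input made up for illustration rather than taken from the judge.

```python
def greedy_steps(word1, word2):
    """Greedy attempt: match each char of a to the earliest unused char of b."""
    def test(a, b):
        total = len(a) + len(b)
        same_char_size = 0
        last_found_index = 0
        for ch in a:
            for j in range(last_found_index, len(b)):
                if ch == b[j]:
                    same_char_size += 1
                    last_found_index = j + 1
                    break
        return total - same_char_size * 2

    # Try both directions and keep the better result, as in the pseudocode.
    return min(test(word1, word2), test(word2, word1))


# Hypothetical input: the first character of each word only matches the
# last character of the other, so the first match skips over "abc".
print(greedy_steps("zabcy", "yabcz"))  # 8
print(greedy_steps("sea", "eat"))      # 2
```

Deleting z and y from both words leaves "abc" on each side after only 4 steps, so the 8 returned for the first input is not minimal.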

Let's draw a picture.

That led to the code for attempt 2.

Attempt 2


class Solution {
public:
    int minDistance(string word1, string word2) {
        std::map<int, std::vector<int>> II_map;
        for (int i = 0; i < word2.size(); i++) {
            II_map[word2[i]].push_back(i);
        }

        std::size_t total  = word1.size() + word2.size();
        int keep_char_size = 0;

        int I_start  = 0;
        int II_start = 0;
        while (I_start < word1.size() && II_start < word2.size()) {
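            // For every remaining character of I (word1), find its position in II (word2),
            // then choose which pair of positions to keep (see the selection rule below)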
            std::vector<std::pair<int, int>> found_indexes;
            for (int i = I_start; i < word1.size(); i++) {
                char ch = word1[i];
                if (II_map.count(ch) == 0 || II_map[ch].empty())
                    continue;
                auto &II_index_vec = II_map[ch];
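                // Among the positions of ch in II, skip those already used, i.e. indexes smaller than II_start
                int II_try_next_index = -1;
                // k rather than i, to avoid shadowing the outer loop index
                for (std::size_t k = 0; k < II_index_vec.size(); k++) {
                    if (II_index_vec[k] >= II_start) {
                        II_try_next_index = II_index_vec[k];
                        break;
                    }
                }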
                if (II_try_next_index == -1) {
                    continue;
                }
                found_indexes.push_back(std::make_pair(i, II_try_next_index));
            }

            if (found_indexes.empty()) {
                break;
            }
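            // Take the pair with the smallest sum? Which is better: indexes (3,1) or (2,2)?
            // max(3,1) = 3 consumes up to index 3 in one of the strings,
            // max(2,2) = 2 consumes only up to index 2 in either string, so choose (2,2).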
            auto min_sum = std::min_element(
                found_indexes.begin(), found_indexes.end(),
                [](const std::pair<int, int> &a, const std::pair<int, int> &b) {
                    return std::max(a.first, a.second) < std::max(b.first, b.second);
                });
            if (min_sum != found_indexes.end()) {
                I_start  = min_sum->first + 1;
                II_start = min_sum->second + 1;
                keep_char_size++;
                printf("%c %c\n", word1[I_start - 1], word2[II_start - 1]);
            }
        }
        printf("%zu\n", total);  // total is std::size_t, so %zu rather than %d
        return (total - keep_char_size * 2);
    }
};

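On the test case
"dinitrophenylhydrazine"
"phenylhydrazine"
it fails 😭

Attempt 3

It seems my earlier ideas cannot solve this problem.
I'm just not good enough! 😭😭😭

The posted solutions say this can be solved with LCS (longest common subsequence).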

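Solutions

Read this one first:

LCS solution 1
Here Carl says to solve the problem with DP.

After reading it, I still did not understand why the recurrence is defined the way it is.
How was the recurrence derived?
Why can DP solve the problem I ran into earlier?

Aside
The meaning of the dp array in a DP problem is usually exactly the result the problem asks for.
For example, the LCS problem defines dp as the length of the longest common subsequence.
For example, LeetCode 96 (Unique Binary Search Trees) defines dp as the number of distinct BSTs.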

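Then read this one:

LCS solution 2
Now the meaning of the two cases is clear.

If the current characters are equal (text1[i-1] == text2[j-1] when i and j count characters from 1), the result is the LCS at dp[i-1][j-1] plus 1.
If they are not equal, dp[i][j] = std::max(dp[i-1][j], dp[i][j-1]): this cell inherits the larger of the two earlier values.

Aside
A state transition is always either "previous value + something" or "inherit a previous value". (Obviously! That is what reducing to subproblems means!)

How to define dp
dp can be defined as a table of size (len(text1) + 1) x (len(text2) + 1), where dp[i][j] is the LCS length of the first i characters of text1 and the first j characters of text2.
State transition formula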

if text1[i-1] == text2[j-1]:
    dp[i][j] = dp[i-1][j-1] + 1
else:
    dp[i][j] = max(dp[i-1][j], dp[i][j-1])

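How to initialize?
Since i and j count from 1, row 0 and column 0 stand for an empty prefix, so dp[i][0] = 0, dp[0][j] = 0, dp[0][0] = 0.
How to traverse?
Since dp[i][j] depends on dp[i-1][j-1], dp[i-1][j], and dp[i][j-1], iterate i and j from small to large.
🤔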

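# dp has (len(word1) + 1) rows and (len(word2) + 1) columns, all zeros initially
dp = [[0] * (len(word2) + 1) for _ in range(len(word1) + 1)]
for i in range(1,len(word1)+1):
    for j in range(1,len(word2)+1):
        if word1[i-1] == word2[j-1]:
            dp[i][j] = dp[i-1][j-1] + 1
        else:
            dp[i][j] = max(dp[i-1][j], dp[i][j-1])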
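
To make the recurrence concrete, here is a minimal sketch that wraps the loop above in a function and prints the whole table for the example "sea" / "eat". The function name `lcs_table` is chosen here for illustration and is not from the original post.

```python
def lcs_table(text1, text2):
    """Build the full dp table; dp[i][j] = LCS length of text1[:i] and text2[:j]."""
    dp = [[0] * (len(text2) + 1) for _ in range(len(text1) + 1)]
    for i in range(1, len(text1) + 1):
        for j in range(1, len(text2) + 1):
            if text1[i - 1] == text2[j - 1]:
                # Matching characters extend the LCS of the two shorter prefixes.
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                # Otherwise inherit the better of "drop a char from text1" / "drop a char from text2".
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp


for row in lcs_table("sea", "eat"):
    print(row)
# [0, 0, 0, 0]
# [0, 0, 0, 0]
# [0, 1, 1, 1]
# [0, 1, 2, 2]
```

The bottom-right cell is 2 (the common subsequence "ea"), which gives 3 + 3 - 2 * 2 = 2 deletions, matching the expected output of the example.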

LCS

class Solution {
public:
    int longestCommonSubsequence(string text1, string text2) {
        std::vector<std::vector<int>> dp(text1.size() + 1, std::vector<int>(text2.size() + 1, 0));
        for (int i = 1; i < text1.size() + 1; i++) {
            for (int j = 1; j < text2.size() + 1; j++) {
                if (text1[i-1] == text2[j-1]) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = std::max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp[text1.size()][text2.size()];
    }
};

So the solution to this problem is:

class Solution {
public:
    int longestCommonSubsequence(string text1, string text2) {
        std::vector<std::vector<int>> dp(text1.size() + 1, std::vector<int>(text2.size() + 1, 0));
        for (int i = 1; i < text1.size() + 1; i++) {
            for (int j = 1; j < text2.size() + 1; j++) {
                if (text1[i - 1] == text2[j - 1]) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = std::max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp[text1.size()][text2.size()];
    }

    int minDistance(string word1, string word2) {
        int total = word1.size() + word2.size();
        return total - longestCommonSubsequence(word1, word2) * 2;
    }
};

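Why can dynamic programming solve this problem?

🤔 Let's expand the dynamic programming process, as we think through it, into a tree and look at it.

Drawing tool: graphviz (reference)

From the tree we can see that:

many nodes are computed repeatedly
a structure like

    for i in range(0, len(text1)):
         for j in range(0, len(text2)):
               if XXX

appears.
Unlike my earlier code, this process compares both branches and finds which one yields the longer subsequence.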
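
The repeated nodes can be observed directly by writing the same recurrence top-down and counting calls. This is a sketch for illustration only: `lcs_plain` and `lcs_memo` are names chosen here, and `functools.lru_cache` stands in for the dp table.

```python
from functools import lru_cache


def lcs_plain(a, b):
    """Top-down recursion without caching; returns (LCS length, number of calls)."""
    calls = 0

    def go(i, j):
        nonlocal calls
        calls += 1
        if i == 0 or j == 0:
            return 0
        if a[i - 1] == b[j - 1]:
            return go(i - 1, j - 1) + 1
        # Explore both branches and keep the longer subsequence.
        return max(go(i - 1, j), go(i, j - 1))

    return go(len(a), len(b)), calls


def lcs_memo(a, b):
    """Same recursion, but each (i, j) node is computed only once."""
    calls = 0

    @lru_cache(maxsize=None)
    def go(i, j):
        nonlocal calls
        calls += 1  # the body only runs on a cache miss
        if i == 0 or j == 0:
            return 0
        if a[i - 1] == b[j - 1]:
            return go(i - 1, j - 1) + 1
        return max(go(i - 1, j), go(i, j - 1))

    return go(len(a), len(b)), calls


plain_len, plain_calls = lcs_plain("intention", "execution")
memo_len, memo_calls = lcs_memo("intention", "execution")
print(plain_len, memo_len)      # the same LCS length from both versions
print(plain_calls, memo_calls)  # the cached version never computes a node twice

word1, word2 = "dinitrophenylhydrazine", "phenylhydrazine"
print(len(word1) + len(word2) - 2 * lcs_memo(word1, word2)[0])  # 7
```

The last line applies total - 2 * LCS to the input that broke attempt 2: "phenylhydrazine" is kept whole and only the 7 characters of "dinitro" are deleted.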

Where does the word "dynamic" come in?

Reference link

The reason it is called dynamic programming is actually quite complicated. Dynamic here refers to the fact that you maintain a dynamic list or table in order to speed up the operation you are …
… I'm just screwing around. It’s called dynamic programming for one very stupid reason. Richard Bellman, a mathematician in the 1950s, was tasked with coming up with a name for what eventually came to be Dynamic Programming. He worked for the RAND corporation, who did research for the Air Force and answered to the secretary of defense. According to Bellman, Charles Wilson, the secretary of defense, hated the words research and mathematics. I don't mean like in a “leave me alone” kind of way. I'm talking red in the face, punches will get thrown, cursing out kind of hatred. Basically the words “mathematical” and “research” were to Wilson what “spinach” and “broccoli” are to a toddler.
Now, Bellman, being a mathematician who does research, struggled to come up with a name that would convince Wilson to fund him and RAND. He couldn't use the word “mathematics” or anything related to it, lest Wilson breaks into a cold sweat and locks him self into a room and refuses to come out till it left. So he decided to use the term Dynamic Programming.
Seriously, he chose the term dynamic because it refers to the time-vary aspect of the problem, and because it sounds impressive. So impressive in fact, that it’s “something not even a Congressman could object to.” (That's an actual Bellman quote.)

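What similar problems can dynamic programming solve?

Problems that can be broken down into subproblems

The LCS problem turns the whole sequence of comparisons into a series of subproblems, namely:
check whether text1[i-1] and text2[j-1] are equal; if they are, extend dp[i-1][j-1] by 1; if not, take the larger LCS of the two cases dp[i-1][j] and dp[i][j-1].

So the value stored in the dp array is usually defined as exactly what the problem asks for.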

posted @ 2021-09-25 11:14 zh30