[LeetCode 471] Encode String with Shortest Length

Given a non-empty string, encode the string such that its encoded length is the shortest.

The encoding rule is: k[encoded_string], where the encoded_string inside the square brackets is being repeated exactly k times.

Note:

  1. k will be a positive integer.
  2. If an encoding process does not make the string shorter, then do not encode it. If there are several solutions, return any of them.

 

Example 1:

Input: s = "aaa"
Output: "aaa"
Explanation: There is no way to encode it such that it is shorter than the input string, so we do not encode it.

Example 2:

Input: s = "aaaaa"
Output: "5[a]"
Explanation: "5[a]" is shorter than "aaaaa" by 1 character.

Example 3:

Input: s = "aaaaaaaaaa"
Output: "10[a]"
Explanation: "a9[a]" or "9[a]a" are also valid solutions, both of them have the same length = 5, which is the same as "10[a]".

Example 4:

Input: s = "aabcaabcd"
Output: "2[aabc]d"
Explanation: "aabc" occurs twice, so one answer can be "2[aabc]d".

Example 5:

Input: s = "abbbabbbcabbbabbbc"
Output: "2[2[abbb]c]"
Explanation: "abbbabbbc" occurs twice, but "abbbabbbc" can also be encoded to "2[abbb]c", so one answer can be "2[2[abbb]c]".
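To make the k[encoded_string] rule concrete, here is a minimal decoder sketch that expands an encoded string back to the original, which is handy for checking the examples above. It is not part of the problem or of the solution below: the class name EncodingRule and the method decode are hypothetical, and the sketch assumes well-formed input in which digits only ever appear as repeat counts.

```java
// Hypothetical helper for illustration only: expands k[encoded_string] back to plain text.
// Assumes the input is well formed (balanced brackets, every count followed by '[').
final class EncodingRule {
    private EncodingRule() {}

    static String decode(String encoded) {
        int[] pos = {0}; // shared cursor into the encoded string
        return decode(encoded, pos);
    }

    // Parses until the end of the input or the first unmatched ']'.
    private static String decode(String s, int[] pos) {
        StringBuilder out = new StringBuilder();
        while (pos[0] < s.length() && s.charAt(pos[0]) != ']') {
            char c = s.charAt(pos[0]);
            if (Character.isDigit(c)) {
                // Read the repeat count k, which may span several digits.
                int k = 0;
                while (Character.isDigit(s.charAt(pos[0]))) {
                    k = k * 10 + (s.charAt(pos[0]) - '0');
                    pos[0]++;
                }
                pos[0]++; // skip '['
                String inner = decode(s, pos); // nested encodings are handled recursively
                pos[0]++; // skip ']'
                for (int i = 0; i < k; i++) {
                    out.append(inner);
                }
            } else {
                out.append(c); // plain letter, copied as-is
                pos[0]++;
            }
        }
        return out.toString();
    }
}
```

For instance, EncodingRule.decode("2[2[abbb]c]") expands to the input of Example 5, and both "10[a]" and "a9[a]" expand to the input of Example 3.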

 

Constraints:

  • 1 <= s.length <= 150
  • s consists of only lowercase English letters.

 

What is the giveaway that this is a dynamic programming problem?

Well, let's look at this top-down. Let F(i, j) denote the shortest encoding of S[i, j]. To solve F(0, N - 1), there are 3 general cases:

1. S[0, N - 1] cannot be encoded at all.

2. The entire string S[0, N - 1] can be encoded in the format k[encoded_string].

3. S[0, N - 1] can be encoded as the concatenation of the encodings of smaller substrings.

 

Cases 1 and 2 are pretty straightforward. For case 3 we need to iterate over every split index and solve the following subproblems:

F(0,0), F(1, N - 1);

F(0, 1), F(2, N - 1);

...........

F(0, N - 2), F(N - 1, N - 1)

 

In order to solve F(1, N - 1) we need to solve F(1, 1), F(2, N - 1), etc. But F(2, N - 1) is also needed directly by F(0, N - 1). We have overlapping subproblems!
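The top-down view can be sketched as a memoized recursion, where each F(i, j) is computed once and then reused. This is only an illustration of the three cases, not the solution given later in this post: the class name TopDownEncoder, the method solve, and the memo key i * N + j are all assumptions made for the sketch, and unlike the bottom-up code it always tries the splits of case 3, even when case 2 already succeeded.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical top-down sketch of F(i, j) with memoization.
final class TopDownEncoder {
    private final String s;
    private final Map<Integer, String> memo = new HashMap<>();

    private TopDownEncoder(String s) {
        this.s = s;
    }

    static String encode(String s) {
        return new TopDownEncoder(s).solve(0, s.length() - 1);
    }

    // F(i, j): a shortest encoding of s[i..j], both ends inclusive.
    private String solve(int i, int j) {
        int key = i * s.length() + j; // unique per (i, j) because j < s.length()
        String cached = memo.get(key);
        if (cached != null) {
            return cached; // overlapping subproblem: already solved
        }

        String sub = s.substring(i, j + 1);
        String best = sub; // case 1: leave the substring as it is

        // Case 2: the whole substring is a repetition of a shorter block.
        int period = (sub + sub).indexOf(sub, 1);
        if (period < sub.length()) {
            String candidate = (sub.length() / period) + "[" + solve(i, i + period - 1) + "]";
            if (candidate.length() < best.length()) {
                best = candidate;
            }
        }

        // Case 3: concatenate the best encodings of the two sides of every split.
        for (int k = i; k < j; k++) {
            String left = solve(i, k);
            String right = solve(k + 1, j);
            if (left.length() + right.length() < best.length()) {
                best = left + right;
            }
        }

        memo.put(key, best);
        return best;
    }
}
```

Calling TopDownEncoder.encode("aaaaa") returns "5[a]", matching Example 2.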

 

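Dynamic programming solution with O(N^2) states and O(N) split points per state, i.e. O(N^3) transitions. The string operations (substring, indexOf, concatenation) add further cost on top of that.

State definition: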

dp[i][j]: the shortest encoded string we can get from s[i, j]; 

Answer: dp[0][N - 1]

 

State transition: 

Iterate over substrings in order of increasing length. For each substring s[i, j], first assign the original substring to dp[i][j]. Then check whether the whole substring can be encoded as k[encoded_string] (more discussion on how later). If it can, and the result is shorter than the substring itself, we are done with this substring, because no concatenation of smaller substrings' encodings can beat it. Otherwise, we loop through all split indices in [i, j - 1] and keep the shortest concatenation dp[i][k] + dp[k + 1][j].

 

So far the discussion has been pretty clear, except for one open question: how do we check whether a string can be encoded as k[encoded_string]?

If a string can be encoded in this format, then it must be periodic. We want to find the shortest repeating substring and how many times it repeats in the original string. For example, if we have aaaaaaaa, then we want to get 8[a], not 4[aa] or 2[aaaa]. It turns out that there is a very nice way of achieving this goal:

 

For a given string S, we append S to itself to get SS, then call SS.indexOf(S, 1): we look for the first occurrence of S in SS, starting the search from index 1 instead of 0. Since we just appended an entire copy of S, a match at index S.length() is guaranteed. If the first matched index is >= S.length(), S is not periodic and thus cannot be encoded for case 2. Otherwise, the matched index idx is the length of the shortest repeating substring, and it repeats S.length() / idx times. We then just need to construct the encoded string and make sure that it is shorter than the original string S before using it to update our dp table.
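The doubling trick can be tried in isolation with a small sketch. The class name Periodicity and both method names are hypothetical. Note that wholeStringEncoding wraps the raw repeating block, whereas the full solution wraps the block's own best encoding taken from the dp table.

```java
// Hypothetical helper illustrating the (S + S).indexOf(S, 1) periodicity check.
final class Periodicity {
    private Periodicity() {}

    // Length of the shortest block whose repetition yields s,
    // or s.length() when s is not periodic.
    static int shortestPeriod(String s) {
        return (s + s).indexOf(s, 1);
    }

    // Encodes s as k[block] using the raw block, but only if that is strictly shorter.
    static String wholeStringEncoding(String s) {
        int period = shortestPeriod(s);
        if (period >= s.length()) {
            return s; // not periodic: case 2 does not apply
        }
        String candidate = (s.length() / period) + "[" + s.substring(0, period) + "]";
        return candidate.length() < s.length() ? candidate : s;
    }
}
```

With this sketch, shortestPeriod("aaaaaaaa") is 1 and wholeStringEncoding("aaaaaaaa") is "8[a]", while "abab" has period 2 but stays unencoded because "2[ab]" is longer than "abab".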

 

class Solution {
    public String encode(String s) {
        int n = s.length();
        // dp[start][end]: a shortest encoding of s[start..end], both ends inclusive.
        String[][] dp = new String[n][n];
        // Process substrings by increasing length so that shorter ranges are ready first.
        for (int len = 1; len <= n; len++) {
            for (int start = 0; start + len <= n; start++) {
                int end = start + len - 1;
                String sub = s.substring(start, end + 1);
                // Case 1: leave the substring unencoded.
                dp[start][end] = sub;
                // Case 2: idx is the shortest period of sub, or sub.length() if sub is not periodic.
                int idx = (sub + sub).indexOf(sub, 1);
                if (idx < sub.length()) {
                    String replace = sub.length() / idx + "[" + dp[start][start + idx - 1] + "]";
                    if (replace.length() < sub.length()) {
                        dp[start][end] = replace;
                        continue; // whole-substring encoding wins, so skip the splits
                    }
                }
                // Case 3: try every split point and keep the shortest concatenation.
                for (int k = start; k < end; k++) {
                    String left = dp[start][k], right = dp[k + 1][end];
                    if (left.length() + right.length() < dp[start][end].length()) {
                        dp[start][end] = left + right;
                    }
                }
            }
        }
        return dp[0][n - 1];
    }
}

 

posted @ 2021-02-17 06:06  Review->Improve