# [LeetCode] 187. Repeated DNA Sequences

The DNA sequence is composed of a series of nucleotides abbreviated as 'A', 'C', 'G', and 'T'.

• For example, "ACGAATTCCG" is a DNA sequence.

When studying DNA, it is useful to identify repeated sequences within the DNA.

Given a string s that represents a DNA sequence, return all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in any order.

Example 1:

Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output: ["AAAAACCCCC","CCCCCAAAAA"]


Example 2:

Input: s = "AAAAAAAAAAAAA"
Output: ["AAAAAAAAAA"]


Constraints:

• 1 <= s.length <= 10^5
• s[i] is either 'A', 'C', 'G', or 'T'.

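The ASCII codes of the four letters, in binary, are:

A: 0100 0001    C: 0100 0011    G: 0100 0111    T: 0101 0100

Their low three bits (001, 011, 111, 100) are all different, so s[i] & 7 is enough to tell the letters apart, and a 10-letter window fits in 30 bits of an int.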
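To make the bit pattern in the ASCII codes above concrete, here is a minimal sketch. The helpers `lowThreeBits` and `pack3` are hypothetical names introduced for illustration and do not appear in the solutions below.

```cpp
#include <string>

// Hypothetical helper (not in the original post): keep only the low
// three bits of a character's ASCII code.
int lowThreeBits(char c) {
    return c & 7;
}

// Pack a short DNA string into an int, three bits per letter.
// Assumes at most 10 letters, so the result fits in 30 bits.
int pack3(const std::string& t) {
    int code = 0;
    for (char c : t) code = (code << 3) | lowThreeBits(c);
    return code;
}
```

With these definitions, 'A', 'C', 'G', and 'T' map to 1, 3, 7, and 4, so no two letters share a code.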

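```cpp
class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        vector<string> res;
        if (s.size() <= 10) return res;
        // mask keeps the low 27 bits, i.e. the 9 most recent letters
        int mask = 0x7ffffff, cur = 0;
        unordered_map<int, int> m;  // window code -> number of occurrences
        // Encode the first 9 letters, 3 bits each
        for (int i = 0; i < 9; ++i) {
            cur = (cur << 3) | (s[i] & 7);
        }
        for (int i = 9; i < s.size(); ++i) {
            // Drop the oldest letter and append s[i]
            cur = ((cur & mask) << 3) | (s[i] & 7);
            if (m.count(cur)) {
                // Report a sequence only on its second occurrence
                if (m[cur] == 1) res.push_back(s.substr(i - 9, 10));
                ++m[cur];
            } else {
                m[cur] = 1;
            }
        }
        return res;
    }
};
```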
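The sliding update in the solution above can be checked in isolation. The sketch below uses two hypothetical helpers, `packWindow` and `slideWindow`, which are not in the original code. It assumes windows of at most 10 letters, and shows that masking with 0x7ffffff and appending one letter gives the same code as encoding the next window from scratch.

```cpp
#include <string>

// Encode the letters of t, three bits each (t.size() <= 10 assumed).
int packWindow(const std::string& t) {
    int code = 0;
    for (char c : t) code = (code << 3) | (c & 7);
    return code;
}

// Slide a 10-letter window one step: the mask keeps the low 27 bits
// (the 9 newest letters), then the next letter is appended.
int slideWindow(int cur, char next) {
    const int mask = 0x7ffffff;  // 27 one-bits
    return ((cur & mask) << 3) | (next & 7);
}
```

For any string s of length 11 or more, slideWindow(packWindow(s.substr(0, 10)), s[10]) should equal packWindow(s.substr(1, 10)), which is why the loop never has to re-encode a whole window.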

```cpp
class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        // Guard: the encoding loop below reads s[0..8]
        if (s.size() < 10) return {};
        unordered_set<string> res;  // repeated sequences, deduplicated
        unordered_set<int> st;      // window codes seen so far
        int cur = 0;
        for (int i = 0; i < 9; ++i) cur = cur << 3 | (s[i] & 7);
        for (int i = 9; i < s.size(); ++i) {
            cur = ((cur & 0x7ffffff) << 3) | (s[i] & 7);
            if (st.count(cur)) res.insert(s.substr(i - 9, 10));
            else st.insert(cur);
        }
        return vector<string>(res.begin(), res.end());
    }
};
```

```cpp
class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        // Guard: the encoding loop below reads s[0..8]
        if (s.size() < 10) return {};
        unordered_set<string> res;
        unordered_set<int> st;
        // Two bits per letter, so a 10-letter window needs only 20 bits
        unordered_map<int, int> m{{'A', 0}, {'C', 1}, {'G', 2}, {'T', 3}};
        int cur = 0;
        for (int i = 0; i < 9; ++i) cur = cur << 2 | m[s[i]];
        for (int i = 9; i < s.size(); ++i) {
            // 0x3ffff keeps the low 18 bits, i.e. the 9 most recent letters
            cur = ((cur & 0x3ffff) << 2) | (m[s[i]]);
            if (st.count(cur)) res.insert(s.substr(i - 9, 10));
            else st.insert(cur);
        }
        return vector<string>(res.begin(), res.end());
    }
};
```
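The two-bit encoding above does not strictly need a hash map for the letter lookup. As a sketch of one alternative (not the author's code), the hypothetical helper `twoBitCode` below uses a switch that yields the same codes, and `pack2` shows how ten letters fill exactly 20 bits. Both names are introduced only for illustration.

```cpp
#include <string>

// Hypothetical helper: map a nucleotide to a 2-bit code without a hash map.
// Same codes as the map in the solution above; other characters are
// assumed not to occur and fall through to 0.
int twoBitCode(char c) {
    switch (c) {
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  return 0;  // 'A'
    }
}

// Pack up to 10 letters into at most 20 bits, two bits per letter.
int pack2(const std::string& t) {
    int code = 0;
    for (char c : t) code = (code << 2) | twoBitCode(c);
    return code;
}
```

Ten 'T's pack to a value with all 20 low bits set, and the mask 0x3ffff used in the solution is the 18-bit counterpart that keeps nine letters.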

```cpp
class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        // st holds every window seen so far; res holds those seen again
        unordered_set<string> res, st;
        for (int i = 0; i + 9 < s.size(); ++i) {
            string t = s.substr(i, 10);
            if (st.count(t)) res.insert(t);
            else st.insert(t);
        }
        return vector<string>{res.begin(), res.end()};
    }
};
```

GitHub sync address:

https://github.com/grandyang/leetcode/issues/187

https://leetcode.com/problems/repeated-dna-sequences/

https://leetcode.com/problems/repeated-dna-sequences/discuss/53855/7-lines-simple-java-on

https://leetcode.com/problems/repeated-dna-sequences/discuss/53877/i-did-it-in-10-lines-of-c

https://leetcode.com/problems/repeated-dna-sequences/discuss/53867/clean-java-solution-hashmap-bits-manipulation

posted @ 2015-02-10 16:11 Grandyang