Repeated DNA Sequences
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return: ["AAAAACCCCC", "CCCCCAAAAA"].
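Note the ASCII codes: A is 0x41, C is 0x43, G is 0x47, T is 0x54. In octal:
A is 0101, C is 0103, G is 0107, T is 0124. The last octal digit is different for each letter, so the low three bits are enough to identify a nucleotide. A 10-letter sequence therefore needs 10×3 = 30 bits, which fits in a 32-bit int.
A few bit operations are then enough to map each substring to an integer.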
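To make the encoding concrete, here is a minimal sketch of the 3-bit mapping on its own. The class name `DnaEncodingDemo` and the helpers `code` and `pack` are hypothetical names introduced only for illustration; they are not part of the solution itself.

```java
class DnaEncodingDemo {
    // Low three bits of the ASCII code: A -> 1, C -> 3, G -> 7, T -> 4.
    // (hypothetical helper, for illustration only)
    static int code(char c) {
        return c & 0x7;
    }

    // Packs a nucleotide string into an int, 3 bits per letter.
    // The 30-bit mask keeps only the most recent 10 letters.
    static int pack(String seq) {
        int number = 0;
        for (int i = 0; i < seq.length(); i++) {
            number = ((number << 3) | code(seq.charAt(i))) & 0x3FFFFFFF;
        }
        return number;
    }

    public static void main(String[] args) {
        for (char c : "ACGT".toCharArray()) {
            System.out.println(c + " -> " + code(c));
        }
        // Each octal digit is one letter, so "ACGT" prints 1374.
        System.out.println(Integer.toOctalString(pack("ACGT")));
    }
}
```

Because every code is nonzero, two different 10-letter windows can never pack to the same 30-bit value.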
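import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> results = new ArrayList<String>();
        Map<Integer, Integer> map = new HashMap<Integer, Integer>();
        int number = 0;
        for (int i = 0; i < s.length(); i++) {
            // Append the low 3 bits of the current letter; the mask keeps only the last 10 letters (30 bits).
            number = ((number << 3) | (s.charAt(i) & 0x7)) & 0x3FFFFFFF;
            if (i < 9) continue; // the first full 10-letter window ends at index 9
            int occurs = map.containsKey(number) ? map.get(number) + 1 : 1;
            if (occurs == 2) results.add(s.substring(i - 9, i + 1)); // add only on the second occurrence, to avoid duplicates
            map.put(number, occurs);
        }
        return results;
    }
}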
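As a sanity check, the example from the problem statement can be run through a plain substring-counting version. This is a sketch of a simpler reference approach, not the bit-packed method above; the class name `RepeatedDnaCheck` and the method `findRepeatedBySubstring` are hypothetical names used only here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RepeatedDnaCheck {
    // Reference version: counts every 10-letter substring directly,
    // using the substring itself as the map key instead of a packed int.
    static List<String> findRepeatedBySubstring(String s) {
        List<String> results = new ArrayList<>();
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 10 <= s.length(); i++) {
            String window = s.substring(i, i + 10);
            int occurs = counts.merge(window, 1, Integer::sum);
            if (occurs == 2) results.add(window); // report each repeat once
        }
        return results;
    }

    public static void main(String[] args) {
        String s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT";
        // prints [AAAAACCCCC, CCCCCAAAAA]
        System.out.println(findRepeatedBySubstring(s));
    }
}
```

The reference version stores a 10-character string per window, whereas the bit-packed solution stores a single int, which is the main reason to prefer the encoding on long inputs.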
