[LeetCode] Repeated DNA Sequences

Repeated DNA Sequences

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

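My first idea was to use a hash table, which needs only O(n) time. That version exceeded the memory limit, though, because using a String as the key takes a lot of memory.
The fix is binary encoding: since there are only 4 letters, we can set A=00, C=01, G=10, T=11. Ten letters then take only 20 bits, and an int has 4 bytes (32 bits), which is more than enough.
This way an int can stand in for each string, which saves a lot of memory.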
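To make the encoding concrete, here is a minimal sketch of packing a sequence into an integer and unpacking it again. The helper names `encodeDna` and `decodeDna` are hypothetical and are not part of the solution in this post; the sketch also assumes the input contains only A, C, G, and T, and treats any other character as T.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: pack a DNA string of at most 16 letters into an
// integer, 2 bits per letter (A=00, C=01, G=10, T=11).
std::uint32_t encodeDna(const std::string& seq) {
    std::uint32_t code = 0;
    for (char c : seq) {
        std::uint32_t bits;
        switch (c) {
            case 'A': bits = 0; break;
            case 'C': bits = 1; break;
            case 'G': bits = 2; break;
            default:  bits = 3; break;  // 'T' (assumption: anything else counts as T)
        }
        code = (code << 2) | bits;  // shift in the next letter
    }
    return code;
}

// Hypothetical inverse: unpack `length` letters from the low bits of `code`.
std::string decodeDna(std::uint32_t code, int length) {
    static const char kLetters[] = {'A', 'C', 'G', 'T'};
    std::string seq(length, 'A');
    for (int i = length - 1; i >= 0; --i) {
        seq[i] = kLetters[code & 3u];  // the last letter sits in the lowest 2 bits
        code >>= 2;
    }
    return seq;
}
```

With this scheme `encodeDna("AAAAACCCCC")` is 341 (binary 0101010101), and the largest possible 10-letter code, `encodeDna("TTTTTTTTTT")`, is 2^20 - 1 = 1048575, so every 10-letter sequence fits in 20 bits.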
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        vector<string> result;
        // A string of 10 or fewer letters cannot contain a repeated 10-letter substring.
        if (s.length() <= 10) return result;
        unordered_map<int, int> showed;  // encoded substring -> number of occurrences
        for (size_t i = 0; i + 10 <= s.length(); i++)
        {
            string temp_str = s.substr(i, 10);
            // Encode the 10 letters as a base-4 number (A=0, C=1, G=2, T=3).
            int temp = 0;
            for (int j = 0; j < 10; j++)
            {
                if (temp_str[j] == 'A') temp = temp * 4 + 0;
                else if (temp_str[j] == 'C') temp = temp * 4 + 1;
                else if (temp_str[j] == 'G') temp = temp * 4 + 2;
                else temp = temp * 4 + 3;
            }
            // Report each sequence exactly once, when it is seen for the second time.
            if (++showed[temp] == 2) result.push_back(temp_str);
        }
        return result;
    }
};
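
The solution above re-encodes all 10 letters at every position. A possible refinement, sketched here as an assumption rather than as part of the original solution, is to keep a rolling 20-bit code: shift in the 2 bits of the new letter and mask off the letter that left the window. The function name `findRepeatedDnaRolling` and the constants are illustrative.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Sketch of a sliding-window variant (hypothetical name, same encoding as above).
std::vector<std::string> findRepeatedDnaRolling(const std::string& s) {
    const std::size_t kLen = 10;             // window length in letters
    const unsigned kMask = (1u << 20) - 1;   // keeps the lowest 20 bits (10 letters)
    std::vector<std::string> result;
    if (s.size() <= kLen) return result;

    std::unordered_map<unsigned, int> seen;  // window code -> occurrence count
    unsigned code = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // A=0, C=1, G=2, anything else is treated as T=3.
        unsigned bits = s[i] == 'A' ? 0u : s[i] == 'C' ? 1u : s[i] == 'G' ? 2u : 3u;
        code = ((code << 2) | bits) & kMask; // append new letter, drop the oldest
        if (i + 1 < kLen) continue;          // window not full yet
        // Record the substring only on its second occurrence.
        if (++seen[code] == 2) result.push_back(s.substr(i + 1 - kLen, kLen));
    }
    return result;
}
```

For the example input from the problem statement this returns "AAAAACCCCC" and "CCCCCAAAAA", in that order. It also avoids building a temporary substring at every position; a substring is created only when a repeat is found.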

 

 
posted @ 2015-08-24 14:54  Sean_le