187. Repeated DNA Sequences
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example, given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", return ["AAAAACCCCC", "CCCCCAAAAA"].
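Before optimizing, it may help to see the problem statement as a direct baseline: slide a 10-character window over the string and remember which substrings have been seen. This is only a minimal sketch for comparison, not the solution this post uses; the class name RepeatedDnaBaseline and the method name findRepeated are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical baseline for illustration; names are not from the original post.
class RepeatedDnaBaseline {
    static List<String> findRepeated(String s) {
        final int n = 10;                        // window length from the problem statement
        Set<String> seen = new HashSet<>();      // every window encountered so far
        Set<String> reported = new HashSet<>();  // windows already added to the result
        List<String> result = new ArrayList<>();
        for (int i = 0; i + n <= s.length(); i++) {
            String window = s.substring(i, i + n);
            // add() returns false when the element was already present,
            // so this fires exactly once, on the second occurrence.
            if (!seen.add(window) && reported.add(window)) {
                result.add(window);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [AAAAACCCCC, CCCCCAAAAA]
        System.out.println(findRepeated("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```

Each window here is stored as a 10-character String. Since the alphabet has only four letters, a window can instead be packed into 20 bits of an int, which avoids hashing and storing strings.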
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        int N = 10;
        List<String> ret = new ArrayList<>();
        if (s.length() < N + 1) return ret;

        // Two hash sets perform better than a map that keeps a count per sequence.
        Set<Integer> firstAppearance = new HashSet<>();
        Set<Integer> secondAppearance = new HashSet<>();

        // 2-bit code per nucleotide, indexed by (char - 'A').
        char[] map = new char[26];
        //map['A' - 'A'] = 0;  (already 0 by default)
        map['C' - 'A'] = 1;
        map['G' - 'A'] = 2;
        map['T' - 'A'] = 3;

        int v = 0;
        int mask = 3;
        // Pre-load the first N - 1 characters and grow the mask to 2 * N = 20 bits.
        for (int i = 0; i < N - 1; ++i) {
            v <<= 2;
            v |= map[s.charAt(i) - 'A']; // same as v += ...
            mask <<= 2;
            mask |= 3;
        }

        for (int i = N - 1; i < s.length(); ++i) {
            v <<= 2;
            v |= map[s.charAt(i) - 'A']; // same as v += ...
            v &= mask; // only keep the last 20 bits
            // Report a sequence exactly once, the second time its code is seen.
            if (!firstAppearance.add(v) && secondAppearance.add(v)) {
                ret.add(s.substring(i - N + 1, i + 1));
            }
        }
        return ret;
    }
}
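The core trick in the solution above is the 2-bit encoding: each nucleotide takes two bits, so a 10-letter window fits in 20 bits of an int, and moving the window one character to the right is a shift, an OR, and a mask. The sketch below isolates that step so it can be checked by hand. The class DnaWindowEncoding and its methods code, encode, and slide are hypothetical names introduced for illustration, and the sketch assumes the window length is at most 15 so that the mask fits in an int.

```java
// Hypothetical helper for illustration; names are not from the original post.
class DnaWindowEncoding {
    // Same 2-bit codes as the solution above: A=0, C=1, G=2, T=3.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    // Packs a whole window into an int, two bits per character.
    static int encode(String window) {
        int v = 0;
        for (int i = 0; i < window.length(); i++) {
            v = (v << 2) | code(window.charAt(i));
        }
        return v;
    }

    // Drops the oldest character and appends the next one.
    // Assumes windowLength <= 15 so that 2 * windowLength bits fit in an int.
    static int slide(int v, char next, int windowLength) {
        int mask = (1 << (2 * windowLength)) - 1; // 20 one-bits when windowLength is 10
        return ((v << 2) | code(next)) & mask;
    }

    public static void main(String[] args) {
        int v = encode("AAAAACCCCC");
        System.out.println(v); // prints 341, i.e. binary 0101010101
        // Sliding by one character matches encoding the shifted window directly.
        System.out.println(slide(v, 'A', 10) == encode("AAAACCCCCA")); // prints true
    }
}
```

Because the mapping from a 10-letter window to its 20-bit code is one-to-one, two windows are equal exactly when their codes are equal, which is why the solution can store Integer values in the hash sets instead of strings.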
