Find Duplicate File in System
Given a list of directory information entries, each containing a directory path and the names and contents of the files in it, find the files with duplicate content and return their paths.
For example:
["root/a 1.txt(abcd) 2.txt(efgh)", "root/c 3.txt(abcd)", "root/c/d 4.txt(efgh)", "root 4.txt(efgh)"] ->
[["root/a/2.txt", "root/c/d/4.txt", "root/4.txt"], ["root/a/1.txt", "root/c/3.txt"]]
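Each input entry has the form shown in the example: a directory path followed by one or more space-separated file entries of the form name(content). The result uses the same path format as the example, and the groups may be returned in any order.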
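To make the input format concrete, here is a minimal sketch of how one entry breaks down into full paths and contents. The class `ParseDemo` and method `parseEntry` are hypothetical names used only for illustration; the parsing mirrors steps 2 and 3 of the solution below and assumes every entry is well formed (one "(" per file entry, with ")" as its last character).

```java
import java.util.ArrayList;
import java.util.List;

class ParseDemo {
    // Hypothetical helper: splits one entry such as "root/a 1.txt(abcd) 2.txt(efgh)"
    // into {fullPath, content} pairs. Assumes well-formed input.
    static List<String[]> parseEntry(String entry) {
        String[] parts = entry.split("\\s+");   // parts[0] is the directory
        String dir = parts[0];
        List<String[]> files = new ArrayList<>();
        for (int i = 1; i < parts.length; i++) {
            int open = parts[i].indexOf('(');    // start of "(content)"
            String name = parts[i].substring(0, open);
            // Content is everything between "(" and the trailing ")".
            String content = parts[i].substring(open + 1, parts[i].length() - 1);
            files.add(new String[] {dir + "/" + name, content});
        }
        return files;
    }

    public static void main(String[] args) {
        for (String[] f : parseEntry("root/a 1.txt(abcd) 2.txt(efgh)")) {
            System.out.println(f[0] + " -> " + f[1]);
        }
    }
}
```

Running `main` prints `root/a/1.txt -> abcd` followed by `root/a/2.txt -> efgh`.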
Solution:
1. We need a hash table to detect duplicates. A hash set alone cannot record which paths share a given content, so for convenience we use a hash map from content to the set of full paths (directory plus file name), i.e. String -> Set<String>. When we finish scanning the whole array, every key whose value set has more than one element is a group of duplicates.
2. Split each string on whitespace using the regex \s+ (written "\\s+" in a Java string literal). The first token is the directory; the remaining tokens are the file entries.
3. For each file entry, find the index of "(". The part before it, joined to the directory with "/", is the full file path; the part between the parentheses is the content.
4. Add the full path to the set that the hash map associates with that content, creating the set if it does not exist yet.
5. For every key in the key set, if its value set has more than one element, add it to the result as a list.
6. Return the result.
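Steps 1 and 4 to 6 can also be sketched in isolation, assuming the entries have already been parsed into {fullPath, content} pairs. `GroupDemo` and `groupByContent` are hypothetical names; this version swaps the `containsKey`/`put` pair for `Map.computeIfAbsent` and uses a `LinkedHashSet` so paths inside a group keep their input order. The order of the groups themselves still follows `HashMap` iteration and is unspecified.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class GroupDemo {
    // Hypothetical helper: takes {fullPath, content} pairs (already parsed as in
    // steps 2-3) and returns the groups of paths that share the same content.
    static List<List<String>> groupByContent(List<String[]> files) {
        Map<String, Set<String>> byContent = new HashMap<>();
        for (String[] f : files) {
            // Step 4: create the set on first sight of this content, then add the path.
            byContent.computeIfAbsent(f[1], k -> new LinkedHashSet<>()).add(f[0]);
        }
        List<List<String>> res = new ArrayList<>();
        for (Set<String> group : byContent.values()) {
            if (group.size() > 1) {              // Step 5: keep only duplicated content
                res.add(new ArrayList<>(group));
            }
        }
        return res;                              // Step 6
    }

    public static void main(String[] args) {
        List<String[]> files = List.of(
            new String[] {"root/a/1.txt", "abcd"},
            new String[] {"root/a/2.txt", "efgh"},
            new String[] {"root/c/3.txt", "abcd"},
            new String[] {"root/c/d/4.txt", "efgh"},
            new String[] {"root/4.txt", "efgh"});
        System.out.println(groupByContent(files));
    }
}
```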
Code:
import java.util.*;

public class Solution {
    public List<List<String>> findDuplicate(String[] paths) {
        /*
         * Input:  ["root/a 1.txt(abcd) 2.txt(efgh)", "root/c 3.txt(abcd)", "root/c/d 4.txt(efgh)", "root 4.txt(efgh)"]
         * Output: [["root/a/2.txt","root/c/d/4.txt","root/4.txt"],["root/a/1.txt","root/c/3.txt"]]
         */
        List<List<String>> res = new ArrayList<>();
        if (paths.length == 0) return res;
        // content -> set of full paths that have this content
        Map<String, Set<String>> hm = new HashMap<>();
        for (String p : paths) {
            // arr[0] is the directory; arr[1..] are entries like "1.txt(abcd)"
            String[] arr = p.split("\\s+");
            for (int i = 1; i < arr.length; i++) {
                int index = arr[i].indexOf("(");
                String content = arr[i].substring(index + 1, arr[i].length() - 1);
                String filename = arr[0] + "/" + arr[i].substring(0, index);
                if (!hm.containsKey(content)) hm.put(content, new HashSet<String>());
                hm.get(content).add(filename);
            }
        }
        // Only content shared by more than one file forms a duplicate group.
        for (String s : hm.keySet()) {
            if (hm.get(s).size() > 1) {
                res.add(new ArrayList<String>(hm.get(s)));
            }
        }
        return res;
    }
}
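As a design alternative, the same algorithm can be written with Java streams. This is a sketch, not the author's code: `StreamSolution` is a hypothetical class name, and it collects paths into a `List` rather than a `Set`, which is safe only under the assumption that the same full path never appears twice in the input.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class StreamSolution {
    // Flatten every entry into {fullPath, content} pairs, group the paths by
    // content, then keep only the groups with more than one path.
    static List<List<String>> findDuplicate(String[] paths) {
        Map<String, List<String>> byContent = Arrays.stream(paths)
            .flatMap(p -> {
                String[] parts = p.split("\\s+");        // parts[0] is the directory
                return Arrays.stream(parts, 1, parts.length).map(f -> {
                    int open = f.indexOf('(');
                    return new String[] {
                        parts[0] + "/" + f.substring(0, open),    // full path
                        f.substring(open + 1, f.length() - 1)};   // content
                });
            })
            .collect(Collectors.groupingBy(
                f -> f[1],
                Collectors.mapping(f -> f[0], Collectors.toList())));
        return byContent.values().stream()
            .filter(group -> group.size() > 1)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String[] paths = {"root/a 1.txt(abcd) 2.txt(efgh)", "root/c 3.txt(abcd)",
                          "root/c/d 4.txt(efgh)", "root 4.txt(efgh)"};
        System.out.println(findDuplicate(paths));
    }
}
```

`Collectors.groupingBy` replaces the explicit map bookkeeping. As with the `HashMap` version, the order of the groups is unspecified, so results are best compared as sets.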
