Repeated DNA Sequences
2015-08-14 20:43
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for
example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify
repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more
than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return: ["AAAAACCCCC", "CCCCCAAAAA"].
Java Solution
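The key to solving this problem is that each of the 4 nucleotides can be stored in 2 bits.
So a 10-letter-long sequence can be converted to a 20-bit integer.
2 bits are enough to distinguish 4 different characters, so 20 bits are enough to distinguish every possible 10-character sequence.
A brute-force search takes O(n*n) time and will exceed the time limit.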
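The 2-bit packing described above can be sketched on its own, separate from the sliding-window search. This is a minimal illustration, assuming the mapping A=0, C=1, G=2, T=3; the class name DnaEncodingSketch and the helpers code, encode, and decode are hypothetical names introduced here, not part of the original solution.

```java
final class DnaEncodingSketch {
    // Hypothetical helper: the 2-bit code assumed for each nucleotide.
    private static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    // Packs a 10-letter sequence into the low 20 bits of an int.
    static int encode(String seq) {
        if (seq.length() != 10) {
            throw new IllegalArgumentException("expected 10 letters, got " + seq.length());
        }
        int hash = 0;
        for (int i = 0; i < seq.length(); i++) {
            // Shift left by 2 to make room, then append the next 2-bit code.
            hash = (hash << 2) | code(seq.charAt(i));
        }
        return hash;
    }

    // Unpacks a 20-bit value back into its 10 letters, showing the packing is lossless.
    static String decode(int hash) {
        char[] letters = {'A', 'C', 'G', 'T'};
        char[] out = new char[10];
        for (int i = 9; i >= 0; i--) {
            out[i] = letters[hash & 3]; // lowest 2 bits hold the last letter
            hash >>>= 2;
        }
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println(encode("AAAAACCCCC"));          // 341
        System.out.println(encode("TTTTTTTTTT"));          // 1048575, i.e. 2^20 - 1
        System.out.println(decode(encode("ACGAATTCCG")));  // ACGAATTCCG
    }
}
```

Because every 10-letter sequence maps to a distinct value below 2^20, comparing two windows reduces to comparing two ints.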
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> result = new ArrayList<String>();

        int len = s.length();
        if (len < 10) {
            return result;
        }

        // 2-bit code for each nucleotide
        Map<Character, Integer> map = new HashMap<Character, Integer>();
        map.put('A', 0);
        map.put('C', 1);
        map.put('G', 2);
        map.put('T', 3);

        Set<Integer> temp = new HashSet<Integer>();   // hashes of windows seen so far
        Set<Integer> added = new HashSet<Integer>();  // hashes already put in result

        int hash = 0;
        for (int i = 0; i < len; i++) {
            if (i < 9) {
                // each ACGT fits in 2 bits, so left shift by 2
                hash = (hash << 2) + map.get(s.charAt(i));
            } else {
                hash = (hash << 2) + map.get(s.charAt(i));
                // keep only the low 20 bits, i.e. the last 10 letters
                hash = hash & ((1 << 20) - 1);

                if (temp.contains(hash) && !added.contains(hash)) {
                    result.add(s.substring(i - 9, i + 1));
                    added.add(hash); // track added
                } else {
                    temp.add(hash);
                }
            }
        }

        return result;
    }
}
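To check the bit-packed solution against the example input, a simpler baseline can help. The sketch below is not the approach above: it swaps the 20-bit integers for plain substrings stored in a HashSet, which costs more memory per window but is easy to verify by eye. The class name RepeatedDnaBaseline and the method findRepeated are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RepeatedDnaBaseline {
    // Returns every 10-letter substring that occurs more than once,
    // in the order in which its second occurrence is found.
    static List<String> findRepeated(String s) {
        Set<String> seen = new HashSet<>();      // windows encountered so far
        Set<String> reported = new HashSet<>();  // windows already in the result
        List<String> result = new ArrayList<>();

        for (int i = 0; i + 10 <= s.length(); i++) {
            String window = s.substring(i, i + 10);
            // seen.add returns false when the window was already present;
            // reported.add returns true only the first time it is reported.
            if (!seen.add(window) && reported.add(window)) {
                result.add(window);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [AAAAACCCCC, CCCCCAAAAA]
        System.out.println(findRepeated("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```

Both versions make a single pass over the string; the integer version stores one int per window where this one stores a 10-character string.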