StochHMM: A C++ Hidden Markov Model Project Integrating Multiple Methods
2014-05-24 07:15
https://code.google.com/p/stochhmm/
StochHMM is a free, easy-to-use, open source C++ library and application that implements HMMs from simple text files. It implements traditional HMM algorithms while also providing additional flexibility, which is achieved by allowing researchers to integrate additional data sources and applications into the HMM framework.
StochHMM implements standard HMMs and (preliminary) HMMs with duration. It gives researchers the power to integrate additional datasets into their HMMs to improve predictions. Finally, it adapts HMM algorithms to provide stochastic decoding, giving researchers the ability to explore and rank sub-optimal predictions. We are providing StochHMM as a standalone application and C++ library so that researchers can rapidly develop HMMs.
Integrating Data
Here are a few of the ways that StochHMM allows users to integrate additional data sources:
Multiple Emission States
Weighting or Explicitly Defining State paths on a sequence
Linking States Emissions/Transitions to external user-defined functions
Multiple Emission States
StochHMM allows the user to provide multiple sequences, which are then handled by the emissions. These sequences can be real numbers or discrete characters/words. Each state can have many emissions (discrete or continuous). Discrete emissions can be independent of each other or joint distributions. Continuous emissions can be treated in two ways: 1) as raw probabilities, which are integrated without transformation, or 2) as values to be plugged into a univariate probability density function (PDF) or, in the case of multiple real-valued sequences, a multivariate PDF.
Each state's emissions are user-defined, so one state may have emissions from two different sequences, while another may have only a single emission from a single sequence.
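As a rough illustration of the idea, the sketch below shows one way a single state could combine a discrete emission on one sequence with a continuous emission on a second, real-valued sequence. It assumes the two emissions are independent (so their log-probabilities add) and that the continuous value is scored with a Gaussian PDF. The names `TwoTrackState`, `gaussianLogPdf`, and `emissionLogProb` are hypothetical and are not part of the StochHMM API.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <map>

// Hypothetical illustration only; none of these names come from StochHMM.

// Log density of a univariate normal distribution evaluated at x.
double gaussianLogPdf(double x, double mean, double stddev) {
    assert(stddev > 0.0);
    const double pi = 3.14159265358979323846;
    const double z = (x - mean) / stddev;
    return -0.5 * z * z - std::log(stddev) - 0.5 * std::log(2.0 * pi);
}

// A state with one discrete emission (characters of sequence 1) and one
// continuous emission (real values of sequence 2, scored by a Gaussian PDF).
struct TwoTrackState {
    std::map<char, double> discreteProb;  // P(symbol | state)
    double mean;                          // Gaussian parameters for track 2
    double stddev;

    // Assuming the two emissions are independent, their log-probabilities add.
    double emissionLogProb(char symbol, double value) const {
        const auto it = discreteProb.find(symbol);
        if (it == discreteProb.end() || it->second <= 0.0) {
            // The state cannot emit this symbol at all.
            return -std::numeric_limits<double>::infinity();
        }
        return std::log(it->second) + gaussianLogPdf(value, mean, stddev);
    }
};
```

A state that uses only one of the two sequences would simply omit the other term, which is the per-state flexibility described above.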
Weighting or Explicitly Defining State paths to follow on a sequence
Often, we have some prior knowledge about the sequence. In that case, we may want to integrate it into the model without redesigning or retraining the model (a time-consuming endeavor). StochHMM allows the user to explicitly define a state path (by state name or by a user-defined category of state). In addition, StochHMM allows the user to weight a state path (again by state name or user-defined category). This allows the user to restrict the predicted path or to weight it according to their prior knowledge.
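One common way to express this kind of prior knowledge is to add a per-position, per-state log-weight inside Viterbi decoding. The sketch below is a textbook log-space Viterbi with such a weight matrix, not StochHMM's own implementation; the function name `weightedViterbi` and the `logWeight` layout are assumptions made for illustration. A weight of 0 leaves a state untouched, a negative weight penalizes it, and negative infinity forbids it at that position, which amounts to explicitly restricting the path.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical sketch: all scores are natural-log probabilities.
// logEmit[s][symbol] is the emission score of `symbol` in state s.
// logWeight[t][s] is added to state s at position t (0 = no prior,
// negative = penalty, -infinity = state forbidden at that position).
std::vector<int> weightedViterbi(const std::vector<double>& logInit,
                                 const Matrix& logTrans,
                                 const Matrix& logEmit,
                                 const std::vector<int>& obs,
                                 const Matrix& logWeight) {
    const std::size_t T = obs.size();
    const std::size_t S = logInit.size();
    assert(T > 0 && S > 0 && logWeight.size() == T);
    const double negInf = -std::numeric_limits<double>::infinity();

    Matrix score(T, std::vector<double>(S, negInf));
    std::vector<std::vector<int>> back(T, std::vector<int>(S, -1));

    const std::size_t first = static_cast<std::size_t>(obs[0]);
    for (std::size_t s = 0; s < S; ++s) {
        score[0][s] = logInit[s] + logEmit[s][first] + logWeight[0][s];
    }
    for (std::size_t t = 1; t < T; ++t) {
        const std::size_t sym = static_cast<std::size_t>(obs[t]);
        for (std::size_t s = 0; s < S; ++s) {
            // Pick the best-scoring predecessor of state s.
            for (std::size_t p = 0; p < S; ++p) {
                const double cand = score[t - 1][p] + logTrans[p][s];
                if (cand > score[t][s]) {
                    score[t][s] = cand;
                    back[t][s] = static_cast<int>(p);
                }
            }
            score[t][s] += logEmit[s][sym] + logWeight[t][s];
        }
    }

    // Traceback from the best final state.
    std::size_t best = 0;
    for (std::size_t s = 1; s < S; ++s) {
        if (score[T - 1][s] > score[T - 1][best]) best = s;
    }
    assert(score[T - 1][best] > negInf);  // at least one path must remain
    std::vector<int> path(T);
    path[T - 1] = static_cast<int>(best);
    for (std::size_t t = T - 1; t > 0; --t) {
        path[t - 1] = back[t][static_cast<std::size_t>(path[t])];
    }
    return path;
}
```

Because the weights are applied at decoding time, the trained transition and emission parameters stay unchanged, which matches the goal of using prior knowledge without retraining.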
Linking States Emissions or Transitions to external user-defined functions
A state's emissions or transitions can be linked to external user-defined functions. When such a transition or emission is evaluated, the function is called and can provide the emission or transition value. While this may provide one way of addressing a weakness of HMMs, namely that they do not handle long-range dependencies, we see it rather as a way to link together existing utilities or functions that provide additional information to the decoding algorithms. In this way, we can link divergent datasets or functions within the HMM trellis in order to arrive at a better prediction.
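The general pattern can be sketched with a callback: the emission keeps its own score and, if a function is attached, adds whatever that function returns for the current position. Everything here is a hypothetical illustration (the `ExternalScore` signature, `LinkedEmission`, and the toy `upstreamMotifBonus` function are assumptions, not StochHMM's interface). The toy function looks arbitrarily far upstream, which is the kind of long-range information a plain HMM emission cannot see.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical callback type: given the whole sequence and the current
// position, return a log-score to add to the emission.
using ExternalScore = std::function<double(const std::string&, std::size_t)>;

// An emission whose score can be adjusted by an attached user function.
struct LinkedEmission {
    double baseLogProb;      // score from the model's own emission table
    ExternalScore external;  // optional user-defined function (may be empty)

    double evaluate(const std::string& seq, std::size_t pos) const {
        assert(pos < seq.size());
        double score = baseLogProb;
        if (external) {
            score += external(seq, pos);  // called each time the emission is evaluated
        }
        return score;
    }
};

// Toy user function: adds a bonus of 1.0 (in log units) when the motif
// "TATA" occurs anywhere upstream of the current position.
double upstreamMotifBonus(const std::string& seq, std::size_t pos) {
    const std::size_t hit = seq.substr(0, pos).find("TATA");
    return hit != std::string::npos ? 1.0 : 0.0;
}
```

An emission built as `LinkedEmission{-2.0, upstreamMotifBonus}` scores higher at positions downstream of the motif, while one built with an empty function behaves like an ordinary table lookup.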
Features
Brief list of features implemented in StochHMM:
General settings within Hidden Markov Models
  User-defined HMM model via simple human readable text file
  User-defined Alphabet
  User-defined Ambiguous Characters
States
  Emissions
    Multiple emission states (Discrete / Continuous)
      Independent (Single or Multiple Discrete)
      Joint Distribution (Multiple Discrete)
      Univariate PDF (Single Sequence - Continuous)
      Multivariate PDF (Multiple Sequence - Continuous)
    Linkable to user-defined function
  Transitions
    Standard Transitions
    Lexical Transitions (Single or multiple emission)
    (Preliminary) Explicit Duration Transitions
    Linkable to user-defined functions
Decoding
  Traditional Decoding Algorithms
    Forward/Backward/Posterior
    Viterbi
    N-best Viterbi
  Stochastic Sampling Decoding Algorithms
    Stochastic Forward
    Stochastic Viterbi
    Stochastic Posterior
Decoding Traceback Path output formats
  State Path Index
  State Path Label
  GFF
  Hit Table (Stochastic Algorithms)
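To give a feel for what stochastic decoding means, the sketch below fills a forward table and then samples state paths by stochastic traceback, counting how often each distinct path appears so that sub-optimal paths can be ranked by frequency. This is the textbook sampling procedure written under simplifying assumptions (plain probabilities rather than logs, so it only suits short sequences, and a sequence with non-zero probability under the model). The names `forwardTable`, `samplePath`, and `countPaths` are hypothetical, and the path counts are not the same thing as StochHMM's hit table output.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <random>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Forward table in probability space: f[t][s] = P(obs[0..t], state at t = s).
Matrix forwardTable(const std::vector<double>& init, const Matrix& trans,
                    const Matrix& emit, const std::vector<int>& obs) {
    const std::size_t T = obs.size();
    const std::size_t S = init.size();
    assert(T > 0 && S > 0);
    Matrix f(T, std::vector<double>(S, 0.0));
    for (std::size_t s = 0; s < S; ++s) {
        f[0][s] = init[s] * emit[s][static_cast<std::size_t>(obs[0])];
    }
    for (std::size_t t = 1; t < T; ++t) {
        const std::size_t sym = static_cast<std::size_t>(obs[t]);
        for (std::size_t s = 0; s < S; ++s) {
            double sum = 0.0;
            for (std::size_t p = 0; p < S; ++p) sum += f[t - 1][p] * trans[p][s];
            f[t][s] = sum * emit[s][sym];
        }
    }
    return f;
}

// Sample one path: draw the final state in proportion to f[T-1][s], then
// draw each predecessor p in proportion to f[t-1][p] * trans[p][current].
std::vector<int> samplePath(const Matrix& f, const Matrix& trans, std::mt19937& rng) {
    const std::size_t T = f.size();
    const std::size_t S = f[0].size();
    std::vector<int> path(T);
    std::discrete_distribution<int> last(f[T - 1].begin(), f[T - 1].end());
    path[T - 1] = last(rng);
    for (std::size_t t = T - 1; t > 0; --t) {
        const std::size_t cur = static_cast<std::size_t>(path[t]);
        std::vector<double> w(S);
        for (std::size_t p = 0; p < S; ++p) w[p] = f[t - 1][p] * trans[p][cur];
        std::discrete_distribution<int> prev(w.begin(), w.end());
        path[t - 1] = prev(rng);
    }
    return path;
}

// Draw n paths and count how often each distinct path was sampled.
std::map<std::vector<int>, int> countPaths(const Matrix& f, const Matrix& trans,
                                           int n, unsigned seed) {
    std::mt19937 rng(seed);
    std::map<std::vector<int>, int> counts;
    for (int i = 0; i < n; ++i) ++counts[samplePath(f, trans, rng)];
    return counts;
}
```

Sorting the resulting map by count gives a simple ranking: the most frequently sampled path approximates the most probable one, and less frequent paths are the sub-optimal alternatives.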
Developed by:
Korf Lab Genome Center, University of California, Davis
Ian Korf
Paul Lott
Keith Dunaway
Ken Yu
For suggestions or support:
korflab AT ucdavis DOT edu
KorfLab Github
Google Groups
StochHMM-dev
StochHMM-Forum
References
1. Schroeder, D.I., Blair, J.D., Lott, P., Yu, H.O., Hong, D., Crary, F., Ashwood, P., Walker, C., Korf, I., Robinson, W.P., LaSalle, J.M. The human placenta methylome. PNAS 110(15), 6037-6042 (2013).
2. Lott, P., Dunaway, K., Yu, K., Korf, I. StochHMM: A Flexible Hidden Markov Model Framework for Rapid Development of HMMs. Poster presented at: Genome Informatics, 2012 Sep 6-9, Cambridge, UK.
3. Schroeder, D. I., Lott, P., Korf, I., LaSalle, J. M. Large-scale methylation domains mark a functional subset of neuronally expressed genes. Genome Res 21, 1583–1591 (2011).
4. Ginno, P. A., Lott, P. L., Christensen, H. C., Korf, I., Chédin, F. R-loop formation is a distinctive characteristic of unmethylated human CpG island promoters. Mol. Cell 45, 814–825 (2012).
Documentation
Code documentation can be found at http://korflab.github.io/StochHMM
Model file documentation and additional support can be found at https://github.com/KorfLab/StochHMM/wiki
StochHMM is provided as free open source code and compiles on Windows, Mac OSX, and Linux. We are providing StochHMM under the MIT open source license to increase accessibility and to give researchers the ability to use it in derivative works without restrictions.
Please feel free to contact us with bugs, suggestions, or questions: lottpaul@gmail.com