
StochHMM: A C++ Hidden Markov Model Project Integrating Multiple Methods

2014-05-24 07:15, 1326 views
https://code.google.com/p/stochhmm/

StochHMM is a free, easy-to-use, open source C++ library and application that implements HMMs from simple text files. It implements the traditional HMM algorithms and, in addition, provides extra flexibility by allowing researchers to integrate additional data sources and applications into the HMM framework.

StochHMM implements the standard HMM and, as a preliminary feature, the HMM with duration. It grants researchers the power to integrate additional datasets into their HMMs to improve predictions. Finally, it adapts HMM algorithms to provide stochastic decoding, giving researchers the ability to explore and rank sub-optimal predictions. We are providing StochHMM as a standalone application and a C++ library to give researchers the ability to rapidly develop HMMs.

Integrating Data

Here are a few of the ways that StochHMM allows users to integrate additional data sources:
Multiple emission states
Weighting or explicitly defining state paths on a sequence
Linking state emissions/transitions to external user-defined functions


Multiple Emission States

StochHMM allows the user to provide multiple sequences, which are then handled by the emissions. These sequences can be real numbers or discrete characters/words. StochHMM allows each state to have many emissions (discrete or continuous). Discrete emissions can be independent of each other or form joint distributions. Continuous emissions can be treated in two ways: 1) as raw probabilities, which are integrated without transformation; or 2) as values to be plugged into a univariate probability density function (PDF), or into a multivariate PDF in the case of multiple real-valued sequences.

Each state's emissions are user-defined, so one state may have emissions from two different sequences, while another may have only a single emission from a single sequence.
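
As a rough illustration of a state with emissions on two tracks, the sketch below scores one discrete symbol and one real value in log space, treating the two tracks as independent and using a univariate normal density for the real-valued track. It is not taken from the StochHMM source: `ToyState`, `logNormalPdf`, and `logEmission` are hypothetical names, and the choice of a normal distribution is an assumption made only for this example.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <map>

// Hypothetical toy state (not a StochHMM type): one discrete emission
// table plus the parameters of a normal density for a real-valued track.
struct ToyState {
    std::map<char, double> discrete;  // P(symbol | state)
    double mean;                      // mean of the continuous emission
    double stddev;                    // standard deviation, must be > 0
};

// Log density of a univariate normal distribution evaluated at x.
double logNormalPdf(double x, double mean, double stddev) {
    assert(stddev > 0.0);
    const double kPi = 3.14159265358979323846;
    const double z = (x - mean) / stddev;
    return -0.5 * z * z - std::log(stddev) - 0.5 * std::log(2.0 * kPi);
}

// Combined log emission score for one position, assuming the discrete and
// continuous tracks are independent given the state.
double logEmission(const ToyState& state, char symbol, double value) {
    const auto it = state.discrete.find(symbol);
    if (it == state.discrete.end() || it->second <= 0.0) {
        // The symbol cannot be emitted by this state.
        return -std::numeric_limits<double>::infinity();
    }
    return std::log(it->second) + logNormalPdf(value, state.mean, state.stddev);
}
```

A state that reads only one track would simply omit one of the two terms, which mirrors the point that each state's emissions are defined separately.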


Weighting or Explicitly Defining State Paths to Follow on a Sequence

Often, we have some prior knowledge about the sequence. In that case, we may want to integrate that knowledge into the model without redesigning or retraining it (a time-consuming endeavor). StochHMM allows the user to explicitly define a state path (by state name or by state category). In addition, StochHMM allows the user to weight a state path (by state name or by user-defined state category). This lets the user either restrict the predicted path or weight it according to their prior knowledge.
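
One way to picture path weighting and path restriction is a Viterbi recursion with an extra per-position, per-state log weight: a weight of 0 leaves the model untouched, a positive or negative weight nudges the prediction, and negative infinity forbids a state at that position. The sketch below is a generic illustration under that assumption, not StochHMM's implementation; `weightedViterbi` and the `Matrix` alias are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Viterbi in log space with an additional log weight per position and state.
// logInit[s], logTrans[from][to], logEmit[t][s], logWeight[t][s].
std::vector<int> weightedViterbi(const std::vector<double>& logInit,
                                 const Matrix& logTrans,
                                 const Matrix& logEmit,
                                 const Matrix& logWeight) {
    const std::size_t T = logEmit.size();
    const std::size_t S = logInit.size();
    assert(T > 0 && S > 0);
    assert(logTrans.size() == S && logWeight.size() == T);
    const double negInf = -std::numeric_limits<double>::infinity();

    Matrix score(T, std::vector<double>(S, negInf));
    std::vector<std::vector<int>> back(T, std::vector<int>(S, 0));

    for (std::size_t s = 0; s < S; ++s) {
        score[0][s] = logInit[s] + logEmit[0][s] + logWeight[0][s];
    }

    for (std::size_t t = 1; t < T; ++t) {
        for (std::size_t to = 0; to < S; ++to) {
            // Pick the best predecessor for state `to` at position t.
            for (std::size_t from = 0; from < S; ++from) {
                const double cand = score[t - 1][from] + logTrans[from][to];
                if (cand > score[t][to]) {
                    score[t][to] = cand;
                    back[t][to] = static_cast<int>(from);
                }
            }
            // Apply the emission and the user-supplied weight.
            score[t][to] += logEmit[t][to] + logWeight[t][to];
        }
    }

    // Find the best final state, then follow the back pointers.
    int best = 0;
    for (std::size_t s = 1; s < S; ++s) {
        if (score[T - 1][s] > score[T - 1][best]) best = static_cast<int>(s);
    }
    std::vector<int> path(T);
    for (std::size_t t = T; t-- > 0;) {
        path[t] = best;
        best = back[t][best];
    }
    return path;
}
```

Explicitly defining a path corresponds to setting every state except the required one to negative infinity at the positions in question; the model parameters themselves are never changed.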


Linking State Emissions or Transitions to External User-Defined Functions

A state's emission or transition can be linked to an external user-defined function. When that transition or emission is evaluated, the function is called and can provide the emission (or transition) value. This may offer one way of addressing a weakness of HMMs, namely that they do not handle long-range dependencies, but we see it rather as a way to link together existing utilities or functions that provide additional information to the decoding algorithms. In this way, we can link divergent datasets or functions within the HMM trellis in order to arrive at a better prediction.
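
The general idea of attaching a callback to an emission can be sketched with `std::function`. This is only an illustration of the concept, not the StochHMM interface: `ExternalScore`, `LinkedState`, `emissionScore`, and `cpgBonus` are hypothetical names, and the way the callback's result is combined with the base score (simple addition in log space) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical callback type: receives the whole sequence and the current
// position, so it is free to look far away from that position.
using ExternalScore =
    std::function<double(const std::string& seq, std::size_t pos)>;

// Hypothetical state with an ordinary log emission score and an optional
// externally supplied adjustment.
struct LinkedState {
    double baseLogEmission;
    ExternalScore external;  // may be empty
};

// Evaluate the emission; the external function is called only if present.
double emissionScore(const LinkedState& state, const std::string& seq,
                     std::size_t pos) {
    assert(pos < seq.size());
    double score = state.baseLogEmission;
    if (state.external) {
        score += state.external(seq, pos);
    }
    return score;
}

// Example external function: add a bonus of 1.0 when the position lies
// inside a "CG" dinucleotide, and 0.0 otherwise.
double cpgBonus(const std::string& seq, std::size_t pos) {
    const bool startsCG =
        pos + 1 < seq.size() && seq[pos] == 'C' && seq[pos + 1] == 'G';
    const bool endsCG = pos > 0 && seq[pos - 1] == 'C' && seq[pos] == 'G';
    return (startsCG || endsCG) ? 1.0 : 0.0;
}
```

Because the callback sees the full sequence, it could equally wrap an existing tool or a lookup into a separate dataset, which is the use the text above emphasizes.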


Features

Brief list of features implemented in StochHMM:
General settings within Hidden Markov Models
  User-defined HMM model via simple human-readable text file
  User-defined Alphabet
  User-defined Ambiguous Characters

States
  Emissions
    Multiple emission states (Discrete / Continuous)
      Independent (Single or Multiple Discrete)
      Joint Distribution (Multiple Discrete)
      Univariate PDF (Single Sequence - Continuous)
      Multivariate PDF (Multiple Sequence - Continuous)

    Linkable to user-defined function

  Transitions
    Standard Transitions
    Lexical Transitions (Single or multiple emission)
    (Preliminary) Explicit Duration Transitions
    Linkable to user-defined functions

Decoding
  Traditional Decoding Algorithms
    Forward/Backward/Posterior
    Viterbi
    N-best Viterbi

  Stochastic Sampling Decoding Algorithms
    Stochastic Forward
    Stochastic Viterbi
    Stochastic Posterior

  Decoding Traceback Path output formats
    State Path Index
    State Path Label
    GFF
    Hit Table (Stochastic Algorithms)
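
To make the idea behind stochastic decoding more concrete, here is a minimal sketch of a stochastic traceback over a forward table: instead of always taking the best predecessor, each step samples a predecessor with probability proportional to its forward value times the transition probability. This is a textbook-style illustration, not code from StochHMM; `forwardTable` and `samplePath` are hypothetical names, and the table is kept in plain probability space, which would underflow on long sequences.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Forward table in probability space:
// fwd[t][s] = P(observations 0..t, state at t is s).
// init[s], trans[from][to], emit[t][s] are plain probabilities.
Matrix forwardTable(const std::vector<double>& init, const Matrix& trans,
                    const Matrix& emit) {
    const std::size_t T = emit.size();
    const std::size_t S = init.size();
    assert(T > 0 && S > 0 && trans.size() == S);
    Matrix fwd(T, std::vector<double>(S, 0.0));
    for (std::size_t s = 0; s < S; ++s) {
        fwd[0][s] = init[s] * emit[0][s];
    }
    for (std::size_t t = 1; t < T; ++t) {
        for (std::size_t to = 0; to < S; ++to) {
            double sum = 0.0;
            for (std::size_t from = 0; from < S; ++from) {
                sum += fwd[t - 1][from] * trans[from][to];
            }
            fwd[t][to] = sum * emit[t][to];
        }
    }
    return fwd;
}

// Sample one state path from the posterior over paths by tracing back
// through the forward table. Assumes the sequence has nonzero probability.
std::vector<int> samplePath(const Matrix& fwd, const Matrix& trans,
                            std::mt19937& rng) {
    const std::size_t T = fwd.size();
    assert(T > 0);
    const std::size_t S = fwd[0].size();
    std::vector<int> path(T);

    // Final state: probability proportional to the last forward column.
    std::discrete_distribution<int> last(fwd[T - 1].begin(), fwd[T - 1].end());
    path[T - 1] = last(rng);

    // Earlier states: proportional to fwd[t-1][from] * trans[from][next].
    for (std::size_t t = T - 1; t > 0; --t) {
        std::vector<double> weights(S);
        for (std::size_t from = 0; from < S; ++from) {
            weights[from] = fwd[t - 1][from] * trans[from][path[t]];
        }
        std::discrete_distribution<int> pick(weights.begin(), weights.end());
        path[t - 1] = pick(rng);
    }
    return path;
}
```

Calling `samplePath` many times and tallying how often each state, or each whole path, appears is one simple way to explore and rank sub-optimal predictions.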


Developed by:

Korf Lab Genome Center, University of California, Davis
Ian Korf
Paul Lott
Keith Dunaway
Ken Yu

For suggestions or support:

korflab AT ucdavis DOT edu
KorfLab Github
Google Groups
StochHMM-dev
StochHMM-Forum



Documentation

Code Documentation can be found at http://korflab.github.io/StochHMM

Model file documentation and additional support can be found at https://github.com/KorfLab/StochHMM/wiki

StochHMM is provided as free open source code and compiles on Windows, Mac OSX, and Linux. We are providing StochHMM under the MIT open source license to increase accessibility and to give researchers the ability to use it in derivative works without restrictions.

Please feel free to contact us with bugs, suggestions, or questions: lottpaul@gmail.com