Sharing Source Code for the LDA Topic Model
2013-06-07 20:19
Please credit the source when reposting: http://blog.csdn.net/yihucha166/article/details/9046835
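Latent Dirichlet Allocation (LDA) is currently one of the most popular machine learning methods in industry. Here I have implemented a version of it in C++, called as-lda, that uses an asymmetric prior. As the number of topics grows, its topic distributions are more stable than those of the conventional model, and it produces fewer of the small niche topics that appear when the topic count is large. Reference: "Rethinking LDA: Why Priors Matter". The code directory includes Chinese test data.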
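The sketch below shows where the prior enters the model. It is a minimal C++ illustration of the collapsed Gibbs sampling weight for one token. It assumes the document-topic prior is asymmetric (one alpha_k per topic) and beta stays symmetric, which is the combination the cited paper recommends. The count arrays, their layout and the function names are hypothetical. The post does not show how as-lda organizes its sampler.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Unnormalized collapsed-Gibbs weights for one token of word w in document d.
// Hypothetical count layout (not as-lda's): n_dk[k] = tokens of document d in
// topic k, n_kw[k] = tokens of word w in topic k, n_k[k] = all tokens in topic k.
// The current token must already be removed from all three counts.
std::vector<double> topic_weights(const std::vector<int>& n_dk,
                                  const std::vector<int>& n_kw,
                                  const std::vector<int>& n_k,
                                  const std::vector<double>& alpha,  // asymmetric: one value per topic
                                  double beta,                       // symmetric topic-word prior
                                  int vocab_size) {
    const std::size_t K = alpha.size();
    assert(n_dk.size() == K && n_kw.size() == K && n_k.size() == K);
    std::vector<double> w(K);
    for (std::size_t k = 0; k < K; ++k) {
        // (n_dk + alpha_k): document-topic part, where the asymmetric prior acts
        // (n_kw + beta) / (n_k + V * beta): smoothed topic-word part
        w[k] = (n_dk[k] + alpha[k]) * (n_kw[k] + beta) /
               (n_k[k] + vocab_size * beta);
    }
    return w;
}

// Draw a topic index with probability proportional to the weights.
int sample_topic(const std::vector<double>& weights, std::mt19937& rng) {
    std::discrete_distribution<int> dist(weights.begin(), weights.end());
    return dist(rng);
}
```

A topic with a larger alpha_k receives extra weight in every document, even before it has any tokens there, so commonly used topics can be shared across many documents. This is the intuition behind the stability claim above, not a measured result.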
Code: https://code.google.com/p/as-lda/
Asymmetric-prior Latent Dirichlet Allocation (LDA) in C++
LDA implementations usually use a symmetric Dirichlet prior. In "Rethinking LDA: Why Priors Matter", the authors showed that an asymmetric prior produces better results and a more stable topic distribution as the number of topics increases, so this project adopts that approach.
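The post does not say how as-lda chooses the per-topic values alpha_k. One standard way to learn an asymmetric alpha from a Gibbs sampler's document-topic counts is Minka's fixed-point iteration. The C++ sketch below implements one update step of it, as an illustration only.

It relies on the identity psi(n + a) - psi(a) = sum_{i=0}^{n-1} 1/(a + i), which holds for integer counts, so no digamma function is needed. The function names and the small floor on alpha_k are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// psi(n + a) - psi(a) for a non-negative integer count n.
// Exact, from the recurrence psi(x + 1) = psi(x) + 1/x.
double digamma_diff(int n, double a) {
    double s = 0.0;
    for (int i = 0; i < n; ++i) s += 1.0 / (a + i);
    return s;
}

// One Minka fixed-point step for an asymmetric Dirichlet prior:
//   alpha_k <- alpha_k * sum_d [psi(n_dk + alpha_k) - psi(alpha_k)]
//                      / sum_d [psi(n_d + alpha_0) - psi(alpha_0)]
// n_dk[d][k] is the number of tokens of document d assigned to topic k.
std::vector<double> update_alpha(const std::vector<std::vector<int>>& n_dk,
                                 const std::vector<double>& alpha) {
    constexpr double kMinAlpha = 1e-10;  // assumed floor so unused topics never reach 0
    const std::size_t K = alpha.size();
    double alpha0 = 0.0;
    for (double a : alpha) alpha0 += a;

    std::vector<double> numer(K, 0.0);
    double denom = 0.0;
    for (const auto& doc : n_dk) {
        assert(doc.size() == K);
        int n_d = 0;  // document length
        for (std::size_t k = 0; k < K; ++k) {
            n_d += doc[k];
            numer[k] += digamma_diff(doc[k], alpha[k]);
        }
        denom += digamma_diff(n_d, alpha0);
    }
    if (denom <= 0.0) return alpha;  // no tokens: nothing to learn from

    std::vector<double> updated(K);
    for (std::size_t k = 0; k < K; ++k) {
        updated[k] = alpha[k] * numer[k] / denom;
        if (updated[k] < kMinAlpha) updated[k] = kMinAlpha;
    }
    return updated;
}
```

In a full sampler, such an update would typically run every so many sweeps while the counts are held fixed. That schedule is also an assumption, not something the post describes.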
Other features:
# easy to use and easy to understand
# small memory footprint
ML tools source code:
as-lda: https://code.google.com/p/as-lda/
gbdt: http://code.google.com/p/simple-gbdt/
adaboost: http://code.google.com/p/simple-adaboost/
--------how to use it-----------
Usage:
  -c  corpus file, default './corpus.txt'
  -v  vocab file, default './vocab.txt'
  -e or -i  action type (e for estimate, i for inference)
  -m  model files directory, default './models'
  -z  topic assignment file of a trained model (for inference)
  -a  hyperparameter alpha, default 500/topic_num
  -b  hyperparameter beta, default 0.1
  -k  number of topics, default 100
  -n  maximum number of iterations, default 1000
Examples:
estimate:  ./as_lda -e -c ./corpus.txt -v ./vocab.txt -n 2000
inference: ./as_lda -i -n 100 -c corpus.txt.test -v vocab.txt -z ./models/model.z
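The options above can be pictured as a small configuration struct. Below is a hypothetical C++ mirror of the documented flags and defaults. It is not as-lda's actual parser, and the struct and field names are invented for illustration.

The one design point worth noting is that alpha's default (500/topic_num) depends on -k. It therefore has to be resolved after all arguments have been read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical mirror of the documented options; not as-lda's real code.
struct LdaOptions {
    std::string corpus = "./corpus.txt";  // -c corpus file
    std::string vocab = "./vocab.txt";    // -v vocab file
    bool estimate = true;                 // -e estimate (true) / -i inference (false)
    std::string model_dir = "./models";   // -m model files directory
    std::string pre_assign;               // -z trained model's assignment file (inference)
    double alpha = -1.0;                  // -a; negative means "use the default"
    double beta = 0.1;                    // -b
    int topics = 100;                     // -k
    int iterations = 1000;                // -n
};

// args[0] is the program name, as in argv.
LdaOptions parse_options(const std::vector<std::string>& args) {
    LdaOptions o;
    for (std::size_t i = 1; i < args.size(); ++i) {
        const std::string& flag = args[i];
        // Value-taking flags consume the next argument (empty string if missing).
        auto value = [&]() -> std::string {
            return i + 1 < args.size() ? args[++i] : std::string();
        };
        if (flag == "-e") o.estimate = true;
        else if (flag == "-i") o.estimate = false;
        else if (flag == "-c") o.corpus = value();
        else if (flag == "-v") o.vocab = value();
        else if (flag == "-m") o.model_dir = value();
        else if (flag == "-z") o.pre_assign = value();
        else if (flag == "-a") o.alpha = std::atof(value().c_str());
        else if (flag == "-b") o.beta = std::atof(value().c_str());
        else if (flag == "-k") o.topics = std::atoi(value().c_str());
        else if (flag == "-n") o.iterations = std::atoi(value().c_str());
        // Unknown flags are ignored in this sketch.
    }
    // alpha's default depends on the topic count, so resolve it last.
    if (o.alpha < 0.0 && o.topics > 0) o.alpha = 500.0 / o.topics;
    return o;
}
```

For instance, the estimate example leaves -k at its default of 100, so this sketch resolves alpha to 500/100 = 5.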
--------input format------------
For corpus:
One document per line; each number is a word id, and ids are separated by tabs.
example:
2699\t10608\t52656\t17781\t17781\t7900\t24007
For vocab:
One word per line; a word's id is its line number.
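Here is a minimal C++ sketch of reading both input files as described. It treats any whitespace, including the tab in the example, as a separator. It keeps blank corpus lines as empty documents so that line numbers stay aligned.

It also assumes vocabulary ids are 0-based, meaning the first line is id 0. The post does not specify this. The function names are hypothetical.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Corpus: one document per line, word ids separated by tabs (any whitespace works).
// Reading a line stops at the first non-numeric token.
std::vector<std::vector<int>> read_corpus(std::istream& in) {
    std::vector<std::vector<int>> docs;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::vector<int> doc;
        int id;
        while (fields >> id) doc.push_back(id);
        docs.push_back(doc);  // blank line -> empty document
    }
    return docs;
}

// Vocab: one word per line; assumed 0-based, so vocab[id] is the word on line id + 1.
std::vector<std::string> read_vocab(std::istream& in) {
    std::vector<std::string> vocab;
    std::string word;
    while (std::getline(in, word)) {
        if (!word.empty() && word.back() == '\r') word.pop_back();  // tolerate CRLF files
        vocab.push_back(word);
    }
    return vocab;
}
```

With std::ifstream in place of the string streams, the same functions would read ./corpus.txt and ./vocab.txt directly.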