word2vec in Practice (1): Preliminaries
2015-08-10 22:54
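word2vec is a tool open-sourced by Google in 2013 and commonly described as a deep learning tool. It uses a neural network to map words into a low-dimensional, continuous real-valued space; the resulting vectors are known as word embeddings. The semantic similarity between two words can be measured directly by the cosine of the angle between their embedding vectors, and the embeddings can also be fed to algorithms such as k-means or hierarchical clustering to explore further uses. The author, Tomas Mikolov, also observed an interesting phenomenon: once words are given a distributed representation, the vectors still preserve certain linguistic regularities, for example simple addition and subtraction relations.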
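The two ideas above, cosine similarity and vector addition and subtraction, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the 3-dimensional vectors in `TOY_VECTORS` are hand-made for this example and are not output from word2vec, and the helper names `cosine` and `analogy` are hypothetical, not part of the word2vec tool.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(vectors, a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b and c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Hand-made toy embeddings (an assumption for illustration only);
# real word2vec vectors typically have on the order of 100 or more dimensions.
TOY_VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.2, 0.8],
    "apple": [-0.5, 0.3, -0.4],
}

# Similarity: "king" is closer to "queen" than to "apple" in this toy space.
print(cosine(TOY_VECTORS["king"], TOY_VECTORS["queen"]))
print(cosine(TOY_VECTORS["king"], TOY_VECTORS["apple"]))

# Regularity: king - man + woman lands nearest to "queen" in this toy space.
print(analogy(TOY_VECTORS, "king", "man", "woman"))  # queen
```

With vectors actually trained by word2vec, the same nearest-neighbour search would run over the whole vocabulary rather than five words, and the result depends on the training corpus.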
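A large number of hands-on and theoretical articles about word2vec are already available online. The main ones are listed below.
Theoretical analysis:
Deep Learning in Practice: word2vec
Deep Learning in NLP (1): Word Vectors and Language Models
word2vec: A Beginner-Friendly Walkthrough
word2vec: Study Notes and Usage Guide
Hands-on practice:
Running Google's Open-Source word2vec on Chinese Data
The ANSJ Word Segmentation Tool (Example)
A Survey of word2vec in Event Mining
References: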
[1] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. In Proceedings of Workshop at ICLR, 2013.
[2] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. In Proceedings of NIPS, 2013.
[3] Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. Linguistic Regularities in Continuous Space Word Representations. In Proceedings of NAACL HLT, 2013.
[4] Tomas Mikolov, Stefan Kombrink, Lukas Burget, Jan Cernocky, and Sanjeev Khudanpur. Extensions of recurrent neural network language model. In Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on, pages 5528–5531. IEEE, 2011.
[5] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. ICLR Workshop, 2013.
[6] Frederic Morin and Yoshua Bengio. Hierarchical probabilistic neural network language model. In Proceedings of the international workshop on artificial intelligence and statistics, pages 246–252, 2005.
[7] Andriy Mnih and Geoffrey E. Hinton. A scalable hierarchical distributed language model. Advances in neural information processing systems, 21:1081–1088, 2009.
[8] Geoffrey E. Hinton. Learning distributed representations of concepts. Proceedings of the eighth annual conference of the cognitive science society, 1986.
[9] R. Rosenfeld. Two decades of statistical language modeling: where do we go from here? Proceedings of the IEEE, 88(8):1270–1288, 2000.
[10] Jeffrey Dean, Greg S. Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc V. Le, Mark Z. Mao, Marc'Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, and Andrew Y. Ng. Large Scale Distributed Deep Networks. Proceedings of NIPS, 2012.
[11] http://licstar.net/archives/328
[12] http://www.cs.columbia.edu/~mcollins/loglinear.pdf
[13] A. Mnih and G. Hinton. Three new graphical models for statistical language modelling. Proceedings of the 24th international conference on Machine learning, pages 641–648, 2007.
[14] Frederic Morin and Yoshua Bengio. Hierarchical probabilistic neural network language model. In Robert G. Cowell and Zoubin Ghahramani, editors, AISTATS'05,