NLTK vs Sklearn vs Gensim
2016-04-05 16:43
Use cases for NLTK, Sklearn, and Gensim
Quoting answers from Quora. Yuval Feinstein's answer:
Generally,
- NLTK is used primarily for general NLP tasks (tokenization, POS tagging, parsing, etc.)
- Sklearn is used primarily for machine learning (classification, clustering, etc.)
- Gensim is used primarily for topic modeling and document similarity.
Roland Bischof's answer:
- NLTK specializes in gathering and classifying unstructured texts. If you need e.g. a POS tagger, lemmatizer, dependency analyzer, etc., you'll find them there, and sometimes nowhere else. It offers quite a broad range of tools developed mainly in academic research. But: most often it is not very well optimized, and involving NLTK libraries often means accepting a huge performance loss. If you do text gathering or preprocessing, it's fine to begin with, until you find some faster alternatives.
- SKLEARN is much more an analysis tool than a gathering tool. It is very well documented, well optimized, and covers a broad range of statistical methods.
- GENSIM is a very well optimized, but also highly specialized, library for doing jobs in the periphery of “WORD2DOC”. That is: it offers an easy, surprisingly well-working, and swift AI approach to unstructured texts. If you are interested in production, you might also have a look at TensorFlow, which offers a mathematically generalized, yet highly performant, model.
Although the three overlap considerably, I personally prefer using NLTK for pre-processing, GENSIM as a kind of base platform, and SKLEARN for third-step processing.