
NLTK vs Sklearn vs Gensim

2016-04-05 16:43

Usage scenarios for NLTK, Sklearn, and Gensim

Quoting answers from Quora:

Yuval Feinstein's answer:

Generally,

- NLTK is used primarily for general NLP tasks (tokenization, POS tagging, parsing, etc.)

- Sklearn is used primarily for machine learning (classification, clustering, etc.)

- Gensim is used primarily for topic modeling and document similarity.
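To make that division of labor concrete, here is a minimal sketch of all three roles on a toy corpus (the sample sentences are placeholders, and the usual NLTK data packages such as 'punkt' and the perceptron tagger are assumed to be downloaded):

```python
import nltk
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from gensim import corpora, models, similarities

docs = ["The cat sat on the mat.",
        "Dogs and cats are popular pets.",
        "Stock markets fell sharply today."]

# NLTK: general NLP tasks such as tokenization and POS tagging.
tokens = [nltk.word_tokenize(d.lower()) for d in docs]
tags = nltk.pos_tag(tokens[0])

# Sklearn: machine learning, here unsupervised clustering of TF-IDF vectors.
X = TfidfVectorizer().fit_transform(docs)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Gensim: topic modeling and document similarity.
dictionary = corpora.Dictionary(tokens)
corpus = [dictionary.doc2bow(t) for t in tokens]
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, random_state=0)
index = similarities.MatrixSimilarity(lda[corpus], num_features=lda.num_topics)
sims = index[lda[dictionary.doc2bow(tokens[0])]]

print(tags[:3], labels, list(sims))
```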

Roland Bischof's answer:

- NLTK specializes in gathering and classifying unstructured texts. If you need, e.g., a POS tagger, lemmatizer, dependency analyzer, etc., you'll find them there, and sometimes nowhere else. It offers a quite broad range of tools developed mainly in academic research. But: most often it is not very well optimized; involving NLTK libraries often means accepting a huge performance loss. If you do text gathering or preprocessing, it's fine to begin with, until you find some faster alternatives.

- SKLEARN is much more of an analyzing tool than a gathering tool. It's well documented, well optimized, and covers a broad range of statistical methods.

- GENSIM is a very well optimized, but also highly specialized, library for jobs in the periphery of word2vec/doc2vec. That is: it offers an easy, surprisingly well working, and swift AI approach to unstructured texts. If you are interested in production, you might also have a look at TensorFlow, which offers a mathematically generalized, yet highly performant, model.
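For a sense of how little code that specialization requires, here is a minimal word2vec sketch in Gensim (the toy sentences below are placeholders; real training needs far more text to yield useful vectors):

```python
from gensim.models import Word2Vec

# Toy corpus of pre-tokenized sentences (placeholder data).
sentences = [["machine", "learning", "is", "fun"],
             ["deep", "learning", "uses", "neural", "networks"],
             ["gensim", "trains", "word", "vectors"]]

# vector_size/epochs are the Gensim 4.x parameter names (size/iter in older releases).
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=50)
print(model.wv.most_similar("learning", topn=3))
```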

Although they overlap considerably, I personally prefer using NLTK for pre-processing, GENSIM as a kind of base platform, and SKLEARN for third-step processing.
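As a rough illustration of that preference, here is a minimal sketch of such a three-step pipeline on a toy labeled corpus (the documents, labels, and concrete model choices below are placeholders, not anything prescribed in the answer; the NLTK 'punkt' and 'stopwords' data packages are assumed to be installed):

```python
import nltk
from nltk.corpus import stopwords
from gensim import corpora, models, matutils
from sklearn.linear_model import LogisticRegression

docs = ["the match ended in a draw",
        "the team won the championship game",
        "shares dropped after the earnings report",
        "investors sold the stock in heavy trading"]
labels = [0, 0, 1, 1]  # 0 = sports, 1 = finance (toy labels)

# Step 1 (NLTK): tokenize and drop English stopwords.
stop = set(stopwords.words("english"))
tokens = [[w for w in nltk.word_tokenize(d) if w not in stop] for d in docs]

# Step 2 (Gensim as base platform): dictionary, bag-of-words, LSI vectors.
dictionary = corpora.Dictionary(tokens)
bow = [dictionary.doc2bow(t) for t in tokens]
lsi = models.LsiModel(bow, id2word=dictionary, num_topics=2)

# Step 3 (sklearn): densify the LSI vectors and train a classifier.
X = matutils.corpus2dense(lsi[bow], num_terms=2).T  # shape: (n_docs, n_topics)
clf = LogisticRegression().fit(X, labels)
print(clf.predict(X))
```

Swapping the LSI step for TF-IDF or doc2vec vectors, or the classifier for any other sklearn estimator, leaves the overall structure of the pipeline unchanged.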
Tags: nlp nltk sklearn gensim