
NLTK vs Sklearn vs Gensim

2016-04-05 16:43

Usage scenarios for NLTK, Sklearn, and Gensim

Quoting answers from Quora:

Yuval Feinstein's answer:

Generally,

- NLTK is used primarily for general NLP tasks (tokenization, POS tagging, parsing, etc.)

- Sklearn is used primarily for machine learning (classification, clustering, etc.)

- Gensim is used primarily for topic modeling and document similarity.
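To make that division of labor concrete, here is a minimal sketch of all three roles on a toy corpus (the sample sentences are placeholders, and the usual NLTK data packages such as 'punkt' and the perceptron tagger are assumed to be downloaded):

```python
import nltk
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from gensim import corpora, models, similarities

docs = ["The cat sat on the mat.",
        "Dogs and cats are popular pets.",
        "Stock markets fell sharply today."]

# NLTK: general NLP tasks such as tokenization and POS tagging.
tokens = [nltk.word_tokenize(d.lower()) for d in docs]
tags = nltk.pos_tag(tokens[0])

# Sklearn: machine learning, here unsupervised clustering of TF-IDF vectors.
X = TfidfVectorizer().fit_transform(docs)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Gensim: topic modeling and document similarity.
dictionary = corpora.Dictionary(tokens)
corpus = [dictionary.doc2bow(t) for t in tokens]
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, random_state=0)
index = similarities.MatrixSimilarity(lda[corpus], num_features=lda.num_topics)
sims = index[lda[dictionary.doc2bow(tokens[0])]]

print(tags[:3], labels, list(sims))
```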

Roland Bischof's answer:

- NLTK specializes in gathering and classifying unstructured texts. If you need, e.g., a POS tagger, lemmatizer, dependency analyzer, etc., you'll find them there, and sometimes nowhere else. It offers a quite broad range of tools developed mainly in academic research. But: most often it is not very well optimized; involving NLTK libraries often means accepting a huge performance loss. If you do text gathering or preprocessing, it's fine to begin with, until you find some faster alternatives.

- SKLEARN is much more of an analyzing tool than a gathering tool. It's well documented, well optimized, and covers a broad range of statistical methods.

- GENSIM is a very well optimized, but also highly specialized, library for jobs in the periphery of word2vec/doc2vec. That is: it offers an easy, surprisingly well working, and swift AI approach to unstructured texts. If you are interested in production, you might also have a look at TensorFlow, which offers a mathematically generalized, yet highly performant, model.
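For a sense of how little code that specialization requires, here is a minimal word2vec sketch in Gensim (the toy sentences below are placeholders; real training needs far more text to yield useful vectors):

```python
from gensim.models import Word2Vec

# Toy corpus of pre-tokenized sentences (placeholder data).
sentences = [["machine", "learning", "is", "fun"],
             ["deep", "learning", "uses", "neural", "networks"],
             ["gensim", "trains", "word", "vectors"]]

# vector_size/epochs are the Gensim 4.x parameter names (size/iter in older releases).
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=50)
print(model.wv.most_similar("learning", topn=3))
```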

Although they overlap considerably, I personally prefer using NLTK for pre-processing, GENSIM as a kind of base platform, and SKLEARN for third-step processing.
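As a rough illustration of that preference, here is a minimal sketch of such a three-step pipeline on a toy labeled corpus (the documents, labels, and concrete model choices below are placeholders, not anything prescribed in the answer; the NLTK 'punkt' and 'stopwords' data packages are assumed to be installed):

```python
import nltk
from nltk.corpus import stopwords
from gensim import corpora, models, matutils
from sklearn.linear_model import LogisticRegression

docs = ["the match ended in a draw",
        "the team won the championship game",
        "shares dropped after the earnings report",
        "investors sold the stock in heavy trading"]
labels = [0, 0, 1, 1]  # 0 = sports, 1 = finance (toy labels)

# Step 1 (NLTK): tokenize and drop English stopwords.
stop = set(stopwords.words("english"))
tokens = [[w for w in nltk.word_tokenize(d) if w not in stop] for d in docs]

# Step 2 (Gensim as base platform): dictionary, bag-of-words, LSI vectors.
dictionary = corpora.Dictionary(tokens)
bow = [dictionary.doc2bow(t) for t in tokens]
lsi = models.LsiModel(bow, id2word=dictionary, num_topics=2)

# Step 3 (sklearn): densify the LSI vectors and train a classifier.
X = matutils.corpus2dense(lsi[bow], num_terms=2).T  # shape: (n_docs, n_topics)
clf = LogisticRegression().fit(X, labels)
print(clf.predict(X))
```

Swapping the LSI step for TF-IDF or doc2vec vectors, or the classifier for any other sklearn estimator, leaves the overall structure of the pipeline unchanged.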
Tags: nlp nltk sklearn gensim