
Common Machine Learning Tools

2014-03-04 16:54




Support Vector Machine

SVMlight

An implementation of Vapnik's Support Vector Machine

LIBSVM

A Library for Support Vector Machines


Decision Tree

C4.5

The "classic" decision-tree tool, developed by J. R. Quinlan (a tutorial is also available)


Maximum Entropy

YASMET

Yet Another Small MaxEnt Toolkit


Conditional Random Field

CRF++

A simple, customizable, and open source implementation of Conditional Random Fields (CRFs) for segmenting/labeling sequential data

Natural Language Processing


General

OpenNLP

An organizational center for open source projects related to natural language processing

CMU Statistical Language Modeling Toolkit

A suite of UNIX software tools to facilitate the construction and testing of statistical language models

The Dragon ToolKit

A Java-based development package for academic use in information retrieval (IR) and text mining. Includes many NLP tools.

LingPipe

A suite of Java libraries for the linguistic analysis of human language, including tools to

track mentions of entities (e.g. people or proteins);
link entity mentions to database entries;
uncover relations between entities and actions;
classify text passages by language, character encoding, genre, topic, or sentiment;
correct spelling with respect to a text collection;
cluster documents by implicit topic and discover significant trends over time; and
provide part-of-speech tagging and phrase chunking.

Natural Language Toolkit

Open source Python modules, linguistic data and documentation for research and development in natural language processing and text analytics, with distributions for Windows, Mac OS X and Linux.

Antelope

Advanced Natural Language Object-oriented Processing Environment. Includes a range of tools (notably a C# version of the Stanford Parser).


Word Segmentation

ICTCLAS

The Chinese word segmentation system from the Chinese Academy of Sciences

Stanford Chinese Word Segmenter

A Java implementation of a CRF-based Chinese Word Segmenter


Part-of-Speech Tagging

Brill tagger

An error-driven, transformation-based tagger implemented by Eric Brill

Stanford POS Tagger

A Java implementation of the log-linear part-of-speech taggers described by Kristina Toutanova et al.

MBT: Memory-based Tagger

TreeTagger

A decision tree based tagger from the University of Stuttgart.

SVMTool, a POS tagger based on SVMs

QTAG part-of-speech tagger

An HMM-based Java POS tagger from Birmingham U.


Named Entity Recognition

Stanford Named Entity Recognizer

A Java implementation of a Conditional Random Field sequence model, together with well-engineered features for Named Entity Recognition

LingPipe

Tools include statistical named-entity recognition, a heuristic sentence boundary detector, and a heuristic within-document coreference resolution engine. Java. GPL. By Bob Carpenter, Breck Baldwin and co.

YamCha

SVM-based NP-chunker, also usable for POS tagging, NER, etc. C/C++ open source. Won CoNLL 2000 shared task. (Less automatic than a specialized POS tagger for an end user.)


Stemming

Porter Stemming

A process for removing the commoner morphological and inflexional endings from words in English, by Martin Porter
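To give a feel for what "removing endings" means in practice, here is a toy suffix stripper in Python. It is not the Porter algorithm, which applies several ordered rule phases with conditions on the stem; the rule list and the `min_stem` length check below are assumptions made for illustration only.

```python
# Toy rule-based suffix stripper. This is NOT the Porter algorithm:
# the rules and the minimum stem length are illustrative assumptions.
SUFFIX_RULES = [
    ("sses", "ss"),  # caresses -> caress
    ("ies", "i"),    # ponies   -> poni
    ("ing", ""),     # walking  -> walk
    ("ed", ""),      # jumped   -> jump
    ("ss", "ss"),    # class    -> class (protects the final "ss")
    ("s", ""),       # cats     -> cat
]


def strip_suffix(word: str, min_stem: int = 3) -> str:
    """Apply the first matching rule that leaves a long enough stem."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)] + replacement
            if len(stem) >= min_stem:
                return stem
    return word


if __name__ == "__main__":
    for w in ["caresses", "ponies", "walking", "jumped", "class", "cats", "sing"]:
        print(w, "->", strip_suffix(w))
```

For real use, a maintained implementation such as the ones generated from Snowball is a better choice than hand-written rules like these.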

Snowball

A small string processing language designed for creating stemming algorithms for use in Information Retrieval.


Syntactic Parsing

Stanford Parser

Java implementations of probabilistic natural language parsers, both highly optimized PCFG and dependency parsers, and a lexicalized PCFG parser.

Berkeley Parser

Text Mining


Summarization

Rouge (see also: configuring Rouge on Windows)

Other


Cryptography

OpenSSL

Includes many cryptographic algorithms, such as RSA, DES, MD5 and SHA. A Win32 installer is also available.


Compression

zlib

A Massively Spiffy Yet Delicately Unobtrusive Compression Library
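A quick way to try zlib without writing C is through Python's standard `zlib` module, which wraps the library. The sketch below is a minimal round trip; the helper names `compress_text` and `decompress_text` and the sample string are made up for this example.

```python
import zlib


def compress_text(text: str, level: int = 6) -> bytes:
    """Encode text as UTF-8 and compress it (level 0-9, higher = smaller/slower)."""
    return zlib.compress(text.encode("utf-8"), level)


def decompress_text(blob: bytes) -> str:
    """Inverse of compress_text."""
    return zlib.decompress(blob).decode("utf-8")


if __name__ == "__main__":
    sample = "machine learning tools " * 100  # repetitive, so it compresses well
    packed = compress_text(sample)
    print(len(sample.encode("utf-8")), "bytes ->", len(packed), "bytes")
    assert decompress_text(packed) == sample
```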


Logging

Apache Logging Services

Creates and maintains open-source software related to the logging of application behavior and released at no charge to the public, including

log4j for Java,
log4cxx for C++, and
log4net for the Microsoft .NET Framework.

Note: the official log4cxx release has memory leak problems.


Unicode

ICU

A mature, widely used set of C/C++ and Java libraries providing Unicode and Globalization support for software applications


XML

Xerces

A validating XML parser, available in C++ and Java editions


Multi-Pattern String Matching

AC in C#: Aho-Corasick string matching in C#
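For readers who want to see the idea behind Aho-Corasick before picking a library, here is a compact Python sketch. It is an independent illustration, not a port of the C# code linked above; the function names `build_automaton` and `find_all` are made up for this example. The automaton is a trie of the patterns plus failure links, so the text is scanned once no matter how many patterns there are.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links and output lists."""
    goto = [{}]   # goto[node][char] -> child node
    fail = [0]    # fail[node] -> longest proper suffix that is also a trie prefix
    out = [[]]    # out[node] -> patterns ending at this node
    for pat in patterns:
        if not pat:
            continue  # empty patterns are skipped in this sketch
        node = 0
        for ch in pat:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(pat)

    # Breadth-first pass: depth-1 nodes keep fail = 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            queue.append(child)
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure target.
            out[child] = out[child] + out[fail[child]]
    return goto, fail, out


def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, out = build_automaton(patterns)
    matches = []
    node = 0
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for pat in out[node]:
            matches.append((i - len(pat) + 1, pat))
    return matches


if __name__ == "__main__":
    print(find_all("ushers", ["he", "she", "his", "hers"]))
    # -> [(1, 'she'), (2, 'he'), (2, 'hers')]
```

Overlapping matches are all reported, which is usually what a keyword filter or dictionary lookup needs.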


HTML Parser

Html Agility Pack, an agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT. It is a .NET code library that allows you to parse "out of the web" HTML files.
Majestic-12, an open source high-performance .NET C# module that was created to parse HTML for links, indexing and other purposes. It is fast, but it does not build a DOM tree.

External Links

An annotated list of resources by the Stanford NLP Group
KDnuggets, which lists some KDD-related software and other resources