Common Machine Learning Tools
2014-03-04 16:54
Support Vector Machine
SVMlight
An implementation of Vapnik's Support Vector Machine
LIBSVM
A Library for Support Vector Machines
Decision Tree
C4.5
The "classic" decision-tree tool, developed by J. R. Quinlan (Tutorial)
Maximum Entropy
YASMET
Yet Another Small MaxEnt Toolkit
Conditional Random Field
CRF++
A simple, customizable, and open source implementation of Conditional Random Fields (CRFs) for segmenting/labeling sequential data
Natural Language Processing
General
OpenNLP
An organizational center for open source projects related to natural language processing
CMU Statistical Language Modeling Toolkit
A suite of UNIX software tools to facilitate the construction and testing of statistical language models
The Dragon ToolKit
A Java-based development package for academic use in information retrieval (IR) and text mining. Includes many NLP tools.
LingPipe
A suite of Java libraries for the linguistic analysis of human language, including tools to:
track mentions of entities (e.g. people or proteins);
link entity mentions to database entries;
uncover relations between entities and actions;
classify text passages by language, character encoding, genre, topic, or sentiment;
correct spelling with respect to a text collection;
cluster documents by implicit topic and discover significant trends over time; and
provide part-of-speech tagging and phrase chunking.
Natural Language Toolkit
Open source Python modules, linguistic data and documentation for research and development in natural language processing and text analytics, with distributions for Windows, Mac OS X and Linux.
Antelope
Advanced Natural Language Object-oriented Processing Environment. Includes a set of tools (notably a C# version of the Stanford Parser).
Word Segmentation
ICTCLAS
The Chinese word segmentation system from the Chinese Academy of Sciences
Stanford Chinese Word Segmenter
A Java implementation of a CRF-based Chinese Word Segmenter
Part-of-Speech Tagging
Brill tagger
An error-driven transformation-based tagger implemented by Eric Brill
Stanford POS Tagger
A Java implementation of the log-linear part-of-speech taggers described by Kristina Toutanova et al.
MBT: Memory-based Tagger
TreeTagger
A decision tree based tagger from the University of Stuttgart.
SVMTool, a POS tagger based on SVMs
QTAG part-of-speech tagger
An HMM-based Java POS tagger from Birmingham U.
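To give a feel for how an HMM-based tagger picks tags, here is a minimal Viterbi decoding sketch in Python. It is not code from QTAG or any tagger listed above; the tag set and all probabilities are toy values invented for illustration, and the function assumes a non-empty sentence.

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for a non-empty list of words."""
    # best[i][t] = (probability of the best path ending in tag t at word i,
    #               previous tag on that path)
    best = [{t: (start_p[t] * emit_p[t].get(words[0], 0.0), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            column[t] = max(
                (best[i - 1][p][0] * trans_p[p][t] * emit_p[t].get(words[i], 0.0), p)
                for p in tags
            )
        best.append(column)
    # Backtrack from the most probable final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


# Toy model (hypothetical numbers, chosen only to make the example run).
tags = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
emit_p = {
    "DET": {"the": 0.9, "a": 0.1},
    "NOUN": {"dog": 0.5, "barks": 0.1, "walk": 0.4},
    "VERB": {"barks": 0.6, "walk": 0.4},
}

tagged = viterbi(["the", "dog", "barks"], tags, start_p, trans_p, emit_p)
print(tagged)
```

A real tagger would estimate these probabilities from an annotated corpus and work in log space to avoid underflow on long sentences.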
Named Entity Recognition
Stanford Named Entity Recognizer
A Java implementation of a Conditional Random Field sequence model, together with well-engineered features for Named Entity Recognition
LingPipe
Tools include statistical named-entity recognition, a heuristic sentence boundary detector, and a heuristic within-document coreference resolution engine. Java. GPL. By Bob Carpenter, Breck Baldwin and co.
YamCha
SVM-based NP-chunker, also usable for POS tagging, NER, etc. C/C++ open source. Won CoNLL 2000 shared task. (Less automatic than a specialized POS tagger for an end user.)
Stemming
Porter Stemming
A process for removing the commoner morphological and inflexional endings from words in English, by Martin Porter
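As a small illustration of what "removing endings" means, the sketch below implements only step 1a of the Porter algorithm (the plural-suffix rules SSES -> SS, IES -> I, SS -> SS, S -> nothing). It is a partial sketch, not the full stemmer, and the function name `porter_step_1a` is made up for this example.

```python
def porter_step_1a(word: str) -> str:
    """Apply only step 1a of the Porter stemmer to a lowercase word."""
    if word.endswith("sses"):
        return word[:-2]  # SSES -> SS
    if word.endswith("ies"):
        return word[:-2]  # IES -> I
    if word.endswith("ss"):
        return word       # SS -> SS (unchanged)
    if word.endswith("s"):
        return word[:-1]  # S -> (removed)
    return word


for w in ["caresses", "ponies", "caress", "cats"]:
    print(w, "->", porter_step_1a(w))
```

The full algorithm applies several further steps after this one, each conditioned on the structure of the remaining stem.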
Snowball
A small string processing language designed for creating stemming algorithms for use in Information Retrieval.
Syntactic Parsing
Stanford Parser
Java implementations of probabilistic natural language parsers, both highly optimized PCFG and dependency parsers, and a lexicalized PCFG parser.
Berkeley Parser
Text Mining
Summarization
Rouge
Configuring Rouge on Windows
Other
Encryption
OpenSSL
Includes many cryptographic algorithms, such as RSA, DES, MD5, and SHA. Win32 installer available.
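For a quick look at two of the digest algorithms mentioned here, the following sketch uses Python's standard `hashlib` module as a stand-in (in many Python builds it is backed by OpenSSL, but that is an assumption about the build, and this is not the OpenSSL C API). The variable names are illustrative.

```python
import hashlib

data = b"abc"

# Hex digests of the same input under two different hash algorithms.
md5_hex = hashlib.md5(data).hexdigest()
sha256_hex = hashlib.sha256(data).hexdigest()

print("MD5    :", md5_hex)
print("SHA-256:", sha256_hex)
```

MD5 produces a 128-bit digest (32 hex characters) and SHA-256 a 256-bit digest (64 hex characters).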
Compression
zlib
A Massively Spiffy Yet Delicately Unobtrusive Compression Library
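A round trip through zlib can be tried from Python, whose standard library ships a `zlib` binding. This is a minimal sketch; the sample text and variable names are made up for illustration.

```python
import zlib

# Repetitive input compresses well, which makes the size difference visible.
text = ("machine learning tools " * 50).encode("utf-8")

compressed = zlib.compress(text, 9)    # 9 = highest compression level
restored = zlib.decompress(compressed)

print(len(text), "bytes ->", len(compressed), "bytes")
print("round trip ok:", restored == text)
```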
Logging
Apache Logging Services
Creates and maintains open-source software related to the logging of application behavior and released at no charge to the public, including
log4j for Java,
log4cxx for C++, and
log4net for the Microsoft .NET Framework.
Note: the official release of log4cxx has a memory leak problem.
Unicode
ICU
A mature, widely used set of C/C++ and Java libraries providing Unicode and Globalization support for software applications
XML
Xerces
A validating XML parser, available in C++ and Java editions
Multi-String Matching
AC in C#: Aho-Corasick string matching in C#
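The idea behind Aho-Corasick multi-string matching is to build a trie of all patterns, add failure links between trie nodes, and then scan the text once. The following is a minimal Python sketch of that idea, not the C# library linked above; the function names `build_automaton` and `find_all` are illustrative.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links and output lists."""
    goto, fail, out = [{}], [0], [[]]
    for pat in patterns:
        node = 0
        for ch in pat:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(pat)
    # Breadth-first traversal so shallower nodes are finished first.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            queue.append(child)
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit patterns that end at the failure target.
            out[child] = out[child] + out[fail[child]]
    return goto, fail, out


def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, out = build_automaton(patterns)
    node, matches = 0, []
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for pat in out[node]:
            matches.append((i - len(pat) + 1, pat))
    return matches


print(find_all("ushers", ["he", "she", "his", "hers"]))
```

All patterns are matched in a single pass over the text, which is the main advantage over searching for each pattern separately.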
HTML Parser
Html Agility Pack, an agile HTML parser that builds a read/write DOM and supports plain XPATH or XSLT. It is a .NET code library that allows you to parse "out of the web" HTML files.
Majestic-12, an open source high-performance .NET C# module that was created to parse HTML for links, indexing and other purposes. Fast, but does not build a DOM tree.
External Links
An annotated list of resources by the Stanford NLP Group
KDnuggets: lists software and other resources related to KDD