
Six of the Best Open Source Data Mining Tools [Translation]

2014-10-11 09:57


It is rightly said that data is money in today’s world.


Along with the transition to an app-based world comes the exponential growth of data. However, most of that data is unstructured, so it takes a process and a method to extract useful information from it and transform it into an understandable and usable form. This is where data mining comes into the picture. Plenty of tools are available for data mining tasks, using artificial intelligence, machine learning and other techniques to extract information.


Here are six powerful open source data mining tools available:


 

RapidMiner (formerly known as YALE)

Written in the Java programming language, this tool offers advanced analytics through template-based frameworks. A bonus: users hardly have to write any code. Offered as a service rather than as a piece of local software, this tool holds the top position on the list of data mining tools.

In addition to data mining, RapidMiner also provides functionality such as data preprocessing and visualization, predictive analytics and statistical modeling, evaluation, and deployment. What makes it even more powerful is that it provides learning schemes, models and algorithms from WEKA and R scripts.

RapidMiner is distributed under the AGPL open source licence and can be downloaded from SourceForge, where it is rated the number one business analytics software.


 

WEKA

The original non-Java version of WEKA was developed primarily for analyzing data from the agricultural domain. The Java-based version is very sophisticated and is used in many different applications, including visualization and algorithms for data analysis and predictive modeling. It’s free under the GNU General Public License, which is a big plus compared to RapidMiner, because users can customize it however they please.

WEKA supports several standard data mining tasks, including data preprocessing, clustering, classification, regression, visualization and feature selection.
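To make one of these tasks concrete, here is a minimal sketch of classification using a k-nearest-neighbour vote. It is plain Python, not WEKA's API: the function `knn_predict` and the toy data in `TRAIN` are made up for illustration only.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort the labelled points by Euclidean distance to the query, keep the k closest.
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    # Count the class labels among those neighbours and return the most common one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy, made-up data: (feature vector, class label).
TRAIN = [
    ((1.0, 1.1), "small"),
    ((1.2, 0.9), "small"),
    ((0.8, 1.0), "small"),
    ((5.0, 5.2), "large"),
    ((5.1, 4.9), "large"),
    ((4.8, 5.0), "large"),
]

if __name__ == "__main__":
    print(knn_predict(TRAIN, (1.1, 1.0)))
    print(knn_predict(TRAIN, (4.9, 5.1)))
```

A tool like WEKA wraps this kind of algorithm behind a GUI and adds preprocessing and evaluation around it, so the user does not have to write the loop by hand.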

WEKA would be more powerful with the addition of sequence modeling, which currently is not included.


 

R-Programming

What if I tell you that Project R, a GNU project, is written in R itself? It’s primarily written in C and Fortran, and a lot of its modules are written in R itself. It’s a free software programming language and software environment for statistical computing and graphics. The R language is widely used among data miners for developing statistical software and for data analysis. Ease of use and extensibility have raised R’s popularity substantially in recent years.

Besides data mining, it provides statistical and graphical techniques, including linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, and others.
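As a rough illustration of the simplest of these techniques, linear modeling, the sketch below fits a straight line by ordinary least squares. It is written in Python rather than R (where `lm(y ~ x)` would do the same job in one call), and the helper name `fit_line` and the sample numbers are assumptions for the example.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations of x and y around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sum of squared deviations of x around its mean.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Made-up points that lie exactly on y = 2x + 1.
    slope, intercept = fit_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
    print(slope, intercept)
```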


 

Orange

Python is picking up in popularity because it’s simple and easy to learn yet powerful. Hence, if you are a Python developer looking for a tool for your work, look no further than Orange, a powerful, open source, Python-based tool for both novices and experts.

You will fall in love with this tool’s visual programming and Python scripting. It also has components for machine learning and add-ons for bioinformatics and text mining. It’s packed with features for data analytics.


 

KNIME

Data preprocessing has three main components: extraction, transformation and loading. KNIME does all three. It gives you a graphical user interface for assembling nodes for data processing. It is an open source data analytics, reporting and integration platform. KNIME also integrates various components for machine learning and data mining through its modular data pipelining concept, and it has caught the eye of business intelligence and financial data analysis.
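The three steps named above (extraction, transformation, loading) can be sketched as a small pipeline in plain Python. This is not KNIME's API; KNIME expresses the same idea as connected nodes in its GUI. The sample data, the `payments` table and the function names are all hypothetical.

```python
import csv
import io
import sqlite3

# Made-up raw input: stray whitespace and one unparseable amount.
RAW_CSV = """name,amount
alice, 10.5
bob,not-a-number
carol,7
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace, convert amounts, drop unparseable rows."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"].strip())
        except ValueError:
            continue  # skip rows whose amount is not numeric
        clean.append((row["name"].strip().title(), amount))
    return clean


def load(records, conn):
    """Load: write the cleaned records into a SQLite table."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", records)
    conn.commit()


def run_pipeline(text=RAW_CSV):
    """Chain the three stages against an in-memory database and read the result back."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(text)), conn)
    return conn.execute("SELECT name, amount FROM payments ORDER BY name").fetchall()


if __name__ == "__main__":
    print(run_pipeline())
```

Keeping each stage as its own function mirrors the node-based design: any stage can be swapped or tested without touching the others.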

Written in Java and based on Eclipse, KNIME is easy to extend and to add plugins to. Additional functionality can be added on the go. Plenty of data integration modules are already included in the core version.


 

NLTK

When it comes to language processing tasks, nothing can beat NLTK. NLTK provides a pool of language processing tools, including data mining, machine learning, data scraping, sentiment analysis and various other language processing tasks. All you need to do is install NLTK, pull a package for your favorite task and you are ready to go. Because it’s written in Python, you can build applications on top of it, customizing it for small tasks.
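To give a feel for what a sentiment analysis task involves, here is a deliberately tiny lexicon-based scorer. It uses only the Python standard library and does not call NLTK; NLTK ships its own tokenizers and sentiment tools, which are far more capable. The word lists and the function names `tokenize` and `sentiment` are assumptions for this sketch.

```python
import re

# Tiny, made-up sentiment lexicons; real tools use much larger, weighted ones.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "sad"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment(text):
    """Label text by counting positive words minus negative words."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(sentiment("I love this great tool"))
    print(sentiment("Terrible docs and poor support"))
```

A word-counting approach like this ignores negation and context ("not good" scores as positive), which is one reason to reach for a dedicated library instead.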


Chandan Goopta is a data researcher at Kathmandu University, focusing on building intelligent algorithms for sentiment analysis.
