Machine Learning: Implementing a Decision Tree
2017-07-25 14:18
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Build, train and test a decision tree.
# Written for Python 3; get_feature_names_out() requires scikit-learn >= 1.0
# (older releases use get_feature_names() instead).
import csv

from sklearn import preprocessing
from sklearn import tree
from sklearn.feature_extraction import DictVectorizer

# Read the data
featureList = []
labelList = []
with open(r'jueceshu.csv', newline='') as allElectronicsData:
    reader = csv.reader(allElectronicsData)
    headers = next(reader)
    # Parse the data: the last column is the class label,
    # column 0 (RID) is skipped, the columns in between are the features
    for row in reader:
        if row:
            labelList.append(row[len(row) - 1])
            rowDict = {}
            for i in range(1, len(row) - 1):
                rowDict[headers[i]] = row[i]
            featureList.append(rowDict)
print(featureList)

# Convert the data: one-hot encode the categorical features
vec = DictVectorizer()
dummyX = vec.fit_transform(featureList).toarray()
featureNames = list(vec.get_feature_names_out())
print('dummyX:' + str(dummyX))
print(featureNames)
print('labelList:' + str(labelList))

# Binarize the class labels (no -> 0, yes -> 1)
lb = preprocessing.LabelBinarizer()
dummyY = lb.fit_transform(labelList)
print('dummyY:' + str(dummyY))

# Train the classifier, splitting on entropy
clf = tree.DecisionTreeClassifier(criterion='entropy')
clf = clf.fit(dummyX, dummyY)
print('clf' + str(clf))

# Export the tree in dot format
with open('allElectronicInformationGainDri.dot', 'w') as f:
    tree.export_graphviz(clf, feature_names=featureNames, out_file=f)

# Predict with the decision tree
oneRowX = dummyX[0, :]
print('oneRowX:' + str(oneRowX))
# Copy the row so that editing it does not also change dummyX
newRowX = oneRowX.copy()
newRowX[0] = 1
newRowX[2] = 0
print('newRowX:' + str(newRowX))
# predict() expects a 2-D array: one row per sample
predictedY = clf.predict(newRowX.reshape(1, -1))
print('predictedY:' + str(predictedY))

[Figure: the trained decision tree rendered with Graphviz. Root node: student=no <= 0.5, entropy = 0.9403, samples = 14, value = [5, 9]. Inner nodes split on age=senior, age=youth and credit_rating=excellent; every leaf has entropy = 0.0.]

1. Python

2. Python machine-learning library: scikit-learn
2.1 Features:
- Simple and efficient tools for data mining and machine-learning analysis
- Accessible to all users and highly reusable across different needs
- Built on NumPy, SciPy and matplotlib
- Open source and commercially usable: BSD license
2.2 Problem areas covered:
- Classification, regression, clustering, dimensionality reduction
- Model selection, preprocessing

3. Using scikit-learn
- Install scikit-learn: pip, easy_install, or the Windows installer
- Install the required packages: NumPy, SciPy and matplotlib. Anaconda can be used instead (it bundles NumPy, SciPy and other common scientific-computing packages)
- Things to watch when installing: the Python interpreter version (2.7 or 3.4?) and whether the system is 32-bit or 64-bit

4. Example
- Documentation: http://scikit-learn.org/stable/modules/tree.html
- Walk through the Python code
- Install Graphviz: http://www.graphviz.org/
- Configure the environment variables
- Convert the dot file to PDF to visualize the decision tree: dot -Tpdf iris.dot -o output.pdf (replace iris.dot with the .dot file written by the script, here allElectronicInformationGainDri.dot)
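The script trains with criterion='entropy', and every node of the rendered tree reports an entropy value computed from its class counts (the value = [...] field). As a rough illustration of where those numbers come from, here is a minimal pure-Python sketch; the function name entropy is my own and this is not scikit-learn's internal code. It assumes Python 3.

```python
from math import log2


def entropy(class_counts):
    """Shannon entropy (in bits) of a node, given its per-class sample counts."""
    total = sum(class_counts)
    result = 0.0
    for count in class_counts:
        if count:  # a class with zero samples contributes nothing
            p = count / total
            result -= p * log2(p)
    return result


# Root node of the tree: 5 samples of class "no", 9 of class "yes"
print(round(entropy([5, 9]), 4))  # 0.9403
# A pure leaf: every sample belongs to one class
print(entropy([0, 5]))            # 0.0
```

The same function reproduces the other values in the figure, for example 0.9183 for value = [1, 2] and 0.5436 for value = [1, 7].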
Dataset:
RID,age,income,student,credit_rating,Class_buys_computer
1,youth,high,no,fair,no
2,youth,high,no,excellent,no
3,middle_aged,high,no,fair,yes
4,senior,medium,no,fair,yes
5,senior,low,yes,fair,yes
6,senior,low,yes,excellent,no
7,middle_aged,low,yes,excellent,yes
8,youth,medium,no,fair,no
9,youth,low,yes,fair,yes
10,senior,medium,yes,fair,yes
11,youth,medium,yes,excellent,yes
12,middle_aged,medium,yes,excellent,yes
13,middle_aged,high,yes,fair,yes
14,senior,medium,no,excellent,no

Result:

Explanation of the result:
- Each dictionary represents one row of the data, i.e. one sample
- dummyX is the feature data converted to numeric (one-hot) form
- labelList holds the class labels
- dummyY is the class labels converted to numeric form
- clf (DecisionTreeClassifier) is the decision tree classifier
- oneRowX is one sample taken from the data
- newRowX is a new sample
- predictedY is the predicted result
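To make dummyX, dummyY and newRowX more concrete, here is a hand-rolled sketch of the conversion in plain Python 3. It only imitates what DictVectorizer and LabelBinarizer do for this data set and is not their actual implementation; the names HEADERS, ROWS, one_hot, dummy_x and dummy_y are mine. It assumes the columns are ordered by sorted "feature=value" name, which is DictVectorizer's default behaviour as far as I know.

```python
# The 14 rows of the data set without the RID column:
# (age, income, student, credit_rating, Class_buys_computer)
HEADERS = ("age", "income", "student", "credit_rating")
ROWS = [
    ("youth", "high", "no", "fair", "no"),
    ("youth", "high", "no", "excellent", "no"),
    ("middle_aged", "high", "no", "fair", "yes"),
    ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"),
    ("senior", "low", "yes", "excellent", "no"),
    ("middle_aged", "low", "yes", "excellent", "yes"),
    ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"),
    ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"),
    ("middle_aged", "medium", "yes", "excellent", "yes"),
    ("middle_aged", "high", "yes", "fair", "yes"),
    ("senior", "medium", "no", "excellent", "no"),
]

# One dictionary per row (like featureList) and one label per row (like labelList)
feature_dicts = [dict(zip(HEADERS, row[:-1])) for row in ROWS]
labels = [row[-1] for row in ROWS]

# One column per distinct "feature=value" pair, in sorted order
columns = sorted(
    {f"{name}={value}" for d in feature_dicts for name, value in d.items()}
)


def one_hot(feature_dict):
    """Return a 0/1 list with a 1 in every column that this row activates."""
    active = {f"{name}={value}" for name, value in feature_dict.items()}
    return [1 if column in active else 0 for column in columns]


dummy_x = [one_hot(d) for d in feature_dicts]
dummy_y = [1 if label == "yes" else 0 for label in labels]

print(columns)
print(dummy_x[0])  # [0, 0, 1, 0, 1, 1, 0, 0, 1, 0]

# The same edit the script applies to build newRowX
new_row = list(dummy_x[0])
new_row[0] = 1  # switch age=middle_aged on
new_row[2] = 0  # switch age=youth off
print([c for c, bit in zip(columns, new_row) if bit])
# ['age=middle_aged', 'credit_rating=fair', 'income=high', 'student=no']
```

Read this way, newRowX describes a middle-aged, high-income non-student with a fair credit rating, which is the same combination as row 3 of the data set, whose class is yes.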