
Machine Learning: Implementing a Decision Tree

2017-07-25 14:18
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Build, train and test a decision tree.
# Written for Python 3 and a recent scikit-learn.
import csv

from sklearn import preprocessing
from sklearn import tree
from sklearn.feature_extraction import DictVectorizer

# Read the data
featureList = []
labelList = []
with open(r'jueceshu.csv', newline='') as allElectronicsData:
    reader = csv.reader(allElectronicsData)
    headers = next(reader)
    # Parse the data: column 0 is the row ID, the last column is the class label
    for row in reader:
        if row:
            labelList.append(row[-1])
            rowDict = {}
            for i in range(1, len(row) - 1):
                rowDict[headers[i]] = row[i]
            featureList.append(rowDict)
print(featureList)

# Convert the data
vec = DictVectorizer()
dummyX = vec.fit_transform(featureList).toarray()
# Older scikit-learn versions use vec.get_feature_names() instead
featureNames = list(vec.get_feature_names_out())
print('dummyX:' + str(dummyX))
print(featureNames)
print('labelList:' + str(labelList))
lb = preprocessing.LabelBinarizer()
dummyY = lb.fit_transform(labelList)
print('dummyY:' + str(dummyY))

# Train the classifier
clf = tree.DecisionTreeClassifier(criterion='entropy')
clf = clf.fit(dummyX, dummyY)
print('clf' + str(clf))

# Export the tree in dot format
with open('allElectronicInformationGainDri.dot', 'w') as f:
    tree.export_graphviz(clf, feature_names=featureNames, out_file=f)

# Predict with the decision tree
oneRowX = dummyX[0, :]
print('oneRowX:' + str(oneRowX))
newRowX = oneRowX.copy()  # copy, so that dummyX itself is not modified
newRowX[0] = 1
newRowX[2] = 0
print('newRowX:' + str(newRowX))
predictedY = clf.predict(newRowX.reshape(1, -1))  # predict expects a 2-D array
print('predictedY:' + str(predictedY))
[Figure: the rendered decision tree. Node labels recovered from the image, in extraction order; edge labels: True, False]

student=no <= 0.5, entropy = 0.9403, samples = 14, value = [5, 9]
entropy = 0.0, samples = 5, value = [0, 5]
age=senior <= 0.5, entropy = 0.5436, samples = 8, value = [1, 7]
age=youth <= 0.5, entropy = 0.9183, samples = 6, value = [4, 2]
credit_rating=excellent <= 0.5, entropy = 0.9183, samples = 3, value = [1, 2]
credit_rating=excellent <= 0.5, entropy = 0.9183, samples = 3, value = [1, 2]
entropy = 0.0, samples = 2, value = [0, 2]
entropy = 0.0, samples = 1, value = [1, 0]
entropy = 0.0, samples = 2, value = [0, 2]
entropy = 0.0, samples = 3, value = [3, 0]
entropy = 0.0, samples = 1, value = [1, 0]

1. Python

2. Python machine learning library: scikit-learn
     2.1 Features:
          Simple and efficient tools for data mining and machine learning analysis
          Open to all users and highly reusable for different needs
          Built on NumPy, SciPy and matplotlib
          Open source and commercially usable: BSD license
     2.2 Problem areas covered:
          classification, regression, clustering, dimensionality reduction
          model selection, preprocessing

3. Using scikit-learn
     Installing scikit-learn: pip, easy_install, Windows installer
     Installing the required packages: numpy, SciPy and matplotlib; Anaconda can be used instead (it bundles numpy, scipy and other common scientific computing packages)
     Things to watch when installing: Python interpreter version (2.7 or 3.4?), 32-bit or 64-bit system

4. Example:
     Documentation: http://scikit-learn.org/stable/modules/tree.html
     Walk through the Python code
     Install Graphviz: http://www.graphviz.org/
     Configure the environment variables
     Convert the dot file to PDF to visualize the decision tree: dot -Tpdf iris.dot -o output.pdf
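The entropy values shown in the tree nodes can be checked by hand from each node's class counts (the "value" field). The following is a minimal sketch in plain Python; the function name entropy is chosen here for illustration and is not part of scikit-learn.

```python
import math


def entropy(class_counts):
    """Shannon entropy (in bits) of a node, given its per-class sample counts."""
    total = float(sum(class_counts))
    result = 0.0
    for count in class_counts:
        if count > 0:  # a class with no samples contributes nothing
            p = count / total
            result -= p * math.log(p, 2)
    return result


# Class counts taken from the "value" fields of the tree nodes
print(round(entropy([5, 9]), 4))  # root node
print(round(entropy([1, 7]), 4))  # age=senior node
print(round(entropy([4, 2]), 4))  # age=youth node
print(entropy([0, 5]))            # a pure leaf
```

Rounded to four decimals, these reproduce the 0.9403, 0.5436 and 0.9183 reported in the tree, and a pure leaf gives 0.0.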

Dataset:

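RiD,age,income,student,credit_rating,Class_buys_computer
1,youth,high,no,fair,no
2,youth,high,no,excellent,no
3,middle_aged,high,no,fair,yes
4,senior,medium,no,fair,yes
5,senior,low,yes,fair,yes
6,senior,low,yes,excellent,no
7,middle_aged,low,yes,excellent,yes
8,youth,medium,no,fair,no
9,youth,low,yes,fair,yes
10,senior,medium,yes,fair,yes
11,youth,medium,yes,excellent,yes
12,middle_aged,medium,yes,excellent,yes
13,middle_aged,high,yes,fair,yes
14,senior,medium,no,excellent,no

Results, explained:

Each dictionary in featureList represents one row of the data, that is, one sample.
dummyX is the feature data after format conversion (categories turned into 0/1 columns).
labelList is the list of class labels.
dummyY is the class labels in binarized form.
clf (DecisionTreeClassifier) is the decision tree classifier.
oneRowX is one sample taken from the data.
newRowX is a new sample built from it.
predictedY is the predicted result.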
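The format conversion behind dummyX is a one-hot encoding: every "feature=value" pair becomes its own 0/1 column, which is why the tree nodes test conditions such as student=no <= 0.5. The sketch below illustrates the idea in plain Python. It is not scikit-learn's implementation; one_hot_encode is a hypothetical helper, and the two sample rows are abbreviated to two features for brevity.

```python
def one_hot_encode(rows):
    """Encode a list of {feature: category} dicts as 0/1 vectors."""
    # Collect every "feature=value" pair seen in the data, in sorted order
    names = sorted({key + "=" + value for row in rows for key, value in row.items()})
    index = {name: i for i, name in enumerate(names)}
    vectors = []
    for row in rows:
        vec = [0] * len(names)
        for key, value in row.items():
            vec[index[key + "=" + value]] = 1  # mark the category this row has
        vectors.append(vec)
    return names, vectors


# Two abbreviated rows (only age and student), for illustration
rows = [
    {"age": "youth", "student": "no"},
    {"age": "senior", "student": "yes"},
]
names, vectors = one_hot_encode(rows)
print(names)
print(vectors)
```

Each output vector has exactly one 1 per original feature, so setting one column to 1 and another to 0, as the prediction step does with newRowX, amounts to changing a sample's category for that feature.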