A Python Implementation of the ID3 Decision Tree
2017-07-05 16:11
The ID3 algorithm:

ID3 computes the information gain of every attribute and treats attributes with high information gain as good ones. At each step it selects the attribute with the highest information gain as the splitting criterion, and repeats this process until it produces a decision tree that classifies the training examples perfectly.

A decision tree classifies data in order to make predictions. The method first builds a tree from the training set; if that tree cannot classify every object correctly, some of the exceptions are added to the training data and the process is repeated until a correct decision set is formed. The decision tree is the tree-shaped representation of this decision set.
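As a quick numerical illustration of the entropy/information-gain criterion described above, here is a minimal sketch (the toy class counts are assumptions for illustration, not data from this post):

```python
from math import log

def entropy(labels):
    """Shannon entropy of a list of class labels: -sum(p * log2(p))."""
    n = len(labels)
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return -sum((c / n) * log(c / n, 2) for c in counts.values())

# Hypothetical parent node: 9 positive and 5 negative examples
base = entropy(['yes'] * 9 + ['no'] * 5)           # about 0.940 bits
# A binary attribute splits them into two subsets
left = entropy(['yes'] * 6 + ['no'] * 2)           # 8 samples
right = entropy(['yes'] * 3 + ['no'] * 3)          # 6 samples
# Information gain = parent entropy minus the weighted child entropies
gain = base - (8 / 14) * left - (6 / 14) * right   # about 0.048 bits
```

ID3 evaluates this gain for every remaining attribute and splits on the largest one.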
ID3.py
from math import log


# Compute the Shannon entropy of a dataset (class label in the last column)
def calcShannonEnt(dataset):
    numEntries = len(dataset)
    # Count the number of samples in each class
    labelCounts = {}
    for featVec in dataset:
        currentLabel = featVec[-1]
        labelCounts[currentLabel] = labelCounts.get(currentLabel, 0) + 1
    # Accumulate -p * log2(p) over all classes
    shannonEnt = 0.0
    for key in labelCounts:
        prob = float(labelCounts[key]) / numEntries
        if prob != 0:
            shannonEnt -= prob * log(prob, 2)
    return shannonEnt


# Split the dataset on the chosen feature, keeping rows where it equals value
# (the feature column itself is removed from the returned rows)
def splitDataSet(dataset, feat, value):
    retDataSet = []
    for featVec in dataset:
        if featVec[feat] == value:
            reducedFeatVec = featVec[:feat]
            reducedFeatVec.extend(featVec[feat + 1:])
            retDataSet.append(reducedFeatVec)
    return retDataSet


# Find the best feature to split on
def findBestSplit(dataset):
    numFeatures = len(dataset[0]) - 1
    baseEntropy = calcShannonEnt(dataset)
    bestInfoGain = 0.0
    bestFeat = -1
    # Compute the information gain of each candidate split and keep the maximum
    for i in range(numFeatures):
        featValues = [example[i] for example in dataset]
        uniqueFeatValues = set(featValues)
        newEntropy = 0.0
        for val in uniqueFeatValues:
            subDataSet = splitDataSet(dataset, i, val)
            prob = float(len(subDataSet)) / len(dataset)
            newEntropy += prob * calcShannonEnt(subDataSet)
        if (baseEntropy - newEntropy) > bestInfoGain:
            bestInfoGain = baseEntropy - newEntropy
            bestFeat = i
    return bestFeat


# Return the majority class label
def classify(classList):
    classCount = {}
    for vote in classList:
        if vote not in classCount:
            classCount[vote] = 0
        classCount[vote] += 1
    maxCount = 0
    maxIndex = None
    for key, value in classCount.items():
        if value > maxCount:
            maxCount = value
            maxIndex = key
    return maxIndex


# Build the decision tree with ID3
def treeGrowth(dataSet, features):
    classList = [example[-1] for example in dataSet]
    # Stop when all samples share one class
    if classList.count(classList[0]) == len(classList):
        return classList[0]
    # Stop when no features are left to split on
    if len(dataSet[0]) == 1:
        return classify(classList)
    # Recursively grow the tree
    bestFeat = findBestSplit(dataSet)
    bestFeatLabel = features[bestFeat]
    myTree = {bestFeatLabel: {}}
    features.pop(bestFeat)  # note: mutates the caller's feature list
    featValues = [example[bestFeat] for example in dataSet]
    uniqueFeatValues = set(featValues)
    for value in uniqueFeatValues:
        subDataSet = splitDataSet(dataSet, bestFeat, value)
        myTree[bestFeatLabel][value] = treeGrowth(subDataSet, features)
    return myTree


# Classify a new sample with the constructed tree
def predict(tree, newObject):
    while isinstance(tree, dict):
        key = list(tree.keys())[0]
        tree = tree[key][newObject[key]]
    return tree


def main():
    # Create the sample dataset
    def createDataSet():
        dataSet = [[1, 1, 'yes'],
                   [1, 1, 'yes'],
                   [1, 0, 'no'],
                   [0, 1, 'no'],
                   [0, 1, 'no']]
        features = ['no surfacing', 'flippers']
        return dataSet, features

    dataset, features = createDataSet()
    tree = treeGrowth(dataset, features)
    print(tree)
    print(predict(tree, {'no surfacing': 1, 'flippers': 1}))
    print(predict(tree, {'no surfacing': 1, 'flippers': 0}))
    print(predict(tree, {'no surfacing': 0, 'flippers': 1}))
    print(predict(tree, {'no surfacing': 0, 'flippers': 0}))


if __name__ == '__main__':
    exit(main())
Output:

{'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}
yes
no
no
no
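As a standalone sanity check (a sketch independent of the code above), computing the information gain of each attribute on the sample dataset confirms why 'no surfacing' is chosen at the root:

```python
from math import log

data = [[1, 1, 'yes'], [1, 1, 'yes'], [1, 0, 'no'], [0, 1, 'no'], [0, 1, 'no']]

def entropy(rows):
    """Shannon entropy of the class labels (last column)."""
    n = len(rows)
    counts = {}
    for row in rows:
        counts[row[-1]] = counts.get(row[-1], 0) + 1
    return -sum((c / n) * log(c / n, 2) for c in counts.values())

def gain(rows, feat):
    """Information gain of splitting on feature column `feat`."""
    g = entropy(rows)
    for val in set(row[feat] for row in rows):
        subset = [row for row in rows if row[feat] == val]
        g -= len(subset) / len(rows) * entropy(subset)
    return g

print(gain(data, 0))  # 'no surfacing': about 0.420
print(gain(data, 1))  # 'flippers':     about 0.171
```

Since 0.420 > 0.171, ID3 splits on 'no surfacing' first; 'flippers' is only used inside the 'no surfacing' = 1 branch.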