Implementing the KNN Algorithm in Python
2015-08-28 20:22
Imported packages:
import csv
import random
import math
import operator
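Reading the data:
def loadDataSet(filename, split, trainingSet, testSet):
    # Read the CSV file and randomly assign each row to the training set
    # (with probability `split`) or otherwise to the test set.
    with open(filename, 'r') as csvfile:
        for row in csv.reader(csvfile):
            if not row:
                continue  # skip blank lines, e.g. an empty line at the end of the file
            for y in range(4):
                row[y] = float(row[y])  # the first four columns are numeric features
            if random.random() < split:
                trainingSet.append(row)
            else:
                testSet.append(row)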
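Because the split relies on random.random(), each run produces a different training/test partition. The sketch below shows one way to make the split repeatable with a seeded generator. The function name split_rows, the seed parameter, and the sample rows are illustrative assumptions and do not come from the original post.

```python
import random

def split_rows(rows, split, seed=0):
    """Assign each row to training with probability `split`, else to test."""
    rng = random.Random(seed)  # private generator, so the split is repeatable
    training, test = [], []
    for row in rows:
        if rng.random() < split:
            training.append(row)
        else:
            test.append(row)
    return training, test

# Hypothetical data: 100 rows with one feature and a placeholder label.
rows = [[float(i), 'label'] for i in range(100)]
training, test = split_rows(rows, 0.67)
print(len(training), len(test))
```

Calling split_rows again with the same seed returns the same partition, which makes accuracy figures comparable between runs.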
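Computing the Euclidean distance:
def euclideanDistance(vec1, vec2, length):
    # Sum the squared differences over the first `length` components,
    # then take the square root.
    distance = 0
    for x in range(length):
        distance += pow(vec1[x] - vec2[x], 2)
    return math.sqrt(distance)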
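As a quick check of the distance computation, the following self-contained sketch evaluates it on two made-up rows. The name euclidean and the sample vectors are assumptions chosen for illustration only.

```python
import math

def euclidean(vec1, vec2, length):
    """Euclidean distance over the first `length` components."""
    return math.sqrt(sum((vec1[i] - vec2[i]) ** 2 for i in range(length)))

# Hypothetical rows: three numeric features followed by a class label.
a = [2.0, 2.0, 2.0, 'a']
b = [4.0, 4.0, 4.0, 'b']

# Only the first 3 components are compared, so the labels are ignored.
d = euclidean(a, b, 3)
print(d)  # sqrt(12), roughly 3.464
```

Passing the number of features as length is what keeps the trailing class label out of the arithmetic.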
Selecting the k nearest training instances:
def getNeighbors(trainingSet, testInstance, k):
    distances = []
    length = len(testInstance) - 1  # the last column is the class label
    for x in range(len(trainingSet)):
        dist = euclideanDistance(testInstance, trainingSet[x], length)
        distances.append((trainingSet[x], dist))
    distances.sort(key=operator.itemgetter(1))  # nearest first
    neighbors = []
    for x in range(k):
        neighbors.append(distances[x][0])
    return neighbors
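Sorting every distance works, but only the k smallest are needed. As a possible alternative, the sketch below uses heapq.nsmallest from the standard library to pick them directly. The function name k_nearest and the toy data are assumptions for illustration, not the original author's code.

```python
import heapq
import math

def k_nearest(training_set, test_instance, k):
    """Return the k training rows closest to test_instance."""
    n_features = len(test_instance) - 1  # last column is the class label

    def dist(row):
        return math.sqrt(sum((row[i] - test_instance[i]) ** 2
                             for i in range(n_features)))

    # nsmallest returns the k rows with the smallest distance, nearest first.
    return heapq.nsmallest(k, training_set, key=dist)

# Hypothetical training rows and test row.
train = [[2.0, 2.0, 2.0, 'a'], [4.0, 4.0, 4.0, 'b'], [5.0, 5.0, 5.0, 'b']]
test = [5.0, 5.0, 5.0, 'b']
neighbors = k_nearest(train, test, 1)
print(neighbors)  # [[5.0, 5.0, 5.0, 'b']]
```

For a dataset as small as Iris the difference is negligible; the full sort is just as reasonable there.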
Choosing the class label that occurs most often among the k nearest neighbors:
def getResponse(neighbors):
    classVotes = {}
    for x in range(len(neighbors)):
        response = neighbors[x][-1]
        if response in classVotes:
            classVotes[response] += 1
        else:
            classVotes[response] = 1
    # Sort by vote count in descending order so the most frequent label comes first.
    sortedVotes = sorted(classVotes.items(), key=operator.itemgetter(1), reverse=True)
    return sortedVotes[0][0]
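The vote has to return the label with the highest count among the neighbors. A compact way to express this, sketched here with collections.Counter, is shown below. The name majority_label and the sample neighbors are illustrative assumptions.

```python
from collections import Counter

def majority_label(neighbors):
    """Return the class label that appears most often in the last column."""
    votes = Counter(row[-1] for row in neighbors)
    # most_common(1) yields a one-element list of (label, count).
    return votes.most_common(1)[0][0]

# Hypothetical neighbors: two rows of class 'a' and one of class 'b'.
neighbors = [[1.0, 1.0, 1.0, 'a'], [2.0, 2.0, 2.0, 'a'], [3.0, 3.0, 3.0, 'b']]
label = majority_label(neighbors)
print(label)  # a
```

When two labels tie, this sketch simply returns whichever one Counter lists first; choosing an odd k reduces ties in two-class problems.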
Computing the accuracy:
def getAccuracy(testSet, predictions):
    correct = 0
    for x in range(len(testSet)):
        if str(testSet[x][-1]) == str(predictions[x]):
            correct += 1
        else:
            # Report each misclassified instance.
            print('real: %s pre: %s' % (testSet[x][-1], predictions[x]))
    return (correct / float(len(testSet))) * 100.0
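The accuracy is the percentage of test rows whose true label matches the prediction. A small worked example, using made-up labels and a hypothetical helper named accuracy_percent, might look like this:

```python
def accuracy_percent(test_set, predictions):
    """Percentage of rows whose last column equals the predicted label."""
    correct = sum(1 for row, pred in zip(test_set, predictions)
                  if row[-1] == pred)
    return 100.0 * correct / len(test_set)

# Hypothetical test rows (one feature plus label) and predictions.
test_set = [[1.0, 'a'], [2.0, 'a'], [3.0, 'b'], [4.0, 'b']]
predictions = ['a', 'a', 'b', 'a']

acc = accuracy_percent(test_set, predictions)
print(acc)  # 75.0, since 3 of the 4 predictions are correct
```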
Main function:
def main():
    print('read data')
    trainingSet = []
    testSet = []
    split = 0.67  # probability that a row goes to the training set
    loadDataSet('iris.data', split, trainingSet, testSet)
    print('train set:' + repr(len(trainingSet)))
    print('test set:' + repr(len(testSet)))
    print('predictions')
    k = 3
    predictions = []
    for x in range(len(testSet)):
        neighbors = getNeighbors(trainingSet, testSet[x], k)
        result = getResponse(neighbors)
        predictions.append(result)
    accuracy = getAccuracy(testSet, predictions)
    print('Accuracy: ' + repr(accuracy) + '%')

main()
Reference link
Dataset