
[Machine Learning in Action] The k-Nearest Neighbors Algorithm

2018-02-04 17:52
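Pros: high accuracy, insensitive to outliers, no assumptions about the input data

Cons: computationally expensive, memory-intensive

Works with: numeric and nominal values

Pseudocode:

For each point in the dataset whose class is unknown, perform the following steps in turn:

1. Compute the distance between the current point and every point in the dataset of known classes;

2. Sort the distances in increasing order

3. Select the K points closest to the current point

4. Determine how often each class occurs among these K points

5. Return the most frequent class among these K points as the predicted class for the current point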
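The five steps can be traced on a tiny example. The following is only a minimal sketch using the standard library; the helper name knn_predict and the query point are assumptions made for illustration and are not part of the original listing.

```python
import math
from collections import Counter

def knn_predict(query, points, labels, k):
    """Hypothetical helper: majority vote among the k nearest labelled points."""
    # Step 1: distance from the query point to every labelled point
    distances = [math.dist(query, p) for p in points]
    # Step 2: indices of the points, sorted by increasing distance
    order = sorted(range(len(points)), key=lambda i: distances[i])
    # Step 3: keep the k nearest points
    nearest = order[:k]
    # Step 4: count how often each class occurs among them
    votes = Counter(labels[i] for i in nearest)
    # Step 5: return the most frequent class
    return votes.most_common(1)[0][0]

# Same four sample points as in the listing's createDataSet
points = [(1.0, 1.0), (1.0, 1.0), (0.0, 0.0), (0.0, 0.1)]
labels = ['A', 'A', 'B', 'B']
prediction = knn_predict((0.2, 0.1), points, labels, k=3)
```

With k=3 the query (0.2, 0.1) has two 'B' neighbours and one 'A' neighbour, so the vote goes to 'B'.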

Program listing:

from numpy import array, tile, zeros, shape
import operator

def createDataSet():
    # Four labelled sample points: two of class 'A', two of class 'B'
    group = array([[1.0,1.0],[1.0,1.0],[0,0],[0,0.1]])
    labels = ['A','A','B','B']
    return group, labels

def classify0(inX, dataSet, labels, k):
    # Number of labelled samples (rows)
    dataSetSize = dataSet.shape[0]

    # Euclidean distance from inX to every row of dataSet
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances ** 0.5

    # Indices of the samples, nearest first
    sortedDistIndicies = distances.argsort()
    classCount = {}

    # Tally the labels of the k nearest samples
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    # Sort the classes by vote count, most votes first
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]

# Load the data from a file
def file2matrix(filename):
    with open(filename) as fr:
        arrayOlines = fr.readlines()
    numberOfLines = len(arrayOlines)
    returnMat = zeros((numberOfLines, 3))  # NumPy matrix filled with zeros
    classLabelVector = []
    index = 0
    # Parse each line of text into a row of features and a label
    for line in arrayOlines:
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index, :] = listFromLine[0:3]  # the first three fields are the features
        classLabelVector.append(listFromLine[-1])  # the last field is the class label (kept as a string)
        index += 1
    return returnMat, classLabelVector

def autoNorm(dataSet):
    minVals = dataSet.min(0)  # argument 0 takes the minimum of each column rather than of each row
    maxVals = dataSet.max(0)
    ranges = maxVals - minVals  # range of values of each feature
    m = dataSet.shape[0]
    normDataset = dataSet - tile(minVals, (m, 1))
    normDataset = normDataset / tile(ranges, (m, 1))  # element-wise division by the range, so each feature lies in [0, 1]
    return normDataset, ranges, minVals

def datingClassTest():
    hoRatio = 0.10  # hold-out ratio; not used yet in this version
    # Raw strings keep the backslashes in the Windows paths literal
    datingDataMat, datingLabels = file2matrix(r'G:\kaggle\pratice\machinelearninginaction-master\Ch02\datingTestSet2.txt')
    group, labels = createDataSet()
    datingDataMat, datingLabels = file2matrix(r"G:\kaggle\pratice\machinelearninginaction-master\Ch02\datingTestSet.txt")
    normMat, ranges, minVals = autoNorm(datingDataMat)
    print(normMat, " ", ranges)

Notes:
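>>> e = array([[1.,0.,0.],[0.,1.,0.],[0.,0.,1.]])

>>> e.shape   # shape is an attribute, not a method, so it takes no parentheses

(3, 3)

>>> e.shape[0]   # number of rows

3

numpy.tile([0,0],5)  # repeats [0,0] five times along the column direction, giving a 1-D array of ten zeros

numpy.tile([0,0],(1,1))  # repeats [0,0] once along the rows and once along the columns, giving [[0 0]]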
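To connect these notes back to classify0, the sketch below shows what tile produces when it is given a (rows, 1) repetition pattern, and checks that NumPy broadcasting gives the same distances without building the tiled copy. The variable names and the query point are assumptions chosen for illustration.

```python
import numpy as np

# Four known points (as in createDataSet) and one illustrative query point
data = np.array([[1.0, 1.0], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
query = np.array([0.0, 0.0])

# tile stacks the query once per data row, giving an array of shape (4, 2)
tiled = np.tile(query, (data.shape[0], 1))

# Row-wise Euclidean distance, computed the same way as in classify0
dist_tile = (((tiled - data) ** 2).sum(axis=1)) ** 0.5

# Broadcasting stretches the 1-D query across the rows automatically
dist_broadcast = (((query - data) ** 2).sum(axis=1)) ** 0.5

same = np.allclose(dist_tile, dist_broadcast)
```

The tile form mirrors the listing; the broadcasting form is an alternative that avoids the intermediate array.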