您的位置:首页 > 其它

K近邻(KNN):分类算法

2014-06-02 19:52 323 查看
K近邻(KNN):分类算法

* KNN是non-parametric分类器(不做分布形式的假设,直接从数据估计概率密度),是memory-based learning.

* KNN不适用于高维数据(curse of dimension)

* Machine Learning的Python库很多,比如mlpy更多packages),这里实现只是为了掌握方法

* MATLAB 中的调用,见《MATLAB分类器大全(svm,knn,随机森林等)》

* KNN算法复杂度高(可用KD树优化,C中可以用libkdtree或者ANN

* k越小越容易过拟合,但是k很大会降分类精度(设想极限情况:k=1和k=N(样本数))

本文不介绍理论了,注释见代码。

KNN.py

[python] view
plaincopy





from numpy import *

import operator

class KNN:

def createDataset(self):

group = array([[1.0,1.1],[1.0,1.0],[0,0],[0,0.1]])

labels = ['A','A','B','B']

return group,labels

def KnnClassify(self,testX,trainX,labels,K):

[N,M]=trainX.shape

#calculate the distance between testX and other training samples

difference = tile(testX,(N,1)) - trainX # tile for array and repeat for matrix in Python, == repmat in Matlab

difference = difference ** 2 # take pow(difference,2)

distance = difference.sum(1) # take the sum of difference from all dimensions

distance = distance ** 0.5

sortdiffidx = distance.argsort()

# find the k nearest neighbours

vote = {} #create the dictionary

for i in range(K):

ith_label = labels[sortdiffidx[i]];

vote[ith_label] = vote.get(ith_label,0)+1 #get(ith_label,0) : if dictionary 'vote' exist key 'ith_label', return vote[ith_label]; else return 0

sortedvote = sorted(vote.iteritems(),key = lambda x:x[1], reverse = True)

# 'key = lambda x: x[1]' can be substituted by operator.itemgetter(1)

return sortedvote[0][0]

k = KNN() #create KNN object

group,labels = k.createDataset()

cls = k.KnnClassify([0,0],group,labels,3)

print cls

-------------------

运行:

1. 在Python Shell 中可以运行KNN.py

>>>import os

>>>os.chdir("/Users/mba/Documents/Study/Machine_Learning/Python/KNN")

>>>execfile("KNN.py")

输出B

(B表示类别)

2. 或者terminal中直接运行

$ python KNN.py

3. 也可以不在KNN.py中写输出,而选择在Shell中获得结果,i.e.,

>>>import KNN

>>> KNN.k.KnnClassify([0,0],KNN.group,KNN.labels,3)
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: