KNN Algorithm in Practice: Handwritten Digit Recognition
2016-09-26 19:36
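Introduction to the KNN Algorithm
For an introduction to the KNN algorithm, see: K-近邻算法(KNN)手写数字识别 (K-Nearest Neighbors (KNN) Handwritten Digit Recognition)
The kNN algorithm is mainly used for text classification and similarity-based recommendation; this article walks through a classification example. What is handwriting recognition? See the Wikipedia article: Handwriting recognition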
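Data download: handwriting recognition data
Data description: each handwritten digit has been preprocessed into 32*32 binary text (a grid of 0 and 1 characters) and stored as a txt file. The files are split into training samples and test samples, in the directories "trainingDigits" and "testDigits".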
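To make the file format concrete, here is a minimal sketch that writes one synthetic sample and reads it back. The file name `1_0.txt`, the helper names `parse_label` and `make_fake_digit_text`, and the vertical-bar pattern are hypothetical; the digit-before-underscore naming is inferred from how the listing later in this article splits file names, not from the dataset documentation.

```python
import os
import tempfile


def parse_label(file_name):
    """Return the digit encoded before the underscore, e.g. '3_7.txt' -> 3."""
    stem = file_name.split('.')[0]
    return int(stem.split('_')[0])


def make_fake_digit_text():
    """Build 32 lines of 32 characters, each '0' or '1' (a crude vertical bar)."""
    rows = ['0' * 14 + '1' * 4 + '0' * 14 for _ in range(32)]
    return '\n'.join(rows) + '\n'


# Write the synthetic sample into a temporary directory (hypothetical file name).
sample_dir = tempfile.mkdtemp()
sample_name = '1_0.txt'
sample_path = os.path.join(sample_dir, sample_name)
with open(sample_path, 'w') as f:
    f.write(make_fake_digit_text())

# Read it back as a list of text rows.
with open(sample_path) as f:
    lines = [line.rstrip('\n') for line in f]

print(len(lines), len(lines[0]))   # number of rows and columns
print(parse_label(sample_name))    # label taken from the file name
```

Each real sample should have the same shape: 32 rows of 32 characters, with the true digit recoverable from the file name alone.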
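Implementation steps:
Convert each image (that is, each txt file; "image" below always means a txt file) into a vector: the 32*32 array becomes a 1*1024 array. In machine learning terms, this 1*1024 array is the feature vector;
The training sample contains 10*10 images, which can be stacked into a 100*1024 matrix in which each row corresponds to one image;
The test sample contains 10*5 images, and the program has to decide automatically which digit each one represents. Each test image is likewise converted into a 1*1024 vector. We then compute its "distance" to every image in the training sample (Euclidean distance between the two vectors), sort the distances, and pick the k smallest. These k samples come from the training set, so the digits they represent are known; the digit assigned to the test image is the one that occurs most often among these k.
The code is as follows: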
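The steps above can be sketched on toy data before touching the real files. This is a minimal illustration, not the article's own listing: the names `flatten_image`, `knn_predict` and `toy_image` are hypothetical, and the "images" are uniform grids with a few flipped pixels rather than real digits.

```python
from collections import Counter

import numpy as np


def flatten_image(grid):
    """Reshape a 32*32 grid of 0/1 values into a 1*1024 feature vector."""
    return np.asarray(grid, dtype=float).reshape(1, -1)


def knn_predict(x, train_mat, train_labels, k):
    """Label x by majority vote among its k nearest training rows."""
    # Euclidean distance from x (1*d) to each row of train_mat (m*d);
    # broadcasting subtracts x from every row.
    distances = np.sqrt(((train_mat - x) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]   # indices of the k smallest distances
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


def toy_image(base_value, flipped_pixels):
    """Hypothetical stand-in for a digit: a uniform grid with a few pixels flipped."""
    grid = np.full((32, 32), float(base_value))
    for row, col in flipped_pixels:
        grid[row, col] = 1.0 - base_value
    return grid


# Three mostly-blank images labelled 0 and three mostly-filled images labelled 1.
train_images = ([toy_image(0, [(i, i)]) for i in range(3)]
                + [toy_image(1, [(i, i)]) for i in range(3)])
train_labels = [0, 0, 0, 1, 1, 1]
train_mat = np.vstack([flatten_image(img) for img in train_images])  # 6*1024

# A mostly-blank query image with ten pixels set in row 5.
query = flatten_image(toy_image(0, [(5, c) for c in range(10)]))
print(train_mat.shape, query.shape)
print(knn_predict(query, train_mat, train_labels, k=3))
```

The query differs from each label-0 training image in 11 pixels and from each label-1 image in 1013 pixels, so its three nearest neighbours all carry label 0 and the vote returns 0.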
import operator
from os import listdir

import numpy as np


def classify0(inX, dataSet, labels, k):
    """Classify inX by majority vote among its k nearest rows in dataSet."""
    dataSetSize = dataSet.shape[0]
    # Euclidean distance from inX to every training vector
    diffMat = np.tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances ** 0.5
    # Indices of the training vectors, nearest first
    sortedDistIndicies = distances.argsort()
    # Count the labels of the k nearest training vectors
    classCount = {}
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    # Most frequent label first
    sortedClassCount = sorted(classCount.items(),
                              key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]


def img2vector(filename):
    """Read a 32*32 text image and return it as a 1*1024 vector."""
    returnVect = np.zeros((1, 1024))
    with open(filename) as fr:
        for i in range(32):
            lineStr = fr.readline()
            for j in range(32):
                returnVect[0, 32 * i + j] = int(lineStr[j])
    return returnVect


def handwritingClassTest():
    # Build the training matrix, one row per file in trainingDigits
    hwLabels = []
    trainingFileList = listdir('trainingDigits')
    m = len(trainingFileList)
    trainingMat = np.zeros((m, 1024))
    for i in range(m):
        fileNameStr = trainingFileList[i]
        fileStr = fileNameStr.split('.')[0]
        # The true digit is the part of the file name before the underscore
        classNumStr = int(fileStr.split('_')[0])
        hwLabels.append(classNumStr)
        trainingMat[i, :] = img2vector('trainingDigits/%s' % fileNameStr)
    # Classify every file in testDigits and count the mistakes
    testFileList = listdir('testDigits')
    errorCount = 0.0
    mTest = len(testFileList)
    for i in range(mTest):
        fileNameStr = testFileList[i]
        fileStr = fileNameStr.split('.')[0]
        classNumStr = int(fileStr.split('_')[0])
        vectorUnderTest = img2vector('testDigits/%s' % fileNameStr)
        classifierResult = classify0(vectorUnderTest, trainingMat, hwLabels, 3)
        print("the classifier came back with: %d, the real answer is: %d"
              % (classifierResult, classNumStr))
        if classifierResult != classNumStr:
            errorCount += 1.0
    print("\nthe total number of errors is: %d" % errorCount)
    print("\nthe total error rate is: %f" % (errorCount / float(mTest)))


handwritingClassTest()
Output:
the classifier came back with: 4, the real answer is: 4
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 8, the real answer is: 8
the classifier came back with: 8, the real answer is: 8
the classifier came back with: 5, the real answer is: 5
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 6, the real answer is: 6
the classifier came back with: 9, the real answer is: 9
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 7, the real answer is: 7
the classifier came back with: 0, the real answer is: 0
the classifier came back with: 9, the real answer is: 9
the classifier came back with: 7, the real answer is: 7
the classifier came back with: 8, the real answer is: 8
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 7, the real answer is: 7
the classifier came back with: 4, the real answer is: 4
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 0, the real answer is: 0
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 8, the real answer is: 8
the classifier came back with: 9, the real answer is: 9
the classifier came back with: 0, the real answer is: 0
the classifier came back with: 6, the real answer is: 6
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 9, the real answer is: 9
the classifier came back with: 6, the real answer is: 6
the classifier came back with: 8, the real answer is: 8
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 4, the real answer is: 4
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 4, the real answer is: 4
the classifier came back with: 0, the real answer is: 0
the classifier came back with: 4, the real answer is: 4
the classifier came back with: 5, the real answer is: 5
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 7, the real answer is: 7
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 6, the real answer is: 6
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 0, the real answer is: 0
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 7, the real answer is: 7
the classifier came back with: 9, the real answer is: 9
the classifier came back with: 5, the real answer is: 5
the classifier came back with: 5, the real answer is: 5
the classifier came back with: 6, the real answer is: 6
the classifier came back with: 5, the real answer is: 5

the total number of errors is: 0

the total error rate is: 0.000000
Because the training set and test set used here are both fairly small, it happens that no digit was misclassified.
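A rough way to see how little a clean run on 50 images proves: if the test images are treated as independent draws (an assumption, not something the article states), the largest true error rate that would still produce zero errors in n tests with probability 1 - confidence can be computed directly. The function name `zero_error_upper_bound` is hypothetical.

```python
def zero_error_upper_bound(n_tests, confidence=0.95):
    """Upper bound p on the true error rate given 0 errors in n_tests trials.

    Solves (1 - p) ** n_tests == 1 - confidence for p, assuming the trials
    are independent.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n_tests)


bound = zero_error_upper_bound(50)
print("95%% upper bound on the true error rate: %.3f" % bound)
```

Under that assumption, a true error rate of several percent is still compatible with the error-free result above, so a larger test set is needed before reading much into the 0.000000 figure.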