Machine Learning in Action, Chapter 13 (Using PCA to Simplify Data): a Python 3 runtime error and how to fix it

2018-11-24 10:50

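The book's code was written for Python 2, but most newcomers today are probably using Python 3. I am new to this field myself, and while following this chapter I ran into a problem: my own test script, test.py, raised an error when it called pca.py: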

Traceback (most recent call last):
  File "F:\Python资料\机器学习实战 中文版+源码\机器学习实战——源码 - 备份\Ch13\test.py", line 8, in <module>
    lowDMat,reconMat=pca.pca(dataMat,1)
  File "F:\Python资料\机器学习实战 中文版+源码\机器学习实战——源码 - 备份\Ch13\pca.py", line 15, in pca
    meanVals = mean(dataMat, axis=0)
  File "D:\Python\lib\site-packages\numpy\core\fromnumeric.py", line 2954, in mean
    return mean(axis=axis, dtype=dtype, out=out, **kwargs)
  File "D:\Python\lib\site-packages\numpy\matrixlib\defmatrix.py", line 536, in mean
    return N.ndarray.mean(self, axis, dtype, out, keepdims=True)._collapse(axis)
  File "D:\Python\lib\site-packages\numpy\core\_methods.py", line 73, in _mean
    ret, rcount, out=ret, casting='unsafe', subok=False)
TypeError: unsupported operand type(s) for /: 'map' and 'int'

Solution:

To make the explanation easier to follow, here is the code first.

test.py:

[code]import pca
from numpy import *

dataMat=pca.loadDataSet('testSet.txt')
lowDMat,reconMat=pca.pca(dataMat,1)
print(shape(lowDMat))

Original pca.py:

[code]'''
Created on Jun 1, 2011

@author: Peter Harrington
'''
from numpy import *

def loadDataSet(fileName, delim='\t'):
    fr = open(fileName)
    stringArr = [line.strip().split(delim) for line in fr.readlines()]
    datArr = [map(float,line) for line in stringArr]
    return mat(datArr)

def pca(dataMat, topNfeat=9999999):
    meanVals = mean(dataMat, axis=0)
    meanRemoved = dataMat - meanVals #remove mean
    covMat = cov(meanRemoved, rowvar=0)
    eigVals,eigVects = linalg.eig(mat(covMat))
    eigValInd = argsort(eigVals)            #sort, sort goes smallest to largest
    eigValInd = eigValInd[:-(topNfeat+1):-1]  #cut off unwanted dimensions
    redEigVects = eigVects[:,eigValInd]       #reorganize eig vects largest to smallest
    lowDDataMat = meanRemoved * redEigVects #transform data into new dimensions
    reconMat = (lowDDataMat * redEigVects.T) + meanVals
    return lowDDataMat, reconMat

def replaceNanWithMean():
    datMat = loadDataSet('secom.data', ' ')
    numFeat = shape(datMat)[1]
    for i in range(numFeat):
        meanVal = mean(datMat[nonzero(~isnan(datMat[:,i].A))[0],i]) #values that are not NaN (a number)
        datMat[nonzero(isnan(datMat[:,i].A))[0],i] = meanVal  #set NaN values to mean
    return datMat

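Although the traceback points at line 15 of pca.py, the real problem is not on that line. The actual cause is:

Line 11 of pca.py, "datArr = [map(float,line) for line in stringArr]", depends on map() behaviour that has changed. In Python 2, map() returns a list; in Python 3, it returns an iterator. As a result, mat() receives map objects instead of numbers, and the failure only surfaces later, when mean() tries to divide them. Wrapping the map() call in list() turns the result back into a list, so line 11 of pca.py only needs to be changed to: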

datArr = [list(map(float,line)) for line in stringArr]

That is all it takes.
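To see the difference outside of pca.py, here is a minimal sketch using only the standard library. The sample values in row are made up for illustration and are not taken from testSet.txt. The division is only a rough stand-in for what NumPy's mean() ends up attempting on each stored map object, which appears to be why the message matches the last line of the traceback.

```python
row = ["1.5", "2.5"]  # hypothetical fields from one line of a data file

lazy = map(float, row)         # Python 3: a lazy iterator, not a list
print(type(lazy).__name__)     # map

try:
    lazy / 2                   # dividing the map object itself, not its values
except TypeError as err:
    message = str(err)
    print(message)             # unsupported operand type(s) for /: 'map' and 'int'

eager = list(map(float, row))  # list() pulls the real floats out of the iterator
print(eager)                   # [1.5, 2.5]
```

A list comprehension such as [float(x) for x in row] would give the same result as list(map(float, row)); either form hands mat() real numbers.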

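I am sharing this in the hope that later learners will not fall into the same trap and can spend more of their time on the algorithms. Let's keep each other going!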
