Python logistic regression: a program walkthrough

2015-08-17 15:22
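The logistic regression chapter of "Machine Learning in Action" (Python) runs gradient ascent over the full sample set for many iterations. The program is as follows: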

# coding=utf-8
__author__ = 'Administrator'
from numpy import *

# Load the data from a text file; the file holds 100 samples with coordinates X, Y
def loadDataSet():
    dataMat = []; labelMat = []
    with open('testSet.txt') as fr:
        for line in fr.readlines():
            lineArr = line.strip().split()
            # Extend each sample by one dimension: the first entry is always 1.0,
            # the second and third entries are the values read from the text file
            dataMat.append([1.0, float(lineArr[0]), float(lineArr[1])])
            labelMat.append(int(lineArr[2]))  # class label
    return dataMat,labelMat

# sigmoid function
def sigmoid(inX):
    return 1.0/(1+exp(-inX))

# Gradient ascent (the update adds the gradient, it does not subtract it)
def gradAscent(dataMatIn, classLabels):
    dataMatrix = mat(dataMatIn)             #convert to NumPy matrix
    labelMat = mat(classLabels).transpose() #convert to NumPy matrix (column vector)
    m,n = shape(dataMatrix)   #number of rows (samples) and columns (features)
    alpha = 0.001             #learning rate
    maxCycles = 500           #maximum number of iterations
    weights = ones((n,1))
    for k in range(maxCycles):              #heavy on matrix operations
        h = sigmoid(dataMatrix*weights)     #matrix multiply
        error = (labelMat - h)              #vector subtraction
        weights = weights + alpha * dataMatrix.transpose()* error #matrix mult
    return weights

# Plot the best-fit line
def plotBestFit(weights):
    import matplotlib.pyplot as plt
    dataMat,labelMat=loadDataSet()
    dataArr = array(dataMat)
    n = shape(dataArr)[0]    #number of rows, i.e. the number of samples
    xcord1 = []; ycord1 = []
    xcord2 = []; ycord2 = []
    for i in range(n):       #split the samples by class for plotting
        if int(labelMat[i])== 1:
            xcord1.append(dataArr[i,1]); ycord1.append(dataArr[i,2])
        else:
            xcord2.append(dataArr[i,1]); ycord2.append(dataArr[i,2])
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcord1, ycord1, s=30, c='red', marker='s')
    ax.scatter(xcord2, ycord2, s=30, c='green')
    # x range of the line
    x = arange(-3.0, 3.0, 0.1)
    # Draw the line weights[0]*1.0 + weights[1]*x + weights[2]*y = 0
    # The data was extended from two to three dimensions earlier, with the first set to 1.0
    y = (-weights[0]-weights[1]*x)/weights[2]
    ax.plot(x, y)
    plt.xlabel('X1'); plt.ylabel('X2')
    plt.show()
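
For reference, the update inside gradAscent and the line drawn by plotBestFit can be written out as below. The symbols are introduced here only for illustration: X stands for dataMatrix, y for labelMat, w for weights, α for alpha, and σ for sigmoid. The term X^T (y − h) is the gradient of the logistic log-likelihood, which is why the update adds it.

```latex
\begin{aligned}
h &= \sigma(Xw), \qquad \sigma(z) = \frac{1}{1 + e^{-z}} \\
w &\leftarrow w + \alpha\, X^{\top} (y - h) \\
0 &= w_0 + w_1 x_1 + w_2 x_2 \;\Longrightarrow\; x_2 = \frac{-w_0 - w_1 x_1}{w_2}
\end{aligned}
```
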
Running the following code directly draws the result:

import logRegres
dataArr, labelMat = logRegres.loadDataSet()
weights = logRegres.gradAscent(dataArr, labelMat)
logRegres.plotBestFit(weights.getA())
Many people are unsure what weights.getA() does in the program above. When debugging, printing weights and printing weights.getA() shows the same values, yet changing the call to logRegres.plotBestFit(weights) makes the program fail.
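
The failure can be reproduced without the data file. The following is a minimal sketch that uses made-up weight values (not the ones gradAscent would actually produce) and repeats the line computation from plotBestFit for both types. With a matrix, y comes out as a 1 x 60 matrix while x is a 1-D array of 60 points, and that shape mismatch is what the plotting call rejects.

```python
import numpy as np

# Hypothetical weight values, chosen only for illustration.
w_mat = np.matrix([[4.0], [0.5], [-0.6]])   # a 3x1 np.matrix, like gradAscent's result
w_arr = w_mat.getA()                        # same values as a plain ndarray, shape (3, 1)

x = np.arange(-3.0, 3.0, 0.1)               # 1-D array of x positions

# np.matrix: '*' is matrix multiplication, so (1x1) * (1xN) gives a 1xN matrix.
y_mat = (-w_mat[0] - w_mat[1] * x) / w_mat[2]

# ndarray: '*' is element-wise with broadcasting, so the result stays 1-D.
y_arr = (-w_arr[0] - w_arr[1] * x) / w_arr[2]

print(x.shape, y_mat.shape, y_arr.shape)
```

The numbers in y_mat and y_arr are the same; only the shape differs, and y_arr matches x so it can be plotted against it.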

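The reason is that NumPy, Python's scientific computing library, provides a matrix type in addition to the general ndarray. A matrix is an ndarray subclass that is always two-dimensional and treats * as matrix multiplication. For example: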

import numpy as np
x = np.matrix(np.arange(12).reshape((3,4)))
x
matrix([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
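Calling getA() on a matrix returns the same data as a plain ndarray. The values are identical, but the two types behave differently in later operations: on a matrix, * is a matrix product and indexing a row still gives a two-dimensional result.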

x.getA()
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
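
A short sketch of the behavioral difference, using a small 2 x 2 example chosen here for illustration:

```python
import numpy as np

m = np.matrix(np.arange(4).reshape((2, 2)))   # [[0, 1], [2, 3]]
a = m.getA()                                  # same values, plain ndarray

print(type(m).__name__, type(a).__name__)     # matrix ndarray

prod_m = m * m   # matrix product:       [[2, 3], [6, 11]]
prod_a = a * a   # element-wise product: [[0, 1], [4, 9]]
print(prod_m)
print(prod_a)

# Indexing a row keeps a matrix 2-D but reduces an ndarray to 1-D.
print(m[0].shape, a[0].shape)                 # (1, 2) (2,)
```
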
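In gradAscent, weights is initialized with weights = ones((n,1)), which is an ndarray, but the update weights + alpha * dataMatrix.transpose() * error combines it with matrix operands, so the value returned is a matrix.

getA() therefore has to be applied before passing weights to plotBestFit, to avoid the error.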
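
The sketch below first checks that adding an ndarray to a matrix yields a matrix, and then shows one possible alternative that avoids the conversion altogether by working with plain ndarrays and the @ operator. This is not the book's code: the function name grad_ascent_ndarray, the parameter names, and the tiny data set are assumptions made up for illustration.

```python
import numpy as np

# ndarray + matrix gives a matrix, which is how weights changes type in gradAscent.
mixed = np.ones((3, 1)) + np.asmatrix(np.zeros((3, 1)))
print(type(mixed).__name__)   # matrix

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_ascent_ndarray(data, labels, alpha=0.001, max_cycles=500):
    """Batch gradient ascent with plain ndarrays (hypothetical variant)."""
    X = np.asarray(data, dtype=float)      # shape (m, n)
    y = np.asarray(labels, dtype=float)    # shape (m,)
    w = np.ones(X.shape[1])                # shape (n,)
    for _ in range(max_cycles):
        h = sigmoid(X @ w)                 # predicted probabilities, shape (m,)
        w = w + alpha * (X.T @ (y - h))    # same update rule as gradAscent
    return w

# Tiny made-up data set: a constant 1.0 column plus two features.
data = [[1.0, -1.0, 2.0], [1.0, 0.5, 3.0], [1.0, 1.0, -1.0], [1.0, 2.0, -2.0]]
labels = [1, 1, 0, 0]

w = grad_ascent_ndarray(data, labels)
print(type(w).__name__, w.shape)   # ndarray (3,)
```

Because w is a 1-D ndarray here, an expression such as (-w[0] - w[1]*x) / w[2] stays element-wise and has the same shape as x, so it could be plotted without calling getA().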