Python Machine Learning and Practice: A Comprehensive Python Basics Exercise

2017-09-09 21:47
# Load the training and test data
import pandas as pd
df_train = pd.read_csv('breast-cancer-train.csv')
df_test = pd.read_csv('breast-cancer-test.csv')

#print(df_train.info())
#print(df_test.info())

# Select the rows of the test set where Type == 0 (negative), keeping only the Clump Thickness and Cell Size columns
df_test_negative = df_test.loc[df_test['Type'] == 0, ['Clump Thickness', 'Cell Size']]
# Select the rows of the test set where Type == 1 (positive), keeping the same two columns
df_test_positive = df_test.loc[df_test['Type'] == 1, ['Clump Thickness', 'Cell Size']]

#print(df_test_negative)
#print(df_test_positive)

# Figure 1: scatter plot of the negative test samples
import matplotlib.pyplot as plt
plt.scatter(df_test_negative['Clump Thickness'], df_test_negative['Cell Size'], marker = 'o', s = 200, c = 'red')
plt.xlabel('Clump Thickness')
plt.ylabel('Cell Size')
plt.show()

# Figure 2: a line in the 2D plane with randomly chosen intercept and coefficients
import numpy as np
intercept = np.random.random([1])
coef = np.random.random([2])
lx=np.arange(0, 12)

ly = (-intercept - lx * coef[0]) / coef[1]
plt.plot(lx, ly, c='yellow')
plt.show()

# Figure 3: scatter plot of the positive and negative test samples
plt.scatter(df_test_negative['Clump Thickness'], df_test_negative['Cell Size'], marker = 'o', s = 200, c = 'red')
plt.scatter(df_test_positive['Clump Thickness'], df_test_positive['Cell Size'], marker = 'x', s = 150, c = 'black')
plt.xlabel('Clump Thickness')
plt.ylabel('Cell Size')
plt.show()

# Figure 4: linear classifier trained on the first ten training rows
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()
lr.fit(df_train[['Clump Thickness', 'Cell Size']][:10], df_train['Type'][:10])  # train on the first ten rows of the training set only
#print(df_train[['Clump Thickness', 'Cell Size']][:10])
#print(df_train['Type'][:10])
print('Testing accuracy (10 training samples):', lr.score(df_test[['Clump Thickness', 'Cell Size']], df_test['Type']))

intercept = lr.intercept_
coef = lr.coef_[0, :]
ly = (-intercept - lx * coef[0]) / coef[1]

plt.plot(lx, ly, c = 'green')
plt.scatter(df_test_negative['Clump Thickness'], df_test_negative['Cell Size'], marker = 'o', s = 200, c = 'red')
plt.scatter(df_test_positive['Clump Thickness'], df_test_positive['Cell Size'], marker = 'x', s = 200, c = 'black')
plt.xlabel('Clump Thickness')
plt.ylabel('Cell Size')
plt.show()

# Figure 5: linear classifier trained on all training samples
lr = LogisticRegression()
lr.fit(df_train[['Clump Thickness', 'Cell Size']], df_train['Type'])  # train on the full training set
print('Testing accuracy (all training samples):', lr.score(df_test[['Clump Thickness', 'Cell Size']], df_test['Type']))
intercept = lr.intercept_
coef = lr.coef_[0, :]
ly = (-intercept - lx * coef[0]) / coef[1]

plt.plot(lx, ly, c = 'blue')
plt.scatter(df_test_negative['Clump Thickness'], df_test_negative['Cell Size'], marker = 'o', s = 200, c = 'red')
plt.scatter(df_test_positive['Clump Thickness'], df_test_positive['Cell Size'], marker = 'x', s= 200, c = 'black')
plt.xlabel('Clump Thickness')
plt.ylabel('Cell Size')
plt.show()
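
The expression `ly = (-intercept - lx * coef[0]) / coef[1]` used for Figures 2, 4 and 5 follows from setting the classifier's linear score to zero: `intercept + coef[0] * x + coef[1] * y = 0`, which is where logistic regression assigns probability 0.5. The sketch below checks this in plain Python. The helper name `boundary_y` and the parameter values are made up for illustration and are not taken from the fitted models above; it also assumes `coef[1]` is nonzero.

```python
import math

def boundary_y(x, intercept, coef):
    """Return y such that intercept + coef[0]*x + coef[1]*y == 0.

    Assumes coef[1] != 0; otherwise the boundary is a vertical line
    and cannot be written as y = f(x).
    """
    return (-intercept - x * coef[0]) / coef[1]

def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters, chosen only for illustration.
intercept, coef = -4.0, [0.5, 0.8]

xs = list(range(0, 12))  # same x range as lx = np.arange(0, 12)
ys = [boundary_y(x, intercept, coef) for x in xs]

# Every point on the line has a linear score of 0, i.e. probability 0.5.
scores = [intercept + coef[0] * x + coef[1] * y for x, y in zip(xs, ys)]
probs = [sigmoid(s) for s in scores]
```

Points on one side of this line get a positive score (probability above 0.5) and points on the other side a negative one, which is why the line can be drawn over the scatter plot as the classifier's decision boundary.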

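For a scikit-learn classifier, `lr.score(X, y)` returns the mean accuracy: the fraction of samples whose predicted label equals the true label. A minimal plain-Python sketch of that computation follows. The `accuracy` helper and the label lists are hypothetical and are not the breast cancer data.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty sample")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical labels: 0 = negative, 1 = positive, as in the 'Type' column.
y_true = [0, 0, 1, 1, 0, 1, 0, 0]
y_pred = [0, 0, 1, 0, 0, 1, 1, 0]

acc = accuracy(y_true, y_pred)
print("accuracy:", acc)  # 6 of the 8 labels match
```
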
Output:

Testing accuracy (10 training samples): 0.868571428571
Testing accuracy (all training samples): 0.937142857143

Plots: Figures 1 to 5 as labeled in the code comments (images not included).

Training and test data used: http://pan.baidu.com/s/1c30cDS (password: h8a0)