Linear Regression and Logistic Regression in Practice
2017-08-24 14:42
Logistic regression in practice: classifying the iris dataset
Key point: the class labels y in this dataset are strings, so remember to use sklearn's built-in label preprocessing (LabelEncoder) to encode the three classes as 0, 1, 2.
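A minimal sketch of what LabelEncoder does with the iris label strings (the array below is made-up sample data, not read from the dataset file):

```python
import numpy as np
from sklearn import preprocessing

labels = np.array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica', 'Iris-setosa'])
le = preprocessing.LabelEncoder()
y = le.fit_transform(labels)   # classes are sorted, then mapped to 0, 1, 2
# le.classes_ keeps the original strings; inverse_transform recovers them
print(y)                       # [0 1 2 0]
```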
#!/usr/bin/python
# -*- coding:utf-8 -*-
import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

path = '/Users/apple/Desktop/10.Regression/10.iris.data'
df = pd.read_csv(path, header=None)
x = df.iloc[:, :-1]
y1 = df.iloc[:, -1]

# encode the string labels as 0, 1, 2
le = preprocessing.LabelEncoder()
y = le.fit_transform(y1)
print(y)

x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=1)
model = LogisticRegression()
model.fit(x_train, y_train)
y_hat = model.predict(x_test)
y_hat_prob = model.predict_proba(x_test)
np.set_printoptions(suppress=True)
acc = 100 * np.mean(y_hat == y_test)  # classification accuracy (not AUC)
print(y_hat)
print('======')
print(y_hat_prob)
print('=========')
print('accuracy: %.2f%%' % acc)
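The hand-rolled accuracy above, `np.mean(y_hat == y_test)`, is exactly what sklearn's `accuracy_score` computes. A small sketch with made-up predictions:

```python
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([0, 1, 2, 1])
y_pred = np.array([0, 1, 1, 1])   # one of four predictions is wrong

acc = accuracy_score(y_true, y_pred)   # fraction of correct predictions
print(acc)                             # 0.75, same as np.mean(y_pred == y_true)
```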
Linear regression in practice: predicting sales
Note: the three scatter plots show that Newspaper's relationship to Sales is not linear, so we leave that feature out of the model.
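A quick numeric complement to eyeballing the scatter plots is the Pearson correlation of each feature with sales. The data below is synthetic, built only to illustrate the check (the shapes and coefficients are invented, not taken from the Advertising dataset):

```python
import numpy as np

rng = np.random.RandomState(1)
tv = rng.uniform(0, 300, 200)
sales = 0.05 * tv + 7 + rng.randn(200) * 2   # roughly linear in tv
newspaper = rng.uniform(0, 100, 200)         # unrelated to sales

r_tv = np.corrcoef(tv, sales)[0, 1]          # strong linear association
r_news = np.corrcoef(newspaper, sales)[0, 1] # near zero
```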
#!/usr/bin/python
# -*- coding:utf-8 -*-
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

path = '/Users/apple/Desktop/10.Regression/10.Advertising.csv'
data = pd.read_csv(path)
x = data[['TV', 'Radio']]
y = data['Sales']

# inspect how each feature relates to sales
plt.figure()
plt.subplot(311)
plt.plot(data['TV'], y, 'ro')
plt.title('TV')
plt.subplot(312)
plt.plot(data['Radio'], y, 'go')
plt.title('Radio')
plt.subplot(313)
plt.plot(data['Newspaper'], y, 'bo')
plt.title('Newspaper')
plt.show()

x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.7, random_state=1)
lr = LinearRegression()
model = lr.fit(x_train, y_train)
print(model)
print(lr.coef_)
print(lr.intercept_)

y_hat = lr.predict(np.array(x_test))
mse = np.average((y_hat - np.array(y_test)) ** 2)
rmse = np.sqrt(mse)
print(mse, rmse)

t = np.arange(len(x_test))
plt.plot(t, y_test, 'r-', linewidth=2, label='true data')
plt.plot(t, y_hat, 'g-', linewidth=2, label='predict data')
plt.legend(loc='upper right')
plt.title('LinearRegression predict sale', fontsize=18)
plt.grid()
plt.show()

Lasso regression: predicting sales
Note: Lasso regression performs automatic feature selection, but it often does not predict as well as Ridge regression, so in practice a compromise between the two, Elastic Net, is commonly used.
In the code, the optimal hyperparameter is found with GridSearchCV.
#!/usr/bin/python
# -*- coding:utf-8 -*-
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import Lasso, Ridge

if __name__ == "__main__":
    # read the data with pandas: TV, Radio, Newspaper, Sales
    data = pd.read_csv('10.Advertising.csv')
    x = data[['TV', 'Radio', 'Newspaper']]
    # x = data[['TV', 'Radio']]
    y = data['Sales']
    print(x)
    print(y)

    x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=1)
    model = Lasso()
    # model = Ridge()
    alpha_can = np.logspace(-3, 2, 10)
    lasso_model = GridSearchCV(model, param_grid={'alpha': alpha_can}, cv=6)
    lasso_model.fit(x_train, y_train)
    print('best hyperparameters:\n', lasso_model.best_params_)

    y_hat = lasso_model.predict(np.array(x_test))
    print(lasso_model.score(x_test, y_test))
    mse = np.average((y_hat - np.array(y_test)) ** 2)  # Mean Squared Error
    rmse = np.sqrt(mse)  # Root Mean Squared Error
    print(mse, rmse)

    t = np.arange(len(x_test))
    mpl.rcParams['font.sans-serif'] = ['simHei']
    mpl.rcParams['axes.unicode_minus'] = False
    plt.plot(t, y_test, 'r-', linewidth=2, label='true data')
    plt.plot(t, y_hat, 'g-', linewidth=2, label='predicted data')
    plt.title('Lasso regression sales prediction', fontsize=18)
    plt.legend(loc='upper right')
    plt.grid()
    plt.show()
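The Elastic Net compromise mentioned above can be tuned with GridSearchCV in the same way, searching over both `alpha` and `l1_ratio` (which interpolates between Ridge at 0 and Lasso at 1). This is a minimal sketch on synthetic data; the coefficients and grid values are illustrative, not from the Advertising dataset:

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X = rng.randn(100, 3)
# only the first two features matter; the third is noise
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.randn(100) * 0.1

params = {'alpha': np.logspace(-3, 1, 5), 'l1_ratio': [0.2, 0.5, 0.8]}
enet = GridSearchCV(ElasticNet(max_iter=10000), param_grid=params, cv=5)
enet.fit(X, y)

print(enet.best_params_)
print(enet.best_estimator_.coef_)  # close to [3.0, 1.5, ~0]
```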