数据挖掘学习-准备篇-python基础
2016-11-07 09:36
856 查看
python科学计算
1.使用python内置数据集
from sklearn import datasets
iris = datasets.load_iris()
>>> print(iris.data) [[ 0. 0. 5. ..., 0. 0. 0.] [ 0. 0. 0. ..., 10. 0. 0.] [ 0. 0. 0. ..., 16. 9. 0.] ..., [ 0. 0. 1. ..., 6. 0. 0.] [ 0. 0. 2. ..., 12. 0. 0.] [ 0. 0. 10. ..., 12. 1. 0.]]
2.使用svm
>>> from sklearn import svm >>> clf = svm.SVC(gamma=0.001, C=100.)
3.拟合fit和predict
fit(X, y)和
predict(T).
X, y = iris.data, iris.target
4.获取数组的大小——shape属性
iris.shape得(28,19)
5.target
digits.target 就是数字数据集各样例对应的真实数字值。也就是我们的程序要学习的。6.pickle来保存scikit中的模型
>>>import
pickle
>>>s
= pickle.dumps(clf)
>>>clf2
= pickle.loads(s)
7.Estimators对象
一个 estimator 可以是任意一个从数据中学习到的对象;他可能是分类算法(classification),回归算法(regression), 聚类算法(clustering),或者一个变换算法不管他是何种算法,所有的 estimator 对象都向外部暴露了一个 fit 方法 ,该成员方法的操作对象是一个数据集
一个estimator的所有参数即可以在初始化的时候设置,也可以 按对应属性修改:
>>> estimator = Estimator(param1=1, param2=2) >>> estimator.param1
predict(X) 用于预测数据集
X中的未知标签的样本,并返回预测的标签
y.
每一个estimator暴露一个计算estimator在测试数据上的测试得分的方法: score 得分越大,estimator对数据的拟合模型越好。 .
8.KNN (k nearest neighbors) 分类器例子:
>>> # Split iris data in train and test data >>> # A random permutation, to split the data randomly >>> np.random.seed(0) >>> indices = np.random.permutation(len(iris_X)) >>> iris_X_train = iris_X[indices[:-10]] >>> iris_y_train = iris_y[indices[:-10]] >>> iris_X_test = iris_X[indices[-10:]] >>> iris_y_test = iris_y[indices[-10:]] >>> # Create and fit a nearest-neighbor classifier >>> from sklearn.neighbors import KNeighborsClassifier >>> knn = KNeighborsClassifier() >>> knn.fit(iris_X_train, iris_y_train) KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski', metric_params=None, n_jobs=1, n_neighbors=5, p=2, weights='uniform') >>> knn.predict(iris_X_test) array([1, 2, 1, 0, 0, 0, 2, 1, 2, 0]) >>> iris_y_test array([1, 1, 1, 0, 0, 0, 2, 1, 2, 0])
9.在python当中处理csv文件,可以使用标准库当中的csv模块。其中的writer和reader方法可以对csv文件进行读写。
import csv rf = open('bank.csv','rb') reader = csv.reader(rf)
此处要注意,打开一个csv文件,必须用二进制的形式打开。
此时的reader为一个迭代器,它只能使用next()和for循环。
reader.next() 返回即为第一行的内容。
要看得到所有内容,就可以使用for循环了。
for row in reader: print row
接下来,来看写入csv文件。
wf = open('bank2.csv','wb') writer = csv.writer(wf) writer.writerow(['id','age','sex','region','income','married','children','car','save_act','current_act','mortgage','pep']) writer.writerow(reader.next())
10.线性回归LR
线性回归的最简单形式是通过调节一个参数集合为数据集拟合一个线性模型,使得其残差平方和尽可能小。线性模型:
: 数据
: 目标变量
: 系数
: 观测噪声
>>> from sklearn import linear_model >>> regr = linear_model.LinearRegression() >>> regr.fit(diabetes_X_train, diabetes_y_train) LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False) >>> print(regr.coef_) [ 0.30349955 -237.63931533 510.53060544 327.73698041 -814.13170937 492.81458798 102.84845219 184.60648906 743.51961675 76.09517222] >>> # The mean square error >>> np.mean((regr.predict(diabetes_X_test)-diabetes_y_test)**2) 2004.56760268... >>> # Explained variance score: 1 is perfect prediction >>> # and 0 means that there is no linear relationship >>> # between X and Y. >>> regr.score(diabetes_X_test, diabetes_y_test) 0.5850753022690...
11.把数据集拆分成 folds 用于训练和测试
>>> import numpy as np >>> X_folds = np.array_split(X_digits, 3) >>> y_folds = np.array_split(y_digits, 3)
12.交叉验证生成器
交叉验证生成器 (cross-validation generators)去产生一个 索引列表来达到数据分割的目的:>>> from sklearn import cross_validation >>> k_fold = cross_validation.KFold(n=6, n_folds=3) >>> for train_indices, test_indices in k_fold: ... print('Train: %s | test: %s' % (train_indices, test_indices)) Train: [2 3 4 5] | test: [0 1] Train: [0 1 4 5] | test: [2 3] Train: [0 1 2 3] | test: [4 5]
基于交叉验证生成器,交叉验证的实现将会变得非常简单轻松:
>>> kfold = cross_validation.KFold(len(X_digits), n_folds=3) >>> [svc.fit(X_digits[train], y_digits[train]).score(X_digits[test], y_digits[test]) ... for train, test in kfold] [0.93489148580968284, 0.95659432387312182, 0.93989983305509184]
为了计算一个estimator的
score方法的值, sklearn 暴露了一个辅助性的方法:
>>> cross_validation.cross_val_score(svc, X_digits, y_digits, cv=kfold, n_jobs=-1) array([ 0.93489149, 0.95659432, 0.93989983])
13.Numpy库
(1)会求均值、方差、协方差
平均值——mean()求均值print('残差平均值: %.2f'
% np.mean((model.predict(X) - y) **
2))
方差——var()求方差
print(np.var([6,
8,
10,
14,
18],
ddof=1))
#numpy.var()可以直接计算方差
协方差——cov()求协方差
print(np.cov([6,
8, 10, 14, 18], [7,
9, 13, 17.5, 18])[0][1])
#numpy.cov()计算协方差
(2)矩阵计算-求逆inv,点乘dot,转置transpose
<code class="hljs lua has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">from numpy.linalg import inv from numpy import dot, transpose X = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">[[1, 6, 2], [1, 8, 1], [1, 10, 0], [1, 14, 2], [1, 18, 0]]</span> y = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">[[7], [9], [13], [17.5], [18]]</span> <span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">print</span>(dot(inv(dot(transpose(X), X)), dot(transpose(X), y)))</code>
(3)使用lstsq()求最小二乘法
<code class="hljs perl has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">from numpy.linalg import lstsqprint(lsts<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">q(X, y)</span>[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>])</code>
14.Sklearn库
(1)会基于linear_model.LinearRegression建立一元/多元线性回归模型;会基于LinearRegression和preprocessing.PolynomialFeatures建立一元多次线性回归模型;会基于linear_model.SGDRegressor建立随机梯度下降SGD模型;(2)使用model.fit()建模,使用model.predict()预测,使用model.score()求测试集的R-Square;
(3)基于cross_validation,会用train_test_split()函数划分训练集和测试集,会用cross_val_score()计算交叉检验的R-Squre结果;
调入线性回归函数LinearRegression;
<code class="hljs python has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">from</span> sklearn.linear_model <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> LinearRegression</code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li></ul><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li></ul>
fit()建立一元线性回归模型
<code class="hljs lua has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">X = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">[[8], [9], [11], [16], [12]]</span> y = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">[[11], [8.5], [15], [18], [11]]</span> model = LinearRegression() model.fit(X, y)#建立一元线性回归模型</code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li></ul><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li></ul>
fit()建立多元线性回归模型
<code class="hljs lua has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">X = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">[[6, 2], [8, 1], [10, 0], [14, 2], [18, 0]]</span> y = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">[[7], [9], [13], [17.5], [18]]</span> model = LinearRegression() model.fit(X, y) #建立二元线性回归模型</code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li></ul><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li></ul>
predict()通过fit()算出的模型参数构成的模型,对解释变量进行预测获得的值;
<code class="hljs lua has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">print</span>(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'预测一张12英寸匹萨价格:$%.2f'</span> % model.predict([<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">12</span>])[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>]) #单值预测 X_test = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">[[8, 2], [9, 0], [11, 2], [16, 2], [12, 0]]</span> predictions = model.predict(X_test)#一组数进行预测</code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li></ul><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li></ul>
mode.score计算R方R-Square
<code class="hljs avrasm has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">model<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.score</span>(X_test, y_test) <span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">#LinearRegression的score方法可以计算R方</span></code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li></ul><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li></ul>
建立一元多项式回归模型
<code class="hljs lua has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">from sklearn.linear_model import LinearRegression from sklearn.preprocessing import PolynomialFeatures X_train = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">[[6], [8], [10], [14], [18]]</span> y_train = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">[[7], [9], [13], [17.5], [18]]</span> #需要输入列向量,而不是行向量 X_test = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">[[6], [8], [11], [16]]</span>#测试数据准备 y_test = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">[[8], [12], [15], [18]]</span> quadratic_featurizer = PolynomialFeatures(degree=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">2</span>) #定义多项式的最高阶数 X_train_quadratic = quadratic_featurizer.fit_transform(X_train) #为fit()建模准备输入 regressor_quadratic = LinearRegression() regressor_quadratic.fit(X_train_quadratic, y_train)# fit()函数建模 X_test_quadratic = quadratic_featurizer.transform(X_test) #为预测准备输入量 y_test_quadratic=regressor_quadratic.predict(xx_quadratic) #使用模型预测数据 <span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">print</span>(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'一元线性回归 r-squared'</span>, regressor.score(X_test, y_test)) #计算R-Square</code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li><li style="box-sizing: border-box; padding: 0px 5px;">6</li><li style="box-sizing: border-box; padding: 0px 5px;">7</li><li style="box-sizing: border-box; padding: 0px 5px;">8</li><li style="box-sizing: border-box; padding: 0px 5px;">9</li><li style="box-sizing: border-box; padding: 0px 5px;">10</li><li style="box-sizing: border-box; padding: 0px 5px;">11</li><li style="box-sizing: border-box; padding: 0px 5px;">12</li><li style="box-sizing: border-box; padding: 0px 5px;">13</li><li style="box-sizing: border-box; padding: 0px 5px;">14</li><li style="box-sizing: border-box; padding: 0px 5px;">15</li><li style="box-sizing: border-box; padding: 0px 5px;">16</li><li style="box-sizing: border-box; padding: 0px 5px;">17</li></ul><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li><li style="box-sizing: border-box; padding: 0px 5px;">6</li><li style="box-sizing: border-box; padding: 0px 5px;">7</li><li style="box-sizing: border-box; padding: 0px 5px;">8</li><li style="box-sizing: border-box; padding: 0px 5px;">9</li><li style="box-sizing: border-box; padding: 0px 5px;">10</li><li style="box-sizing: border-box; padding: 0px 5px;">11</li><li style="box-sizing: border-box; padding: 0px 5px;">12</li><li style="box-sizing: border-box; padding: 0px 5px;">13</li><li style="box-sizing: border-box; padding: 0px 5px;">14</li><li style="box-sizing: border-box; padding: 0px 5px;">15</li><li style="box-sizing: border-box; padding: 0px 5px;">16</li><li style="box-sizing: border-box; padding: 0px 5px;">17</li></ul>
train_test_split()按数据集分成训练集和测试集;分区比例可以设置,默认25%分给测试集;
<code class="hljs python has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">from</span> sklearn.cross_validation <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> train_test_split df = pd.read_csv(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'mlslpic/winequality-red.csv'</span>, sep=<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">';'</span>) X = df[list(df.columns)[:-<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>]] y = df[<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'quality'</span>] X_train, X_test, y_train, y_test = train_test_split(X, y)</code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li></ul><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li></ul>
train_size改变训练集和测试集的比例
<code class="language-python hljs has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">X_train,X_test,y_train,y_test=train_test_split(X,y,<strong>train_size=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0.5</span></strong>)</code>
cross_val_score()可以返回交叉检验的score结果;
<code class="hljs python has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">from</span> sklearn.cross_validation <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> cross_val_score regressor = LinearRegression() scores = cross_val_score(regressor, X, y, cv=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">5</span>) print(scores.mean(), scores</code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li></ul><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li></ul>
加载scikit-learn数据集
<code class="hljs python has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">from</span> sklearn.datasets <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> load_boston</code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li></ul><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li></ul>
加载SGDRegressor,归一化StandardScaler,建立模型以及求R-Square
<code class="hljs python has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">from</span> sklearn.linear_model <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> SGDRegressor <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">from</span> sklearn.preprocessing <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> StandardScaler X_scaler = StandardScaler() y_scaler = StandardScaler() X_train = X_scaler.fit_transform(X_train) y_train = y_scaler.fit_transform(y_train) X_test = X_scaler.transform(X_test) y_test = y_scaler.transform(y_test) regressor = SGDRegressor(loss=<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'squared_loss'</span>) regressor.fit_transform(X_train, y_train) <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">#建立模型</span> print(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'测试集R方值:'</span>, regressor.score(X_test, y_test))</code>
15.list索引号
<code class="hljs python has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-prompt" style="color: rgb(0, 102, 102); box-sizing: border-box;">>>> </span>a = [<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>,<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">2</span>,<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">3</span>,<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">4</span>,<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">5</span>,<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">6</span>] <span class="hljs-prompt" style="color: rgb(0, 102, 102); box-sizing: border-box;">>>> </span>a[:-<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>] <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">#输出[1, 2, 3, 4, 5]</span> <span class="hljs-prompt" style="color: rgb(0, 102, 102); box-sizing: border-box;">>>> </span>a[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>:<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">2</span>] <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">#输出[2]</span> <span class="hljs-prompt" style="color: rgb(0, 102, 102); box-sizing: border-box;">>>> </span>a[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>:] <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">#输出[2, 3, 4, 5, 6]</span> </code>
16.使用enumerate()
循环,得到数组的序号idx
(放在前面的)和数值val
(放在后面的);
<code class="hljs fsharp has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">for</span> idx, <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">val</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span> enumerate(ints): print(idx, <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">val</span>)</code>
17.linspace()
将区间进行划分
<code class="hljs avrasm has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">xx = np<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.linspace</span>(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>, <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">26</span>, <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">100</span>) xx <span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">#输出: array([ 0. , 6.5, 13. , 19.5, 26. ])</span></code>
将区间划分
arange和
linsapce,
前者是从起点按照给定步长进行划分,只有当终点也在步长整数倍时才会被包含在内;
后者是将起点和终点中间等距划分,终点位最后一位数;
<code class="language-python hljs has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">X=np.arange(-<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">6</span>,<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">6</span>,<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">5</span>);X <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># 输出array([-6, -1, 4])</span> X=np.arange(-<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">6</span>,<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">6</span>,<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>);X <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># 输出array([-6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5])</span> X=np.linspace(-<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">6</span>,<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">6</span>,<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">5</span>);X <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># 输出 array([-6., -3., 0., 3., 6.])</span></code>
18.使用LogisticRegression
分类器进行训练和分类
<code class="language-python hljs has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">from</span> sklearn.linear_model.logistic <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> LogisticRegression classifier = LogisticRegression() classifier.fit(X_train, y_train) predictions = classifier.predict(X_test)</code>
19.基本用法-Math
基本数学运算(加、减、乘、除、取余、绝对值、幂),不需要调用
math库
数学运算(三角函数、反三角函数、指数、对数),需要调用
math库
<code class="language-python hljs has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> math <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">print</span> math.sin(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">10</span>) <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># sine</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">print</span> math.cos(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">10</span>) <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># cosine</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">print</span> math.tan(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">10</span>) <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># tangent </span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">print</span> math.asin(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">10</span>) <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># arc sine</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">print</span> math.acos(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">10</span>) <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># arc cosine</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">print</span> math.atan(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">10</span>) <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># arc tangent</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">print</span> math.sinh(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">10</span>) <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># hyperbolic sine </span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">print</span> math.cosh(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">10</span>) <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># hyperbolic cosine</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">print</span> math.tanh(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">10</span>) <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># hyperbolic tangent</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">print</span> math.pow(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">2</span>, <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">4</span>) <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># 2 raised to 4</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">print</span> math.exp(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">4</span>) <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># e ^ 4</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">print</span> math.sqrt(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">10</span>) <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># square root</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">print</span> math.pow(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">5</span>, <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>/<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">3.0</span>) <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># cubic root of 5</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">print</span> math.log(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">3</span>) <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># ln; natural logarithm</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">print</span> math.log(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">100</span>, <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">10</span>) <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># base 10</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">print</span> math.ceil(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">2.3</span>) <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># ceiling</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">print</span> math.floor(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">2.7</span>) <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># floor</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">print</span> math.pi <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">print</span> math.e</code>
20.numpy.linalg.eig()
计算特征值和特征向量
<code class="language-python hljs has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> numpy <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">as</span> np w,v=np.linalg.eig(np.array([[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>,-<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">2</span>],[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">2</span>,-<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">6</span>]]))</code>
sklearn用户手册http://download.csdn.net/detail/ssrob/8757217
相关文章推荐
- python数据分析与挖掘学习笔记(1)-基础及准备
- 数据挖掘学习-准备篇-机器学习基础
- 数据挖掘学习-准备篇-python编辑
- python数据持久存储:pickle模块的基本使用 分类: python python基础学习 python 小练习 2013-06-17 14:41 209人阅读 评论(0) 收藏
- 去除字符串重复数据 分类: python基础学习 2013-08-08 17:43 171人阅读 评论(0) 收藏
- 关于python,数据挖掘,自然语言处理的一些学习资源
- python 数据挖掘基础 入门
- 小甲鱼:Python学习笔记001_变量_分支_数据类型_运算符等基础
- Python学习入门基础教程(learning Python)--6 Python下的list数据类型
- python基础教程学习笔记 — 准备Linux下开发环境
- 王亟亟的Python学习之路(三)-基础语法以及基本数据类型
- python学习之基础数据类型
- Python学习之--数据基础(二)
- Python学习入门基础教程(learning Python)--1.3 Python数据输入 .
- 王亟亟的Python学习之路(三)-基础语法以及基本数据类型
- python基础学习-1(数据类型)
- 数据挖掘学习--数据挖掘基础概念
- 函数名function是一个数据类型,可以赋值 分类: python基础学习 2013-09-12 11:01 366人阅读 评论(0) 收藏
- Python学习之基础数据部分
- 零基础入门学习Python(3):数据类型