
Using the sklearn.pipeline.Pipeline class

2016-04-17 19:11
This post summarizes sklearn.pipeline.Pipeline.

1. The sklearn.pipeline.Pipeline class

First, the link to the official documentation: http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html

class sklearn.pipeline.Pipeline(steps)

The official site describes it as follows:

Pipeline of transforms with a final estimator.

In other words: a chain of transforms that ends in a final estimator.

Sequentially apply a list of transforms and a final estimator. Intermediate steps of the pipeline must be ‘transforms’, that is, they must implement fit and transform methods. The final estimator only needs to implement fit.
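To make the fit/transform chaining concrete, here is a minimal pure-Python sketch of what a pipeline does conceptually. It is not scikit-learn's implementation: ToyPipeline, Center and SignClassifier are hypothetical names made up for this illustration, and the data is a toy list of numbers.

```python
class Center(object):
    """Hypothetical transform: implements both fit and transform."""

    def fit(self, X, y=None):
        # Learn the mean of the training data.
        self.mean_ = sum(X) / float(len(X))
        return self

    def transform(self, X):
        # Subtract the mean learned during fit.
        return [x - self.mean_ for x in X]


class SignClassifier(object):
    """Hypothetical final estimator: the pipeline only requires fit."""

    def fit(self, X, y):
        return self  # nothing to learn in this toy model

    def predict(self, X):
        return [1 if x > 0 else 0 for x in X]


class ToyPipeline(object):
    """Illustrative stand-in for Pipeline, not the real class."""

    def __init__(self, steps):
        self.steps = steps  # list of (name, object) tuples

    def fit(self, X, y):
        # Fit each intermediate transform and pass its output on.
        for _, transform in self.steps[:-1]:
            X = transform.fit(X, y).transform(X)
        # Fit the final estimator on the transformed data.
        self.steps[-1][1].fit(X, y)
        return self

    def predict(self, X):
        # Re-apply the already fitted transforms, then predict.
        for _, transform in self.steps[:-1]:
            X = transform.transform(X)
        return self.steps[-1][1].predict(X)


toy = ToyPipeline([('center', Center()), ('sign', SignClassifier())])
toy_predictions = toy.fit([1, 2, 3, 4], [0, 0, 1, 1]).predict([0, 5])
print(toy_predictions)
```

The point of the sketch is that the mean learned by Center during fit is reused unchanged at predict time, which is the bookkeeping a pipeline takes care of.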

The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters. For this, it enables setting parameters of the various steps using their names and the parameter name separated by a ‘__’, as in the example below.

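Explanation: the purpose of a pipeline is to bundle several steps so that they can be cross-validated together while their parameters are varied. To that end, a parameter of any step is addressed as the step's name and the parameter name joined by a double underscore '__', for example anova__k.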
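As a hedged sketch of what "cross-validated together while setting different parameters" can look like in practice, the snippet below runs a grid search over a whole pipeline. It assumes a scikit-learn release that provides sklearn.model_selection (0.18 or later). The grid values, cv=3 and the use of f_classif as the score function are my own choices for illustration; the step names 'anova' and 'svc' follow the example in this post.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = make_classification(n_informative=5, n_redundant=0, random_state=42)

pipe = Pipeline([
    ('anova', SelectKBest(f_classif)),
    ('svc', SVC(kernel='linear')),
])

# Keys follow the pattern <step name>__<parameter name>.
param_grid = {
    'anova__k': [5, 10],
    'svc__C': [0.1, 1.0],
}

# Every candidate refits the whole pipeline inside each fold, so the
# feature selection never sees the held-out part of the data.
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)

print(search.best_params_)
```

Because the selector and the classifier are tuned as one object, the search reports a single best combination of anova__k and svc__C.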

Parameters:

steps : list
List of (name, transform) tuples (implementing fit/transform) that are chained, in the order in which they are chained, with the last object an estimator.

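Note: steps is a list of (name, transform) tuples. The last tuple holds the estimator (the type of model we train); the tuples before it hold the transforms that are applied to the data, in order, before it reaches the estimator.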
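A small sketch of how the steps list can be inspected once the pipeline is built. The step names 'scale' and 'clf', and the choice of StandardScaler and LogisticRegression, are arbitrary and only serve the illustration.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([
    ('scale', StandardScaler()),    # intermediate step: a transform
    ('clf', LogisticRegression()),  # last step: the estimator
])

# The steps are kept in the order in which they were given.
step_names = [name for name, _ in pipe.steps]
print(step_names)

# Each step can be looked up by the name given in its tuple.
final_step = pipe.named_steps['clf']
print(type(final_step).__name__)

# Step parameters appear under <step name>__<parameter name>.
has_clf_c = 'clf__C' in pipe.get_params()
print(has_clf_c)
```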

Here is an example from the official site:

#!/usr/bin/env python
# -*- coding:utf-8 -*-
from __future__ import print_function  # lets the script run on Python 2 and 3

from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_regression
from sklearn.pipeline import Pipeline

# generate some data to play with
X, y = make_classification(n_informative=5, n_redundant=0, random_state=42)
print('X', X)
print('Y', y)

# ANOVA SVM-C
anova_filter = SelectKBest(f_regression, k=5)
print('anova_filter:', anova_filter)
clf = svm.SVC(kernel='linear')  # the model to train

anova_svm = Pipeline([('anova', anova_filter), ('svc', clf)])

# You can set the parameters using the names issued
# For instance, fit using a k of 10 in the SelectKBest
# and a parameter 'C' of the svm
# ('__' joins a step name to one of that step's parameters)
anova_svm.set_params(anova__k=10, svc__C=.1).fit(X, y)
print('anova_svm.named_steps:', anova_svm.named_steps)  # behaves like a dict keyed by step name
print('type(anova_svm)=', type(anova_svm))

prediction = anova_svm.predict(X)
score = anova_svm.score(X, y)
print('prediction=', prediction, type(prediction))
print('score=', score)

The output is as follows:

X [[-2.70323229  0.67787532 -0.65407568 ...,  0.18958162  0.50109417
   2.41185611]
 [-0.30777823  0.21915033  0.24938368 ...,  0.64548418  0.74625357
   1.33408391]
 [-0.25737654 -1.66858407  0.39922312 ...,  0.61351797  0.12003133
  -0.22989455]
 ..., 
 [-0.01530985  0.5792915   0.11958037 ..., -1.47891157  0.39180401
   0.21434039]
 [-1.33123295 -1.83620537  0.50799133 ...,  0.95670232  0.70810868
  -2.14387014]
 [-1.31183623 -1.06511366 -0.3052247  ...,  0.55781031  1.39020755
  -1.58909265]]
Y [1 0 1 1 1 0 0 0 1 1 0 1 0 1 1 1 0 1 1 0 0 1 0 1 0 1 0 1 0 1 1 0 0 0 1 1 1
 0 1 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 1 1 0 1 0 0 1 0 1 1 0 1 0 0 1 0 0 1
 0 0 1 0 0 0 1 1 1 1 1 0 1 1 0 1 1 1 1 0 0 0 1 0 1 1]
anova_filter: SelectKBest(k=5, score_func=<function f_regression at 0xaa05e9c>)
anova_svm.named_steps: {'svc': SVC(C=0.1, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
  kernel='linear', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False), 'anova': SelectKBest(k=10, score_func=<function f_regression at 0xaa05e9c>)}
type(anova_svm)= <class 'sklearn.pipeline.Pipeline'>
prediction= [0 0 1 0 0 0 0 0 1 0 1 1 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 0 1 0 1 1 1
 0 1 0 0 1 0 1 0 1 0 0 1 0 1 0 0 1 0 0 0 1 0 1 0 1 1 0 1 1 1 0 0 1 1 0 0 0
 1 0 1 1 0 0 1 1 1 1 0 0 1 0 0 1 1 1 1 1 0 0 1 0 1 1] <type 'numpy.ndarray'>
score= 0.77

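The example above uses the following methods:

set_params(**params)  Sets parameters of the pipeline's steps, addressed as <step name>__<parameter name>.

predict(*args, **kwargs)  Applies transforms to the data, and the predict method of the final estimator. Returns the predicted values.

score(*args, **kwargs)  Applies transforms to the data, and the score method of the final estimator. Returns a score for the final result.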
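To illustrate what "applies transforms to the data, and the predict method of the final estimator" means, the sketch below compares the pipeline's own predict and score with the same steps carried out by hand. It assumes a reasonably recent scikit-learn; f_classif and k=5 are my choices for the illustration, and the comparison of score with mean accuracy relies on SVC being a classifier.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = make_classification(n_informative=5, n_redundant=0, random_state=42)

pipe = Pipeline([
    ('anova', SelectKBest(f_classif, k=5)),
    ('svc', SVC(kernel='linear')),
])
pipe.fit(X, y)

# Manual equivalent of pipe.predict(X): transform first, then predict.
X_selected = pipe.named_steps['anova'].transform(X)
manual_prediction = pipe.named_steps['svc'].predict(X_selected)
same_prediction = np.array_equal(pipe.predict(X), manual_prediction)

# Manual equivalent of pipe.score(X, y) for a classifier: mean accuracy.
manual_score = np.mean(manual_prediction == y)
same_score = np.isclose(pipe.score(X, y), manual_score)

print(same_prediction, same_score)
```

Note that both here and in the example above the score is computed on the data the model was trained on, so it says little about performance on unseen data.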