Using the sklearn.pipeline.Pipeline class
2016-04-17 19:11
This post summarizes sklearn.pipeline.Pipeline.
1. The sklearn.pipeline.Pipeline class
Official documentation: http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html
class sklearn.pipeline.Pipeline(steps)
The official site describes it as follows:
Pipeline of transforms with a final estimator.
In other words, a chain of transforms that ends in a final estimator.
Sequentially apply a list of transforms and a final estimator. Intermediate steps of the pipeline must be ‘transforms’, that is, they must implement fit and transform methods. The final estimator only needs to implement fit.
The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters. For this, it enables setting parameters of the various steps using their names and the parameter name separated by a ‘__’, as in the example below.
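Explanation: the purpose of a pipeline is to bundle several steps so that they can be cross-validated together while different parameter settings are tried. To make this possible, a parameter of any step is addressed by joining the step's name and the parameter's name with a double underscore `__` (two underscores, not one), for example anova__k for the parameter k of the step named anova.
Parameters:
steps : list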
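As a minimal sketch of the double-underscore naming, the snippet below tunes two steps of one pipeline in a single cross-validated grid search. The step names 'anova' and 'svc', the candidate values, the 3-fold setting, and the use of GridSearchCV and f_classif are assumptions chosen for illustration, not something prescribed by the documentation quoted above.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Synthetic data (default: 100 samples, 20 features).
X, y = make_classification(n_informative=5, n_redundant=0, random_state=42)

# Step names ("anova", "svc") are arbitrary labels chosen for this sketch.
pipe = Pipeline([
    ("anova", SelectKBest(f_classif)),
    ("svc", SVC(kernel="linear")),
])

# Every key has the form <step name>__<parameter name>.
param_grid = {
    "anova__k": [5, 10],
    "svc__C": [0.1, 1.0],
}

# The whole pipeline is refit on each training fold, so the feature
# selection never sees the held-out fold.
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)

best = search.best_params_
print(best)
```

The same names are what pipe.get_params() returns, so that call is a quick way to list every parameter that can be set this way.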
List of (name, transform) tuples (implementing fit/transform) that are chained, in the order in which they are chained, with the last object an estimator.
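Note: steps is a list of (name, transform) tuples. The last tuple holds the estimator (the type of model being trained). The tuples before it are not cross-validation steps; they hold the transforms that are applied to the data, in order, before it reaches the estimator.
Below is an example from the official site: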
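To illustrate what an intermediate step has to provide, here is a small sketch with a hand-written transformer. ColumnScaler, the toy data, the step names 'scale' and 'clf', and the choice of LogisticRegression are all hypothetical and exist only for this example.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class ColumnScaler(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: divides each column by its max absolute value."""

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        scale = np.abs(X).max(axis=0)
        scale[scale == 0] = 1.0  # avoid division by zero for all-zero columns
        self.scale_ = scale
        return self

    def transform(self, X):
        return np.asarray(X, dtype=float) / self.scale_


# Toy data made up for this sketch.
X = np.array([[1.0, 200.0], [2.0, 100.0], [3.0, 400.0], [4.0, 300.0]])
y = np.array([0, 0, 1, 1])

# steps is a list of (name, object) tuples; the last object is the estimator.
pipe = Pipeline([("scale", ColumnScaler()), ("clf", LogisticRegression())])
pipe.fit(X, y)

step_names = [name for name, _ in pipe.steps]
print(step_names)
print(pipe.named_steps["scale"].scale_)
```

Only fit and transform are written by hand here; TransformerMixin supplies fit_transform and BaseEstimator supplies get_params and set_params.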
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Written for Python 3 (print is a function).
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_regression
from sklearn.pipeline import Pipeline
# generate some data to play with
X, y = make_classification(n_informative=5, n_redundant=0, random_state=42)
print("X", X)
print("Y", y)
# ANOVA SVM-C
anova_filter = SelectKBest(f_regression, k=5)
print("anova_filter:", anova_filter)
clf = svm.SVC(kernel='linear')  # choose the model
anova_svm = Pipeline([('anova', anova_filter), ('svc', clf)])
# You can set the parameters using the names issued
# For instance, fit using a k of 10 in the SelectKBest
# and a parameter 'C' of the svm
# '<step name>__<parameter>' (double underscore) addresses a step's parameter
anova_svm.set_params(anova__k=10, svc__C=.1).fit(X, y)
print("anova_svm.named_steps:", anova_svm.named_steps)  # dict-like: step name -> estimator
print("type(anova_svm)=", type(anova_svm))
prediction = anova_svm.predict(X)
score = anova_svm.score(X, y)
print("prediction=", prediction, type(prediction))
print("score=", score)
The output is as follows (recorded under Python 2 with an older scikit-learn release, so the object reprs differ in newer versions):
X [[-2.70323229 0.67787532 -0.65407568 ..., 0.18958162 0.50109417
2.41185611]
[-0.30777823 0.21915033 0.24938368 ..., 0.64548418 0.74625357
1.33408391]
[-0.25737654 -1.66858407 0.39922312 ..., 0.61351797 0.12003133
-0.22989455]
...,
[-0.01530985 0.5792915 0.11958037 ..., -1.47891157 0.39180401
0.21434039]
[-1.33123295 -1.83620537 0.50799133 ..., 0.95670232 0.70810868
-2.14387014]
[-1.31183623 -1.06511366 -0.3052247 ..., 0.55781031 1.39020755
-1.58909265]]
Y [1 0 1 1 1 0 0 0 1 1 0 1 0 1 1 1 0 1 1 0 0 1 0 1 0 1 0 1 0 1 1 0 0 0 1 1 1
0 1 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 1 1 0 1 0 0 1 0 1 1 0 1 0 0 1 0 0 1
0 0 1 0 0 0 1 1 1 1 1 0 1 1 0 1 1 1 1 0 0 0 1 0 1 1]
anova_filter: SelectKBest(k=5, score_func=<function f_regression at 0xaa05e9c>)
anova_svm.named_steps: {'svc': SVC(C=0.1, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
kernel='linear', max_iter=-1, probability=False, random_state=None,
shrinking=True, tol=0.001, verbose=False), 'anova': SelectKBest(k=10, score_func=<function f_regression at 0xaa05e9c>)}
type(anova_svm)= <class 'sklearn.pipeline.Pipeline'>
prediction= [0 0 1 0 0 0 0 0 1 0 1 1 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 0 1 0 1 1 1
0 1 0 0 1 0 1 0 1 0 0 1 0 1 0 0 1 0 0 0 1 0 1 0 1 1 0 1 1 1 0 0 1 1 0 0 0
1 0 1 1 0 0 1 1 1 1 0 0 1 0 0 1 1 1 1 1 0 0 1 0 1 1] <type 'numpy.ndarray'>
score= 0.77
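The example above uses several methods:
set_params(**params) Sets parameters of the steps, each addressed as <step name>__<parameter name>.
predict(*args, **kwargs) Applies transforms to the data, and the predict method of the final estimator. Returns the predicted values.
score(*args, **kwargs) Applies transforms to the data, and the score method of the final estimator. Returns the final estimator's score on the given data (mean accuracy for a classifier such as SVC).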
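The following sketch checks the descriptions of predict and score by doing the same work by hand: transform the data with the fitted first step, then call the final estimator directly. It assumes the default pipeline configuration (no memory caching), so named_steps holds the fitted objects. f_classif is swapped in for the score function as an assumption, since the targets here are class labels.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = make_classification(n_informative=5, n_redundant=0, random_state=42)

pipe = Pipeline([
    ("anova", SelectKBest(f_classif, k=10)),
    ("svc", SVC(kernel="linear", C=0.1)),
])
pipe.fit(X, y)

# Manual equivalent of pipe.predict / pipe.score:
selector = pipe.named_steps["anova"]
classifier = pipe.named_steps["svc"]
X_selected = selector.transform(X)           # apply the transform step
manual_prediction = classifier.predict(X_selected)
manual_score = classifier.score(X_selected, y)

same_prediction = np.array_equal(pipe.predict(X), manual_prediction)
same_score = abs(pipe.score(X, y) - manual_score) < 1e-12
print(same_prediction, same_score)
```

Note that scoring on the training data, as done here and in the example above, gives an optimistic figure; a held-out set or cross-validation is the usual way to estimate real performance.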