scikit-learn (models used relatively often in engineering): 1.4. Support Vector Machines
2015-08-04 07:33
Reference: http://scikit-learn.org/stable/modules/svm.html
In real projects we actually use the simple classics such as LR, kNN and NB fairly rarely; they are classic, but not that practical in engineering.
Today we focus on SVM, which gets used relatively often in engineering.
SVMs are versatile: Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression and outliers detection.
Advantages: effective in high-dimensional spaces; still effective when the number of dimensions is greater than the number of samples; memory-efficient, because only a subset of the training points (called support vectors) is used in the decision function; versatile, with different kernel functions to choose from.
Disadvantages: when the number of features is much greater than the number of samples, performance tends to degrade badly; SVMs do not directly provide probability estimates, which instead require an expensive five-fold cross-validation (see Scores and probabilities, below).
(SVMs support both dense and sparse sample vectors, but if you will predict on sparse data, you must train on sparse data as well. For optimal performance, use a C-ordered numpy.ndarray (dense) or scipy.sparse.csr_matrix (sparse) with dtype=float64.)
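As a minimal sketch of the dense-vs-sparse point above (the tiny data set here is made up purely for illustration), training and predicting on CSR matrices looks like this:
>>> import numpy as np
>>> from scipy.sparse import csr_matrix
>>> from sklearn import svm
>>> X_dense = np.array([[0., 0.], [1., 1.], [2., 2.], [3., 3.]], dtype=np.float64)
>>> y = [0, 0, 1, 1]
>>> X_sparse = csr_matrix(X_dense)                # train on sparse input ...
>>> sparse_clf = svm.SVC(kernel='linear')
>>> sparse_clf.fit(X_sparse, y)
SVC(...)
>>> sparse_clf.predict(csr_matrix([[1.5, 1.5]]))  # ... and predict on sparse input too
array(...)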
1. Classification
SVC, NuSVC and LinearSVC are three models capable of multi-class classification. The essential difference between them is that they have different mathematical formulations; see the formulas at the end of this article.
Like any other classifier, SVC, NuSVC and LinearSVC are used through the fit and predict methods:
>>> from sklearn import svm
>>> X = [[0, 0], [1, 1]]
>>> y = [0, 1]
>>> clf = svm.SVC()
>>> clf.fit(X, y)
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
    kernel='rbf', max_iter=-1, probability=False, random_state=None,
    shrinking=True, tol=0.001, verbose=False)
After being fitted, the model can then be used to predict new values:
>>> clf.predict([[2., 2.]])
array([1])
The properties of the support vectors can be retrieved through the attributes support_vectors_, support_ and n_support_:
>>> # get support vectors
>>> clf.support_vectors_
array([[ 0.,  0.],
       [ 1.,  1.]])
>>> # get indices of support vectors
>>> clf.support_
array([0, 1]...)
>>> # get number of support vectors for each class
>>> clf.n_support_
array([1, 1]...)
For multi-class classification:
SVC and NuSVC use the "one-against-one" scheme (training n_class * (n_class - 1) / 2 models), while LinearSVC uses the "one-vs-the-rest" strategy (training n_class models). In practice one-vs-rest is the usual and preferred choice, because the results are generally about the same while the training time is much lower.
>>> X = [[0], [1], [2], [3]]
>>> Y = [0, 1, 2, 3]
>>> clf = svm.SVC()
>>> clf.fit(X, Y)
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3,
gamma=0.0, kernel='rbf', max_iter=-1, probability=False, random_state=None,
shrinking=True, tol=0.001, verbose=False)
>>> dec = clf.decision_function([[1]])
>>> dec.shape[1] # 4 classes: 4*3/2 = 6
6
>>> lin_clf = svm.LinearSVC()
>>> lin_clf.fit(X, Y)
LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
intercept_scaling=1, loss='squared_hinge', max_iter=1000,
multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
verbose=0)
>>> dec = lin_clf.decision_function([[1]])
>>> dec.shape[1]
4
On the confidence of a sample's predicted class: the SVC method decision_function gives per-class scores for each sample. There is also the probability option, but: if confidence scores are required, and these do not have to be probabilities, then it is advisable to set probability=False and use decision_function instead of predict_proba (mainly because the theory behind the probability estimates has known flaws).
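A minimal, hedged sketch of the two options (the toy data and values below are invented for illustration; note that probability=True has to be set when the model is constructed):
>>> X_p = [[0, 0], [0, 1], [1, 0], [0.5, 0.5], [2, 2], [2, 3], [3, 2], [2.5, 2.5]]
>>> y_p = [0, 0, 0, 0, 1, 1, 1, 1]
>>> scorer = svm.SVC().fit(X_p, y_p)
>>> scorer.decision_function([[1.5, 1.5]])               # confidence score(s) per class/pair
array(...)
>>> prob_clf = svm.SVC(probability=True).fit(X_p, y_p)   # turns on the costly Platt scaling
>>> prob_clf.predict_proba([[1.5, 1.5]])                 # probabilities, but slower and less consistent
array(...)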
When different classes or individual samples should carry different weights, use the keywords class_weight and sample_weight:
Class weights: SVC (but not NuSVC) implements a keyword class_weight in the fit method. It's a dictionary of the form {class_label : value}, where value is a floating point number > 0 that sets the parameter C of class class_label to C * value.
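A hedged sketch of that dictionary form (the label and the factor 10 are arbitrary illustration values):
>>> # penalize mistakes on class 1 ten times harder, i.e. its effective C becomes C * 10
>>> wclf = svm.SVC(class_weight={1: 10})
>>> wclf.fit([[0, 0], [1, 1], [1, 0], [0, 1]], [0, 0, 1, 1])
SVC(...)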
Sample weights: SVC, NuSVC, SVR, NuSVR and OneClassSVM also implement weights for individual samples in the fit method through the keyword sample_weight. Similar to class_weight, these set the parameter C for the i-th example to C * sample_weight[i].
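And a matching sketch for per-sample weights (again, the weights are invented for illustration):
>>> # the last sample only contributes a tenth of the usual penalty: C * 0.1
>>> swclf = svm.SVC()
>>> swclf.fit([[0, 0], [1, 1], [1, 0], [0, 1]], [0, 0, 1, 1],
...           sample_weight=[1.0, 1.0, 1.0, 0.1])
SVC(...)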
Finally, a few examples:
- Plot different SVM classifiers in the iris dataset
- SVM: Maximum margin separating hyperplane
- SVM: Separating hyperplane for unbalanced classes
- SVM-Anova: SVM with univariate feature selection
- Non-linear SVM
- SVM: Weighted samples
2. Regression
Support Vector Regression.
See whether this sentence makes sense: Analogously (to SVC), the model produced by Support Vector Regression depends only on a subset of the training data, because the cost function for building the model ignores any training data close to the model prediction.
Again there are three models: SVR, NuSVR and LinearSVR.
>>> from sklearn import svm
>>> X = [[0, 0], [2, 2]]
>>> y = [0.5, 2.5]
>>> clf = svm.SVR()
>>> clf.fit(X, y)
SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma=0.0,
    kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False)
>>> clf.predict([[1, 1]])
array([ 1.5])
An example:
- Support Vector Regression (SVR) using linear and non-linear kernels
3. Density estimation, novelty detection
First, how Wikipedia defines novelty detection: novelty detection is the identification of new or unknown data that a machine learning system has not been trained with and was not previously aware of, with the help of either statistical or machine learning based approaches.
OneClassSVM is used for novelty detection, that is, given a set of samples, it will detect the soft boundary of that set so as to classify new points as belonging to that set or not. The procedure is unsupervised, so only X is passed as input.
For detailed usage, see the section Novelty and Outlier Detection.
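A minimal, hedged sketch of the unsupervised fit/predict cycle (the Gaussian training blob and the nu/gamma values are illustrative assumptions, not recommendations):
>>> import numpy as np
>>> from sklearn import svm
>>> rng = np.random.RandomState(0)
>>> X_train = 0.3 * rng.randn(100, 2)             # "normal" observations around the origin
>>> oc = svm.OneClassSVM(nu=0.1, kernel='rbf', gamma=0.1)
>>> oc.fit(X_train)                               # unsupervised: no y is passed
OneClassSVM(...)
>>> oc.predict([[0., 0.], [4., 4.]])              # +1 = inside the learned boundary, -1 = novelty
array(...)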
Finally, two examples:
- One-class SVM with non-linear kernel (RBF)
- Species distribution modeling
4. Complexity
The QP (quadratic programming) solver used by this libsvm-based implementation scales between $O(n_{features} \times n_{samples}^2)$ and $O(n_{features} \times n_{samples}^3)$, depending on how efficiently the libsvm cache is used in practice (dataset dependent).
5. Practical tips
- Avoid unnecessary data copies: pass C-ordered float64 arrays so the underlying libsvm/liblinear code does not have to copy your data.
- Kernel cache size: for larger problems, increase cache_size from its default.
- Setting C: C defaults to 1; if the data contains many noisy observations, decrease it.
- Scaling: it is highly recommended to scale your data. For example, scale each attribute of the input vector X to [0, 1] or [-1, +1], or standardize it to have mean 0 and variance 1. Note that the same scaling must be applied to the test vector to obtain meaningful results (see the sketch right after this list).
- Unbalanced data: in SVC, if the samples are unbalanced, set class_weight='auto' and/or try different penalty parameters C.
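A hedged sketch of the scaling and unbalanced-data tips above (the scaler/pipeline combination and the toy data are just one reasonable way to set this up):
>>> from sklearn import svm
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.pipeline import make_pipeline
>>> X_u = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5]]   # made-up, imbalanced toy data
>>> y_u = [0, 0, 0, 0, 1]
>>> # scale to mean 0 / variance 1, then fit an SVC that re-weights the rare class
>>> model = make_pipeline(StandardScaler(), svm.SVC(C=1.0, class_weight='auto'))
>>> model.fit(X_u, y_u)                              # the fitted scaler is reused at predict time
Pipeline(...)
>>> model.predict([[4, 4]])
array(...)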
6. Kernel functions
Kernels are selected like svm.SVC(kernel='linear'); the common kernels are:
- linear: $\langle x, x' \rangle$.
- polynomial: $(\gamma \langle x, x' \rangle + r)^d$, where $d$ is specified by keyword degree and $r$ by coef0.
- rbf: $\exp(-\gamma \|x - x'\|^2)$, where $\gamma$ is specified by keyword gamma and must be greater than 0.
- sigmoid: $\tanh(\gamma \langle x, x' \rangle + r)$, where $r$ is specified by coef0.
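A hedged sketch of how the parameters above map onto the SVC constructor (the particular values are arbitrary):
>>> from sklearn import svm
>>> linear_svc = svm.SVC(kernel='linear')
>>> poly_svc = svm.SVC(kernel='poly', degree=3, coef0=1.0)       # d = degree, r = coef0
>>> rbf_svc = svm.SVC(kernel='rbf', gamma=0.5)                   # gamma must be > 0
>>> sigmoid_svc = svm.SVC(kernel='sigmoid', gamma=0.5, coef0=0.0)
>>> linear_svc.kernel
'linear'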
You can also define your own kernel, for example:
>>> import numpy as np
>>> from sklearn import svm
>>> def my_kernel(x, y):
...     return np.dot(x, y.T)
...
>>> clf = svm.SVC(kernel=my_kernel)
- SVM with custom kernel
7. Mathematical formulation
1. SVC (the standard C-SVC primal problem):

$$\min_{w, b, \zeta} \ \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \zeta_i$$
$$\text{subject to } y_i (w^T \phi(x_i) + b) \ge 1 - \zeta_i, \quad \zeta_i \ge 0, \quad i = 1, \dots, n$$

2. SVR (the $\varepsilon$-SVR primal problem):

$$\min_{w, b, \zeta, \zeta^*} \ \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} (\zeta_i + \zeta_i^*)$$
$$\text{subject to } y_i - w^T \phi(x_i) - b \le \varepsilon + \zeta_i, \quad w^T \phi(x_i) + b - y_i \le \varepsilon + \zeta_i^*, \quad \zeta_i, \zeta_i^* \ge 0$$