
Machine Learning Basics: Wikipedia Translation on Hyperparameter Selection and k-Nearest Neighbors, with Simple sklearn Examples

2017-05-18 21:03
In the context of machine learning, hyperparameter optimization or model selection is the problem of choosing a set of hyperparameters for a learning algorithm, usually with the goal of optimizing a measure of the algorithm's performance on an independent data set.

That is, hyperparameter selection aims for the best performance of the algorithm on an independent data set.

Often cross-validation is used to estimate the generalization performance.

Cross-validation is generally used for this.

Hyperparameter optimization contrasts with actual learning problems, which are also often cast as optimization problems but optimize a loss function on the training set alone. In effect, learning algorithms learn parameters that model/reconstruct their inputs well, while hyperparameter optimization ensures, through tuning, that the model does not overfit its data.

The learning algorithm's goal is to fit the data well; hyperparameter optimization guards against overfitting.

e.g. regularization:

Regularization here refers to penalty terms such as those introduced by AIC and BIC.

Regularization prevents overfitting by placing constraints on the parameters so that they cannot fit the training data "too well", removing excessive fitting to noise.
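For reference, these criteria have the standard forms AIC = 2k − 2·ln(L̂) and BIC = k·ln(n) − 2·ln(L̂), where k is the number of free parameters, n the sample size, and L̂ the maximized likelihood; the terms in k are the complexity penalties referred to above.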

Algorithms for hyperparameter optimization

Grid search


The traditional way of performing hyperparameter optimization has been grid search, or a parameter sweep, which is simply an exhaustive search through a manually specified subset of the hyperparameter space of a learning algorithm. A grid search algorithm must be guided by some performance metric, typically measured by cross-validation on the training set or on a held-out validation set.

Since the parameter space of a machine learner may include real-valued or unbounded value spaces for certain parameters, manually setting bounds and discretization may be necessary before applying grid search.

For example, a typical soft-margin SVM classifier (the soft margin relaxes the hard constraints in the underlying optimization) equipped with an RBF (Gaussian) kernel has at least two hyperparameters that need to be tuned for good performance on unseen data: a regularization constant C and a kernel hyperparameter γ. Both parameters are continuous, so to perform grid search one selects a finite set of "reasonable" values for each, …

Grid search then trains an SVM with each pair (C, γ) in the Cartesian product of these two sets and evaluates their performance on a held-out validation set (or by internal cross-validation on the training set, in which case multiple SVMs are trained per pair). Finally, the grid search algorithm outputs the settings that achieved the highest score in the validation procedure.

Grid search suffers from the curse of dimensionality (the number of settings to evaluate grows rapidly with the number of hyperparameters, while the data available for fitting each setting stays fixed), but it is often embarrassingly parallel, because the hyperparameter settings it evaluates are typically independent of each other.
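As a concrete illustration of the (C, γ) grid described above, here is a minimal sketch using scikit-learn's GridSearchCV on the iris data; the grid values are illustrative assumptions, not tuned recommendations.

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
# The continuous hyperparameters are discretized into a small, logarithmically spaced grid.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}
# Every (C, gamma) pair in the Cartesian product is scored by 5-fold cross-validation.
grid = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)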

k-Nearest Neighbors

In pattern recognition, the k-Nearest Neighbors algorithm (or k-NN) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression:

In k-NN classification, the output is a class membership. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small).

That is, the object is assigned to whichever class receives the most votes among its k nearest neighbors.

If k = 1, then the object is simply assigned to the class of that single nearest neighbor.

In k-NN regression, the output is the property value for the object. This value is the average of the values of its k nearest neighbors.

The average of the values of its k nearest neighbors is used as the estimate.
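A minimal sketch of this averaging with scikit-learn's KNeighborsRegressor (the toy data is made up purely for illustration):

from sklearn.neighbors import KNeighborsRegressor

X_train = [[0], [1], [2], [3]]
y_train = [0.0, 1.0, 2.0, 3.0]
# With the default uniform weights, the prediction is the plain average of the k nearest targets.
reg = KNeighborsRegressor(n_neighbors=2)
reg.fit(X_train, y_train)
print(reg.predict([[1.4]]))  # nearest neighbors are x=1 and x=2, so the estimate is (1.0 + 2.0) / 2 = 1.5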

k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. The k-NN algorithm is among the simplest of all machine learning algorithms.

Both for classification and regression, it can be useful to assign weights to the contributions of the neighbors, so that the nearer neighbors contribute more to the average than the more distant ones. For example, a common weighting scheme consists in giving each neighbor a weight of 1/d, where d is the distance to the neighbor.

This is the inverse-distance weighting scheme.
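A small sketch of the 1/d idea using scikit-learn's weights="distance" option, which implements exactly this inverse-distance weighting; the data is contrived so that weighting changes the outcome:

from sklearn.neighbors import KNeighborsClassifier

X_train = [[0], [1], [2], [3]]
y_train = [0, 1, 0, 0]
query = [[0.9]]
# Unweighted majority vote among the 3 nearest neighbors picks class 0 (two of the three votes).
print(KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train).predict(query))
# With weights="distance" each neighbor counts as 1/d, so the very close class-1 point at x=1 wins.
print(KNeighborsClassifier(n_neighbors=3, weights="distance").fit(X_train, y_train).predict(query))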

The neighbors are taken from a set of objects for which the class (for k-NN classification) or the object property value (for k-NN regression) is known. This can be thought of as the training set for the algorithm, though no explicit training step is required.

Since the class (for classification) or value (for regression) of the k samples nearest to the query point is known, these samples can be regarded as the training set.

A shortcoming of the k-NN algorithm is that it is sensitive to the local structure of the data.

Drawback: it is sensitive to the structure of the surrounding data.

Algorithm

The training examples are vectors in a multidimensional feature space, each with a class label. The training phase of the algorithm consists only of storing the feature vectors and class labels of the training samples.

The training phase amounts to nothing more than storing the sample feature vectors and their class labels.

In the classification phase, k is a user-defined constant, and an unlabeled vector (a query or test point) is classified by assigning the label which is most frequent among the k training samples nearest to the query point.
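To make the "store the data, then vote at query time" idea concrete, here is a minimal brute-force sketch in NumPy (illustrative only, not the scikit-learn implementation):

import numpy as np
from collections import Counter

# "Training" is nothing more than storing the labelled feature vectors.
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [3.0, 3.2], [2.9, 3.1]])
y_train = np.array([0, 0, 1, 1])

def knn_predict(x, k=3):
    # Distance from the query point to every stored training sample.
    dists = np.linalg.norm(X_train - x, axis=1)
    # Take the labels of the k nearest samples and return the majority vote.
    nearest_labels = y_train[np.argsort(dists)[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

print(knn_predict(np.array([1.1, 0.9])))  # two of the three nearest neighbors are class 0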

By default, feature selection here uses the F statistic, which comes from the test of the overall significance of a regression equation.

GridSearchCV is in effect a meta-estimator: it wraps another estimator and behaves like a classifier itself.

Pipeline chains processing steps together, so that the parameters of several model stages can be estimated at the same time.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC
# GridSearchCV now lives in sklearn.model_selection (the old sklearn.grid_search module was removed).
from sklearn.model_selection import GridSearchCV

iris = load_iris()
X, y = iris.data, iris.target

# Combine two feature extractors: 2 PCA components plus the single best univariate feature.
pca = PCA(n_components=2)
selection = SelectKBest(k=1)
combine_features = FeatureUnion([("pca", pca), ("univ_select", selection)])
X_feature = combine_features.fit(X, y).transform(X)
# print(X_feature)

svm = SVC(kernel="linear")

# Pipeline: feature union followed by a linear SVM; grid search tunes all steps together.
pipeline = Pipeline([("features", combine_features), ("svm", svm)])
param_grid = dict(features__pca__n_components=[1, 2, 3],
                  features__univ_select__k=[1, 2],
                  svm__C=[0.1, 1, 10])
grid_search = GridSearchCV(pipeline, param_grid=param_grid, verbose=10)
grid_search.fit(X, y)
print(grid_search.best_estimator_)




For more, see: http://blog.csdn.net/sinat_30665603
Tags: Machine Learning