
Scikit Source Code Reading (2015.05.31)

2015-05-31 19:31

Today’s Job

Today’s job was mainly reading the source of plot_color_quantization.py and k_means_.py (the latter under scikit-learn-0.15.2\sklearn\cluster) in scikit-learn-0.15.2.

Gains

pairwise_distances_argmin:

Compute minimum distances between one point and a set of points: for each row of the first array, return the index of the closest row of the second array.
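A small sketch of how pairwise_distances_argmin can be called, with a manual NumPy check next to it. The sample points and centers are made-up values for illustration, and the default Euclidean metric is assumed.

```python
import numpy as np
from sklearn.metrics import pairwise_distances_argmin

# Three query points and two candidate centers (illustrative values only).
X = np.array([[0.0, 0.0], [9.0, 9.0], [1.0, 2.0]])
centers = np.array([[0.0, 1.0], [10.0, 10.0]])

# For each row of X, the index of the nearest row of centers.
nearest = pairwise_distances_argmin(X, centers)

# The same result computed by hand: full distance matrix, then argmin per row.
dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
manual = dists.argmin(axis=1)

print(nearest, manual)
```

In the color quantization example this is the step that maps every pixel to its nearest palette color.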

shuffle:

Shuffle arrays or sparse matrices in a consistent way, i.e. the same permutation is applied to every array passed in.
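To see what "consistent" means here, the sketch below shuffles a data array together with its labels and checks that each row still carries its own label. The arrays are toy data, and random_state=0 is just an arbitrary seed.

```python
import numpy as np
from sklearn.utils import shuffle

X = np.arange(10).reshape(5, 2)   # rows: [0, 1], [2, 3], [4, 5], ...
y = np.arange(5)                  # label i belongs to row i

# Both arrays are permuted with the same random permutation.
X_s, y_s = shuffle(X, y, random_state=0)

# Row i of the original X starts with 2 * i, so the pairing can be verified.
still_paired = bool((X_s[:, 0] == 2 * y_s).all())
print(still_paired)
```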

Lloyd’s algorithm and Voronoi diagram

check_random_state(seed):

Turn seed into a np.random.RandomState instance
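The three kinds of input that check_random_state accepts can be tried out as below. This is a sketch based on the documented behaviour (None, an int, or an existing RandomState); the seed values are arbitrary.

```python
import numpy as np
from sklearn.utils import check_random_state

# An int seed gives a new RandomState seeded with that value.
rng_from_int = check_random_state(0)

# None gives the global RandomState used by np.random.
rng_global = check_random_state(None)

# An existing RandomState is returned unchanged.
existing = np.random.RandomState(42)
rng_same = check_random_state(existing)

print(type(rng_from_int).__name__, rng_same is existing)
```

This is why functions such as k_means can take a random_state argument of any of these types and still work with one uniform object internally.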

inertia:

Sum of squared distances of samples to their closest cluster center.
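A minimal NumPy sketch of the inertia, assuming the usual k-means definition in which the distances are squared Euclidean distances. The helper name inertia and the sample values are illustrative and not taken from the scikit-learn source.

```python
import numpy as np

def inertia(X, centers):
    """Sum over samples of the squared distance to the closest center."""
    # Squared Euclidean distance from every sample to every center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Keep only the closest center for each sample, then add everything up.
    return d2.min(axis=1).sum()

X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0]])
centers = np.array([[0.0, 1.0], [10.0, 0.0]])

value = inertia(X, centers)
print(value)  # 1 + 1 + 0 = 2.0
```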

Assigning the labels is also called the E-step of EM.

Computing the means is also called the M-step of EM.
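The two steps can be put together into a bare-bones Lloyd iteration. This is a simplified sketch, not the scikit-learn implementation: the function name lloyd, the stopping test with np.allclose and the handling of empty clusters (keep the old center) are all assumptions made for illustration.

```python
import numpy as np

def lloyd(X, init_centers, max_iter=100):
    """Alternate label assignment (E-step) and mean update (M-step)."""
    centers = np.asarray(init_centers, dtype=float).copy()
    labels = None
    for _ in range(max_iter):
        # E-step: assign every sample to its closest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)

        # M-step: move every center to the mean of its assigned samples.
        new_centers = centers.copy()
        for k in range(len(centers)):
            members = X[labels == k]
            if len(members) > 0:          # empty cluster: keep the old center
                new_centers[k] = members.mean(axis=0)

        # Stop once the centers no longer move.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centers, labels = lloyd(X, init_centers=[[0.0, 0.0], [10.0, 10.0]])
print(centers, labels)
```

After the E-step each center owns exactly the samples in its Voronoi cell, which is where the Voronoi diagram comes in.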

_tolerance(X, tol):

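Return a tolerance which is independent of the dataset: tol is scaled by the mean per-feature variance of X, so the convergence threshold follows the scale of the data.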
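One way to make a tolerance independent of the dataset is to scale it by the mean per-feature variance. As far as I can tell this is what the dense branch of _tolerance does in this version, but treat the sketch below as an assumption rather than a copy of the source; the name scaled_tolerance is made up.

```python
import numpy as np

def scaled_tolerance(X, tol):
    """Scale a relative tolerance by the mean per-feature variance of X."""
    variances = np.var(X, axis=0)      # variance of each feature (column)
    return np.mean(variances) * tol

rng = np.random.RandomState(0)
X = rng.rand(50, 3)

t_small = scaled_tolerance(X, 1e-4)
t_large = scaled_tolerance(10 * X, 1e-4)   # same data, 10 times the scale

# Variance grows with the square of the scale, so the ratio is about 100.
print(t_large / t_small)
```

Since the center shift is measured as a squared distance, scaling by the variance means the same tol gives the same stopping behaviour whatever units the data is expressed in.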

Questions to be solved

def _k_init(X, n_clusters, x_squared_norms, random_state, n_local_trials=None):
    """Init n_clusters seeds according to k-means++

    Selects initial cluster centers for k-mean clustering in a smart way
    to speed up convergence. see: Arthur, D. and Vassilvitskii, S.
    "k-means++: the advantages of careful seeding". ACM-SIAM symposium
    on Discrete algorithms. 2007

    Version ported from http://www.stanford.edu/~darthur/kMeansppTest.zip,
    which is the implementation used in the aforementioned paper.
    """
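As a starting point for this question, here is a simplified sketch of the k-means++ seeding idea from the paper: each new center is drawn with probability proportional to the squared distance to the nearest center already chosen. It is not the _k_init code. It leaves out the n_local_trials refinement and the precomputed x_squared_norms, and the name kmeans_pp_init and the seed parameter are made up for illustration.

```python
import numpy as np

def kmeans_pp_init(X, n_clusters, seed=0):
    """Pick n_clusters rows of X as initial centers, k-means++ style."""
    rng = np.random.RandomState(seed)
    n_samples = X.shape[0]

    # The first center is chosen uniformly at random.
    centers = [X[rng.randint(n_samples)]]

    # Squared distance from every sample to its nearest chosen center.
    closest_d2 = ((X - centers[0]) ** 2).sum(axis=1)

    for _ in range(1, n_clusters):
        # Far-away samples are more likely to become the next center.
        # (Assumes the samples are not all identical, so the sum is non-zero.)
        probs = closest_d2 / closest_d2.sum()
        idx = rng.choice(n_samples, p=probs)
        centers.append(X[idx])

        # Update the nearest-center distances with the new center.
        d2_new = ((X - X[idx]) ** 2).sum(axis=1)
        closest_d2 = np.minimum(closest_d2, d2_new)

    return np.array(centers)

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0],
              [5.1, 5.0], [9.0, 0.0], [9.1, 0.0]])
centers = kmeans_pp_init(X, n_clusters=3)
print(centers)
```

A sample that is already a center has distance zero and therefore probability zero, so with distinct rows in X the same point is never picked twice.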

Grid Search