
Unsupervised Nearest Neighbors Clustering With Application to Hyperspectral Images



Abstract

The paper proposes KSEM, a stochastic extension of the kNN density-based clustering method KNNCLUST, which randomly assigns objects to clusters by sampling from a local posterior class-label distribution.

Notations

$X$: Dataset, i.e. $X=\{x_i\}$, $x_i\in\mathbb{R}^d$, $i=1,\dots,n$.

$C_i$: Discrete random variable corresponding to the class label held by object $x_i$.

$c_i$: Outcome label sampled from some distribution on $C_i$.

$\mathbf{c}$: $\mathbf{c}=[c_1,\dots,c_n]^T$, the vector of cluster labels.

$p(C_i\mid x_i;\{x_j,c_j\}_{j\neq i})$: Local posterior distribution of $C_i$.

$\kappa(i)$: Set of indices of the kNNs of $x_i$.

$\Omega(i)$: $\{c_j\mid j\in\kappa(i)\}$.

Algorithm

The local posterior label distribution in KSEM is first modelled as:

$$\hat{p}(C_i=c_L\mid x_i;\{x_j,c_j\}_{j\in\kappa(i)})\propto\sum_{j\in\kappa(i)}g(x_j,x_i)\,\delta_{c_j c_L}\tag{1}$$

$\forall c_L\in\Omega(i)$, $1\le i\le n$, where $g$ is a non-negative kernel function defined on $\mathbb{R}^d$ and $\delta_{ij}$ is the Kronecker delta. Although many kernel functions could be used, the authors restrict themselves to the following Gaussian kernel:

$$g(x,x_i)=\frac{1}{\left(\sqrt{2\pi}\,d_{k,\kappa(i)}(x_i)\right)^{d}}\exp\!\left(-\frac{1}{2}\,\frac{\|x-x_i\|_2^2}{d_{k,\kappa(i)}^2(x_i)}\right),\tag{2}$$

where $x\in\mathbb{R}^d$ and $d_{k,S}(x_i)$ denotes the distance from $x_i$ to its $k$th NN within the set $S$. The authors then propose the following estimate of the posterior label distribution:

$$\hat{p}_{\alpha}(C_i=c_L\mid x_i;\{x_j,c_j\}_{j\in\kappa(i)})=\frac{\left[\sum_{j\in\kappa(i)}g(x_j,x_i)\,\delta_{c_j c_L}\right]^{\alpha}}{\sum_{c_m\in\Omega(i)}\left[\sum_{j\in\kappa(i)}g(x_j,x_i)\,\delta_{c_j c_m}\right]^{\alpha}}\tag{3}$$

$\forall c_L\in\Omega(i)$, $1\le i\le n$, where $\alpha\in[1,+\infty]$ is a parameter controlling the degree of determinism in the construction of the pseudo-sample: $\alpha=1$ corresponds to the SEM (stochastic) scheme, while $\alpha\to+\infty$ corresponds to the CEM (deterministic) scheme, leading to a labeling rule similar to KNNCLUST's. The authors recommend setting $\alpha=1.2$.
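
To make the sampling step concrete, here is a minimal NumPy sketch of Eqs. (1)-(3). This is our illustration, not the authors' code: `knn_idx` is assumed to be a precomputed $n\times k$ array of kNN indices, and the bandwidth is taken as the distance to the $k$th NN, as in Eq. (2).

```python
import numpy as np

def gaussian_kernel(x, xi, d_k):
    """Gaussian kernel of Eq. (2); d_k is the distance from x_i to its
    k-th nearest neighbor (adaptive bandwidth)."""
    d = x.shape[0]
    return np.exp(-0.5 * np.sum((x - xi) ** 2) / d_k ** 2) \
        / (np.sqrt(2 * np.pi) * d_k) ** d

def sample_label(i, X, labels, knn_idx, alpha=1.2, rng=None):
    """One KSEM sampling step for object x_i, following Eqs. (1)-(3):
    kernel-weighted label votes over kappa(i), sharpened by alpha,
    then a random draw from the resulting local posterior."""
    rng = rng or np.random.default_rng()
    xi = X[i]
    neigh = knn_idx[i]                                  # kappa(i)
    d_k = np.linalg.norm(X[neigh] - xi, axis=1).max()   # k-th NN distance
    w = np.array([gaussian_kernel(X[j], xi, d_k) for j in neigh])
    omega = np.unique(labels[neigh])                    # Omega(i)
    votes = np.array([w[labels[neigh] == cL].sum() for cL in omega])
    p = votes ** alpha                                  # Eq. (3), unnormalized
    return rng.choice(omega, p=p / p.sum())             # stochastic assignment
```

In the spirit of KNNCLUST, one would start from a fine initial labeling and sweep this step over all objects at each iteration, stopping via the entropy criterion described below.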

Letting $S_{c_L}=\{x_i\in X\mid c_i=c_L\}$, the Kozachenko-Leonenko conditional differential entropy estimate reads:

$$\hat{h}(X\mid c_L)=\frac{d}{n_L}\sum_{x_i\in S_{c_L}}\ln d_{k,S_{c_L}}(x_i)+\ln(n_L-1)-\psi(k)+\ln V_d\tag{4}$$

$\forall c_L\in\Omega$, where $n_L=|S_{c_L}|$, $\psi(k)=\Gamma'(k)/\Gamma(k)$ is the digamma function, $\Gamma(k)$ is the gamma function, and $V_d=\pi^{d/2}/\Gamma(d/2+1)$ is the volume of the unit ball in $\mathbb{R}^d$. An overall clustering entropy measure can be obtained from the conditional entropies (4) as:

$$\hat{h}(X\mid\mathbf{c})=\frac{1}{n}\sum_{c_L\in\Omega}n_L\,\hat{h}(X\mid c_L)\tag{5}$$

This measure serves quite naturally as a stopping criterion during the iterations. Since objects are aggregated into previously formed clusters as the iterations proceed, the individual class-conditional entropies can only increase, and so does the conditional entropy (5). When convergence is reached, the measure approaches an upper limit, so a stopping criterion can be built from its relative variation $\Delta h=\left|\hat{h}(X\mid\mathbf{c}^{(t)})-\hat{h}(X\mid\mathbf{c}^{(t-1)})\right|/\hat{h}(X\mid\mathbf{c}^{(t-1)})$, where $\mathbf{c}^{(t)}$ is the vector of cluster labels at iteration $t$. The stopping criterion $\Delta h<10^{-4}$ is recommended.
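
A brute-force sketch of the entropy measure of Eqs. (4)-(5), assuming SciPy for the digamma and log-gamma functions (the helper names are ours, not the paper's):

```python
import numpy as np
from scipy.special import digamma, gammaln

def kl_entropy(S, k):
    """Kozachenko-Leonenko estimate of Eq. (4) for one cluster S (n_L x d).
    Pairwise distances are computed naively; fine for a sketch only."""
    n_L, d = S.shape
    if n_L <= k:
        return 0.0                      # simplification: no k-th NN available
    D = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    dk = np.sort(D, axis=1)[:, k - 1]   # d_{k,S_cL}(x_i) for each x_i
    ln_Vd = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)  # ln(unit-ball volume)
    return (d / n_L) * np.log(dk).sum() + np.log(n_L - 1) - digamma(k) + ln_Vd

def clustering_entropy(X, labels, k):
    """Overall measure of Eq. (5): size-weighted mean of cluster entropies."""
    return sum(np.sum(labels == cL) * kl_entropy(X[labels == cL], k)
               for cL in np.unique(labels)) / len(X)
```

One would evaluate `clustering_entropy` after each KSEM sweep and stop once $\Delta h<10^{-4}$, as recommended above.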



Application

Despite the reduction in complexity brought by the kNN search, image segmentation by unsupervised clustering of pixels with KSEM remains computationally demanding, which can severely limit its use on large images. In multivariate imagery (multispectral/hyperspectral), the objects of interest are primarily grouped according to their spectral characteristics. To assist the clustering of image pixels, one often exploits spatial information, namely the fact that two neighboring pixels are likely to belong to the same cluster. The authors therefore limit the search for a pixel's kNNs to a subset of its spatial neighbors, selected via a predefined sampling pattern whose local sampling density is inversely proportional to the distance from the central (query) pixel (illustrated by a figure in the paper, not reproduced here; a sketch of one such pattern follows below).
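
The exact pattern is defined in the paper; the sketch below builds one plausible variant, with the ring radii and point counts chosen as assumptions. Fixing the number of points per ring while growing the radii makes the sampling density fall off roughly like the inverse distance.

```python
import numpy as np

def sampling_pattern(radii=(1, 2, 4, 8, 16), points_per_ring=8):
    """Offsets around a central pixel on concentric rings. A fixed point
    count per ring with growing radii gives a local sampling density that
    decays roughly like 1/r, in the spirit of the pattern described above."""
    offsets = set()
    for r in radii:
        for a in np.linspace(0.0, 2 * np.pi, points_per_ring, endpoint=False):
            dy, dx = int(round(r * np.sin(a))), int(round(r * np.cos(a)))
            if (dy, dx) != (0, 0):
                offsets.add((dy, dx))
    return sorted(offsets)

def candidate_neighbors(y, x, H, W, offsets):
    """Spatial candidates for pixel (y, x) in an H x W image; the pixel's
    kNNs are then searched among these positions using spectral distance."""
    return [(y + dy, x + dx) for dy, dx in offsets
            if 0 <= y + dy < H and 0 <= x + dx < W]
```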
