
An Introduction to the MATLAB Dimensionality Reduction Toolbox (drtoolbox)

2014-01-22 16:12
1. About drtoolbox

The Matlab Toolbox for Dimensionality Reduction contains Matlab implementations of 32 techniques for dimensionality reduction and metric learning.

Official website:

http://homepage.tudelft.nl/19j49/Matlab_Toolbox_for_Dimensionality_Reduction.html

This Matlab toolbox implements 32 techniques for dimensionality reduction. These techniques are all available through the COMPUTE_MAPPING function or through the GUI. The following techniques are available:

- Principal Component Analysis ('PCA')

- Linear Discriminant Analysis ('LDA')

- Multidimensional scaling ('MDS')

- Probabilistic PCA ('ProbPCA')

- Factor analysis ('FactorAnalysis')

- Sammon mapping ('Sammon')

- Isomap ('Isomap')

- Landmark Isomap ('LandmarkIsomap')

- Locally Linear Embedding ('LLE')

- Laplacian Eigenmaps ('Laplacian')

- Hessian LLE ('HessianLLE')

- Local Tangent Space Alignment ('LTSA')

- Diffusion maps ('DiffusionMaps')

- Kernel PCA ('KernelPCA')

- Generalized Discriminant Analysis ('KernelLDA')

- Stochastic Neighbor Embedding ('SNE')

- Symmetric Stochastic Neighbor Embedding ('SymSNE')

- t-Distributed Stochastic Neighbor Embedding ('tSNE')

- Neighborhood Preserving Embedding ('NPE')

- Locality Preserving Projection ('LPP')

- Stochastic Proximity Embedding ('SPE')

- Linear Local Tangent Space Alignment ('LLTSA')

- Conformal Eigenmaps ('CCA', implemented as an extension of LLE)

- Maximum Variance Unfolding ('MVU', implemented as an extension of LLE)

- Landmark Maximum Variance Unfolding ('LandmarkMVU')

- Fast Maximum Variance Unfolding ('FastMVU')

- Locally Linear Coordination ('LLC')

- Manifold charting ('ManifoldChart')

- Coordinated Factor Analysis ('CFA')

- Gaussian Process Latent Variable Model ('GPLVM')

- Autoencoders using stack-of-RBMs pretraining ('AutoEncoderRBM')

- Autoencoders using evolutionary optimization ('AutoEncoderEA')

Furthermore, the toolbox contains 6 techniques for intrinsic dimensionality estimation. These techniques are available through the function INTRINSIC_DIM. The following techniques are available:

- Eigenvalue-based estimation ('EigValue')

- Maximum Likelihood Estimator ('MLE')

- Estimator based on correlation dimension ('CorrDim')

- Estimator based on nearest neighbor evaluation ('NearNb')

- Estimator based on packing numbers ('PackingNumbers')

- Estimator based on geodesic minimum spanning tree ('GMST')
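All of these estimators share the single INTRINSIC_DIM interface, so they are easy to compare side by side. A minimal sketch (the 'helix' toy set and the choice of three estimators are illustrative, not prescribed by the toolbox):

```matlab
% Compare several intrinsic-dimensionality estimators on a toy dataset.
[X, ~] = generate_data('helix', 2000);   % a 1-D curve embedded in 3-D

for est = {'MLE', 'CorrDim', 'GMST'}
    d = intrinsic_dim(X, est{1});        % estimate with the chosen method
    fprintf('%-12s -> %.2f\n', est{1}, d);
end
```

The estimates typically disagree slightly; rounding the result (as in the usage example below) gives an integer target dimensionality for COMPUTE_MAPPING.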

In addition to these techniques, the toolbox contains functions for prewhitening of data (the function PREWHITEN), exact and estimated out-of-sample extension (the functions OUT_OF_SAMPLE and OUT_OF_SAMPLE_EST), and a function that generates toy datasets (the function GENERATE_DATA).
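The out-of-sample extension lets you embed points that were not part of the original mapping. A minimal sketch (the train/test split sizes and the choice of PCA are illustrative assumptions):

```matlab
% Learn a mapping on training data, then project unseen points with it.
[X, labels] = generate_data('swiss', 2000);  % toy dataset from the toolbox
Xtrain = X(1:1500, :);                       % fit the mapping on these
Xtest  = X(1501:end, :);                     % held-out points

% Learn a 2-D PCA mapping on the training part only.
[mappedTrain, mapping] = compute_mapping(Xtrain, 'PCA', 2);

% Project the held-out points into the same embedding (exact extension).
mappedTest = out_of_sample(Xtest, mapping);
```

Note that the exact extension is only available for techniques whose mapping structure supports it; for the others, OUT_OF_SAMPLE_EST provides an estimated extension.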

The graphical user interface of the toolbox is accessible through the DRGUI function.
2. Installation
Extract the downloaded drtoolbox package into a directory of your choice, e.g. D:\MATLAB\R2012b\toolbox.

Open the file D:\MATLAB\R2012b\toolbox\local\pathdef.m, add the toolbox path to it, and save.



Run the rehash toolboxcache command to finish loading the toolbox:

>>rehash toolboxcache

To test the installation:

>> what drtoolbox
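Alternatively, the path can be set from the MATLAB prompt without editing pathdef.m by hand, using the standard addpath/savepath functions (a sketch; adjust the directory to your own install location):

```matlab
% Add the toolbox folder and all of its subfolders to the search path,
% then persist the change for future MATLAB sessions.
addpath(genpath('D:\MATLAB\R2012b\toolbox\drtoolbox'));
savepath;
rehash toolboxcache
```

If `what drtoolbox` lists the toolbox's M-files afterwards, the installation succeeded.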



3. Toolbox overview

The basic principle of dimensionality reduction is to map sample points from the input space into a low-dimensional space via a linear or nonlinear transformation, yielding a compact low-dimensional representation of the original dataset.

The algorithms can be classified along three axes:

Linear vs. nonlinear

Linear dimensionality reduction means that the low-dimensional data obtained preserves the linear relationships among the high-dimensional data points. Linear methods include PCA, LDA, and LPP (LPP is in fact a linear version of Laplacian Eigenmaps). Nonlinear methods fall into two groups. One group is kernel-based, e.g. Kernel PCA, and is not discussed further here. The other is what is usually called manifold learning: recovering a low-dimensional manifold structure from high-dimensional samples (under the assumption that the data are sampled uniformly from a low-dimensional manifold embedded in a high-dimensional Euclidean space), i.e., finding the low-dimensional manifold in the high-dimensional space together with the corresponding embedding map. Nonlinear manifold-learning methods include Isomap, LLE, Laplacian Eigenmaps, LTSA, and MVU.

Overall, linear methods are fast to compute and have low complexity, but they give poor results on complex data.

Supervised vs. unsupervised

The main difference between supervised and unsupervised learning is whether the samples carry class labels. Unsupervised dimensionality-reduction methods aim to minimize the information lost during the reduction, e.g. PCA, LPP, Isomap, LLE, Laplacian Eigenmaps, LTSA, and MVU. Supervised methods aim to maximize the discriminative information between classes, e.g. LDA. In fact, for every unsupervised algorithm there is corresponding research on supervised or semi-supervised variants.

Global vs. local

Local methods consider only local information about the sample set, i.e., the relationships between each data point and its neighbors. LLE is the representative local method; others include Laplacian Eigenmaps, LPP, and LTSA.

Global methods consider not only local geometric information but also global information about the sample set, i.e., the relationships between data points and non-neighboring points. Global algorithms include PCA, LDA, Isomap, and MVU.

Because local methods ignore the relationships between samples that lie far apart on the data manifold, they cannot guarantee that samples far apart on the manifold also end up far apart in the low-dimensional representation.
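The linear/nonlinear distinction above can be seen directly on a manifold-shaped toy set. A sketch (the 'swiss' dataset and the neighborhood size k = 12 are illustrative choices, not toolbox defaults):

```matlab
% Swiss roll: a 2-D manifold rolled up in 3-D.
[X, labels] = generate_data('swiss', 2000);

% Linear method: PCA projects onto the directions of maximal variance
% and cannot "unroll" the manifold.
mappedPCA = compute_mapping(X, 'PCA', 2);

% Nonlinear local method: LLE preserves each point's local linear
% neighborhood (k = 12 neighbors) and can recover the flat 2-D structure.
[mappedLLE, mapping] = compute_mapping(X, 'LLE', 2, 12);

figure; scatter(mappedPCA(:,1), mappedPCA(:,2), 5, labels);
title('PCA (linear)')
figure; scatter(mappedLLE(:,1), mappedLLE(:,2), 5, labels(mapping.conn_comp));
title('LLE (nonlinear, local)')
```

As in the usage example below, `labels(mapping.conn_comp)` is needed for LLE because the method keeps only the largest connected component of the neighborhood graph.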
4. Using the toolbox

The interface functions exposed to the user sit in the same directory as the toolbox's Readme file. The main ones, as described above, are:

- compute_mapping: dimensionality reduction with any of the techniques listed in Section 1

- intrinsic_dim: intrinsic dimensionality estimation

- prewhiten: prewhitening of data

- out_of_sample / out_of_sample_est: exact and estimated out-of-sample extension

- generate_data: generation of toy datasets

- drgui: graphical user interface
Usage example:

clc
clear
close all

% Generate test data
[X, labels] = generate_data('helix', 2000);
figure
scatter3(X(:,1), X(:,2), X(:,3), 5, labels)
title('Original dataset')
drawnow

% Estimate the intrinsic dimensionality
no_dims = round(intrinsic_dim(X, 'MLE'));
disp(['MLE estimate of intrinsic dimensionality: ' num2str(no_dims)]);

% Reduce dimensionality with PCA
[mappedX, mapping] = compute_mapping(X, 'PCA', no_dims);
figure
scatter(mappedX(:,1), mappedX(:,2), 5, labels)
title('Result of PCA')

% Reduce dimensionality with Laplacian Eigenmaps
[mappedX, mapping] = compute_mapping(X, 'Laplacian', no_dims, 7);
figure
scatter(mappedX(:,1), mappedX(:,2), 5, labels(mapping.conn_comp))
title('Result of Laplacian Eigenmaps')
drawnow

% Reduce dimensionality with Isomap
[mappedX, mapping] = compute_mapping(X, 'Isomap', no_dims);
figure
scatter(mappedX(:,1), mappedX(:,2), 5, labels(mapping.conn_comp))
title('Result of Isomap')
drawnow