
MATLAB Random Forest Libraries

2012-04-21 17:13
What is a random forest?

Random forest is a classification technique proposed by Leo Breiman (2001). Given a set of class-labeled data, it builds a set of classification trees. Each tree is developed from a bootstrap sample of the training data. When developing an individual tree, a random subset of attributes is drawn at each split (hence the term "random"), from which the best attribute for that split is selected. Classification is based on the majority vote of the individually developed tree classifiers in the forest.

For a more detailed explanation, see: http://en.wikipedia.org/wiki/Random_forest
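The description above can be sketched directly: bootstrap-resample the training data for each tree, restrict each split to a random subset of attributes, and combine the trees by majority vote. Below is a minimal illustration in Python using scikit-learn's `DecisionTreeClassifier` as the base learner (an illustrative sketch of the algorithm, not the MATLAB libraries discussed in this post):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = load_iris(return_X_y=True)

# Grow each tree on a bootstrap sample; at every split the tree
# considers only a random subset of attributes (max_features="sqrt").
n_trees = 25
forest = []
for _ in range(n_trees):
    idx = rng.integers(0, len(X), size=len(X))   # bootstrap sample (with replacement)
    tree = DecisionTreeClassifier(
        max_features="sqrt",
        random_state=int(rng.integers(1 << 30)))
    tree.fit(X[idx], y[idx])
    forest.append(tree)

# Classification is the majority vote over the individual trees.
votes = np.stack([t.predict(X) for t in forest]).astype(int)  # (n_trees, n_samples)
majority = np.apply_along_axis(
    lambda col: np.bincount(col, minlength=3).argmax(), 0, votes)

print("training accuracy:", (majority == y).mean())
```

Because each tree sees a different bootstrap sample and a different attribute subset at each split, the trees are decorrelated, which is what makes the majority vote more robust than any single tree.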

MATLAB library downloads

Original implementation (a new version is forthcoming):
http://www.stat.berkeley.edu/~breiman/RandomForests/cc_software.htm
Implementation ported from R:
http://randomforest-matlab.googlecode.com/files/Windows-Precompiled-RF_MexStandalone-v0.02-.zip
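For readers who want to try the same workflow outside MATLAB, the equivalent train/predict cycle is available in Python via scikit-learn's `RandomForestClassifier` (shown here as a reference point; it is not one of the MATLAB packages listed above):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_trn, X_tst, y_trn, y_tst = train_test_split(
    X, y, test_size=0.3, random_state=0)

# 100 trees; each split draws sqrt(n_features) candidate attributes.
clf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                             random_state=0)
clf.fit(X_trn, y_trn)
print("test accuracy:", clf.score(X_tst, y_tst))
```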
Ensemble classification applications based on Random Forest:

ENSEMBLE CLASSIFICATION

(1) A conference paper investigating binary classification strategies with ensemble classification has been published. [Chan, J.C.-W., Demarchi, L., Van De Voorde, T., & Canters, F. (2008), "Binary classification strategies for mapping urban land cover with ensemble classifiers", Proceedings of IEEE International Geoscience and Remote Sensing Symposium (IGARSS), July 6-11, 2008, Boston, Massachusetts, USA, Vol. III, pp. 1004-1007.] (see Annex A.9)

Since the data sets related to HABISTAT were not ready at the beginning of 2008, a study on binary classification with ensemble classifiers was conducted using two data sets in suburban areas. In the paper, two binary classification strategies were examined to further extend the strength of ensemble classifiers for mapping urban objects. The first strategy was a one-against-one approach. The idea behind it is to employ pairwise binary classification in which n(n-1)/2 classifiers are created, n being the number of classes. Each of the n(n-1)/2 classifiers was trained using only training cases from two classes at a time. The ensemble was then combined by majority voting. The second strategy was a one-against-all binary approach: if there are n classes, with a ∈ {1, …, n} being one of the classes, then n classifiers were generated, each representing a binary classification of a versus non-a. The ensemble was combined using accuracy estimates obtained for each class. Both binary strategies were applied to two single classifiers (decision trees and artificial neural networks) and two ensemble classifiers (Random Forest and AdaBoost). Two multi-source data sets were used: one prepared for an object-based classification and one for a conventional pixel-based approach. Our results indicate that ensemble classifiers generate significantly higher accuracies than a single classifier. Compared to a single C5.0 tree, Random Forest and AdaBoost increased the accuracy by 2 to 12%; the range of increase depends on the data set used. Applying binary classification strategies often increases accuracy, but only marginally (between 1 and 3%). All increases are statistically significant, except on one occasion. Coupling ensemble classifiers with binary classification always yielded the highest accuracies. For our first data set, the highest accuracy was obtained with AdaBoost and a one-against-one strategy, 4.3% better than for a single tree; for the second data set, with the Random Forest approach and a one-against-all strategy, 13.6% higher than for a single tree.
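The two binary decompositions described above (n(n-1)/2 pairwise classifiers versus n one-against-all classifiers) can be sketched with scikit-learn's multiclass wrappers around a Random Forest base learner. Note one difference from the paper: scikit-learn's one-vs-one wrapper breaks voting ties by decision confidence, and its one-vs-rest wrapper combines by decision scores rather than per-class accuracy estimates.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

X, y = load_iris(return_X_y=True)            # n = 3 classes
X_trn, X_tst, y_trn, y_tst = train_test_split(
    X, y, test_size=0.3, random_state=0)

base = RandomForestClassifier(n_estimators=50, random_state=0)

# One-against-one: n(n-1)/2 pairwise classifiers, combined by voting.
ovo = OneVsOneClassifier(base).fit(X_trn, y_trn)
print(len(ovo.estimators_))                  # 3*(3-1)/2 = 3 binary classifiers

# One-against-all: n classifiers, each separating class a from non-a.
ova = OneVsRestClassifier(base).fit(X_trn, y_trn)
print(len(ova.estimators_))                  # 3 binary classifiers

print("OvO:", ovo.score(X_tst, y_tst), "OvA:", ova.score(X_tst, y_tst))
```

With only three classes the two decompositions both happen to build three binary classifiers; the quadratic growth of one-against-one only becomes a cost factor for larger class sets.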

While the results show a statistically significant improvement, the increase in accuracy is marginal. Given the long training time these strategies require, we have to consider carefully whether it is worthwhile to apply them.
(2) We used the ensemble classifier Random Forest to produce four levels of classification using three different data sets in the framework of work package WP 5200 (Validation). The data set used for this experiment is AHS airborne data.
A total of 12 classifications were made (see Figure 9). The results with Random Forest were compared with the performance of other classifiers: Linear Discriminant Analysis and Markov Random Fields.
The processing raises concerns regarding the number of training samples and their spatial independence (see Table 5). This issue with the training, testing and validation sets was discussed during the mid-term evaluation and is under investigation.



Figure 9. Validation exercise using airborne AHS data. The columns represent 3 data sets and rows represent 4 levels of classification. Classifications were done using Random Forest.

Table 5. Table showing the classification scheme and training size at each level.



(3) The use of ensemble classification was studied in all classification tasks with spaceborne data. Two conference papers on the classification of heathlands using superresolution-enhanced CHRIS data were presented. Random Forest was used for the classifications and produced consistent and satisfactory results. Below are two illustrations (Figure 10 and Figure 11) of the application of Random Forest to the original CHRIS and the superresolution-enhanced CHRIS data sets. For more details, please refer to the paper attached in Annex A.8. Random Forest seems to have worked very well with our data sets. We will continue to use and investigate the strengths of this ensemble classifier.



Figure 10. Random Forest classification of SR CHRIS (Kalmthout, Belgium). Results presented at IGARSS, July 6-11, 2008, Boston, Massachusetts, USA. (see Annex B of annual report #1)



Figure 11. Random Forest classification of SR CHRIS (Ginkel, the Netherlands). Results presented at the 6th EARSeL SIG Imaging Spectroscopy workshop 2009, Tel Aviv, March 16-19 2009. (see
Annex A.8)

Source: http://habistat.vgt.vito.be/modules/Results/EC.php

Random Forest implementation in the Orange software:
http://orange.biolab.si/doc/widgets/_static/Classify/RandomForest.htm