How to Set Up MATLAB to Call Weka (Repost), with Examples
2016-07-05 15:46
Reposted from:
http://blog.sina.com.cn/s/blog_890c6aa30101av9x.html
Verify the Java version at the MATLAB command line:
version -java

Configuring MATLAB to call a Java library
Finish the Java code and create a Java library file, i.e., a .jar file.
Put the created .jar file in one of the directories MATLAB uses for storing libraries, and add the corresponding path to the MATLAB configuration file, $MATLABINSTALLDIR\$MatlabVersion\toolbox\local\classpath.txt.
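A classpath.txt entry is simply the absolute path of the .jar on a line of its own. Newer MATLAB releases deprecate classpath.txt in favor of javaclasspath.txt in the preferences folder (returned by prefdir); alternatively, the dynamic class path requires no restart at all. A minimal sketch, with a hypothetical library path:

% Dynamic class path: takes effect immediately, lasts for this session only
javaaddpath('C:\mylibs\mylib.jar');   % hypothetical path -- point it at your own .jar
javaclasspath('-dynamic')             % list dynamic entries to confirm it was added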
Configuring MATLAB to call Weka
Download and install Weka
Add the absolute path of the bin folder of your JRE (jre6 or another version) to the Path system environment variable, e.g.:
C:\Program Files\Java\jre1.8.0_77\bin;
Locate the MATLAB configuration file classpath.txt:
which classpath.txt   % this command prints the location of classpath.txt
Edit the configuration file classpath.txt:
edit classpath.txt
In classpath.txt, enter the absolute path of weka.jar under the Weka installation directory, e.g.:
C:\Program Files\Weka-3-8\weka.jar
Restart MATLAB, then run the following command:
attributes = javaObject('weka.core.FastVector');
% if MATLAB reports no error, the configuration succeeded
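Note that weka.core.FastVector is deprecated and may be absent from the weka.jar of recent Weka releases (such as 3.8); likewise, the SMOTE filter used in the example below is distributed as an optional package in recent Weka versions, so it may need to be installed separately. If the FastVector call fails for that reason, any core class present across Weka versions makes an equally good smoke test; a minimal sketch:

att = javaObject('weka.core.Attribute', 'x');   % Attribute(String) is present across Weka versions
disp(att.name());                               % should print 'x' with no Java error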
When MATLAB calls Weka classes, Java heap space is often exhausted, so a larger heap should be set: MATLAB -> File -> Preferences -> General -> Java Heap Memory, then choose a suitably large value.
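On MATLAB releases without that preference pane, the JVM heap can also be raised by placing a java.opts file in the MATLAB startup folder, one JVM option per line. A minimal sketch, assuming 1 GB of heap is enough for your data:

-Xmx1024m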
Example: calling Weka from MATLAB
The code comes from:
http://cn.mathworks.com/matlabcentral/fileexchange/37311-smoteboost
http://www.mathworks.com/matlabcentral/fileexchange/37315-rusboost
clc; clear all; close all;

file = 'data.csv';   % Dataset

% Reading training file
data = dlmread(file);
label = data(:,end);

% Extracting positive data points
idx = (label==1);
pos_data = data(idx,:);
row_pos = size(pos_data,1);

% Extracting negative data points
neg_data = data(~idx,:);
row_neg = size(neg_data,1);

% Random permutation of positive and negative data points
p = randperm(row_pos);
n = randperm(row_neg);

% 80-20 split for training and test
tstpf = p(1:round(row_pos/5));
tstnf = n(1:round(row_neg/5));
trpf = setdiff(p, tstpf);
trnf = setdiff(n, tstnf);
train_data = [pos_data(trpf,:); neg_data(trnf,:)];
test_data = [pos_data(tstpf,:); neg_data(tstnf,:)];

% Decision Tree
prediction = SMOTEBoost(train_data,test_data,'tree',false);
disp ('    Label   Probability');
disp ('-----------------------------');
disp (prediction);
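The script assumes data.csv is entirely numeric, with the 0/1 class label in the last column (and, per CSVtoARFF below, -1 standing in for missing values). A hypothetical three-feature file might look like:

5.1,3.5,1.4,0
4.9,3.0,1.3,0
6.2,2.9,4.3,1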
function prediction = SMOTEBoost (TRAIN,TEST,WeakLearn,ClassDist)
% This function implements the SMOTEBoost Algorithm. For more details on the
% theoretical description of the algorithm please refer to the following
% paper:
% N.V. Chawla, A. Lazarevic, L.O. Hall, K. Bowyer, "SMOTEBoost: Improving
% Prediction of the Minority Class in Boosting", Journal of Knowledge
% Discovery in Databases: PKDD, 2003.
% Input: TRAIN = Training data as matrix
%        TEST = Test data as matrix
%        WeakLearn = String to choose algorithm. Choices are
%                    'svm','tree','knn' and 'logistic'.
%        ClassDist = true or false. true indicates that the class
%                    distribution is maintained while doing weighted
%                    resampling and before SMOTE is called at each
%                    iteration. false indicates that the class distribution
%                    is not maintained while resampling.
% Output: prediction = size(TEST,1) x 2 matrix. Col 1 is class labels for
%                      all instances. Col 2 is probability of the instances
%                      being classified as positive class.

javaaddpath('weka.jar');

%% Training SMOTEBoost
% Total number of instances in the training set
m = size(TRAIN,1);
POS_DATA = TRAIN(TRAIN(:,end)==1,:);
NEG_DATA = TRAIN(TRAIN(:,end)==0,:);
pos_size = size(POS_DATA,1);
neg_size = size(NEG_DATA,1);

% Reorganize TRAIN by putting all the positive and negative examples
% together, respectively.
TRAIN = [POS_DATA;NEG_DATA];

% Converting training set into Weka compatible format
CSVtoARFF (TRAIN, 'train', 'train');
train_reader = javaObject('java.io.FileReader', 'train.arff');
train = javaObject('weka.core.Instances', train_reader);
train.setClassIndex(train.numAttributes() - 1);

% Total number of iterations of the boosting method
T = 10;

% W stores the weights of the instances in each row for every iteration of
% boosting. Weights for all the instances are initialized by 1/m for the
% first iteration.
W = zeros(1,m);
for i = 1:m
    W(1,i) = 1/m;
end

% L stores pseudo-loss values, H stores hypotheses, and B stores the
% log(1/beta) values used as the weights of the hypotheses when forming
% the final hypothesis. All of these are of length <= T and store values
% for every iteration of the boosting process.
L = [];
H = {};
B = [];

% Loop counter
t = 1;

% Keeps count of the number of times the same boosting iteration has been
% repeated
count = 0;

% Boosting T iterations
while t <= T
    % LOG MESSAGE
    disp (['Boosting iteration #' int2str(t)]);

    if ClassDist == true
        % Resampling POS_DATA with weights of positive examples
        POS_WT = zeros(1,pos_size);
        sum_POS_WT = sum(W(t,1:pos_size));
        for i = 1:pos_size
            POS_WT(i) = W(t,i)/sum_POS_WT;
        end
        RESAM_POS = POS_DATA(randsample(1:pos_size,pos_size,true,POS_WT),:);

        % Resampling NEG_DATA with weights of negative examples
        NEG_WT = zeros(1,neg_size);
        sum_NEG_WT = sum(W(t,pos_size+1:m));
        for i = 1:neg_size
            NEG_WT(i) = W(t,pos_size+i)/sum_NEG_WT;
        end
        RESAM_NEG = NEG_DATA(randsample(1:neg_size,neg_size,true,NEG_WT),:);

        % Resampled TRAIN is stored in RESAMPLED
        RESAMPLED = [RESAM_POS;RESAM_NEG];

        % Calculating the percentage of boosting the positive class. 'pert'
        % is used as a parameter of SMOTE
        pert = ((neg_size-pos_size)/pos_size)*100;
    else
        % Indices of resampled train
        RND_IDX = randsample(1:m,m,true,W(t,:));

        % Resampled TRAIN is stored in RESAMPLED
        RESAMPLED = TRAIN(RND_IDX,:);

        % Calculating the percentage of boosting the positive class. 'pert'
        % is used as a parameter of SMOTE
        pos_size = sum(RESAMPLED(:,end)==1);
        neg_size = sum(RESAMPLED(:,end)==0);
        pert = ((neg_size-pos_size)/pos_size)*100;
    end

    % Converting resampled training set into Weka compatible format
    CSVtoARFF (RESAMPLED,'resampled','resampled');
    reader = javaObject('java.io.FileReader','resampled.arff');
    resampled = javaObject('weka.core.Instances',reader);
    resampled.setClassIndex(resampled.numAttributes()-1);

    % New SMOTE boosted data gets stored in S
    smote = javaObject('weka.filters.supervised.instance.SMOTE');
    pert = ((neg_size-pos_size)/pos_size)*100;
    smote.setPercentage(pert);
    smote.setInputFormat(resampled);
    S = weka.filters.Filter.useFilter(resampled, smote);

    % Training a weak learner. 'pred' is the weak hypothesis. However, the
    % hypothesis function is encoded in 'model'.
    switch WeakLearn
        case 'svm'
            model = javaObject('weka.classifiers.functions.SMO');
        case 'tree'
            model = javaObject('weka.classifiers.trees.J48');
        case 'knn'
            model = javaObject('weka.classifiers.lazy.IBk');
            model.setKNN(5);
        case 'logistic'
            model = javaObject('weka.classifiers.functions.Logistic');
    end
    model.buildClassifier(S);
    pred = zeros(m,1);
    for i = 0 : m - 1
        pred(i+1) = model.classifyInstance(train.instance(i));
    end

    % Computing the pseudo-loss of hypothesis 'model'
    loss = 0;
    for i = 1:m
        if TRAIN(i,end)==pred(i)
            continue;
        else
            loss = loss + W(t,i);
        end
    end

    % If count exceeds a pre-defined threshold (5 in the current
    % implementation), the loop is broken and rolled back to the state
    % where loss > 0.5 was not encountered.
    if count > 5
        L = L(1:t-1);
        H = H(1:t-1);
        B = B(1:t-1);
        disp ('  Too many iterations have loss > 0.5');
        disp ('  Aborting boosting...');
        break;
    end

    % If the loss is greater than 1/2, it means that an inverted
    % hypothesis would perform better. In such cases, do not take that
    % hypothesis into consideration and repeat the same iteration. 'count'
    % keeps count of the number of times the same boosting iteration has
    % been repeated
    if loss > 0.5
        count = count + 1;
        continue;
    else
        count = 1;
    end

    L(t) = loss;            % Pseudo-loss at each iteration
    H{t} = model;           % Hypothesis function
    beta = loss/(1-loss);   % Setting weight update parameter 'beta'
    B(t) = log(1/beta);     % Weight of the hypothesis

    % At the final iteration there is no need to update the weights any
    % further
    if t==T
        break;
    end

    % Updating weights
    for i = 1:m
        if TRAIN(i,end)==pred(i)
            W(t+1,i) = W(t,i)*beta;
        else
            W(t+1,i) = W(t,i);
        end
    end

    % Normalizing the weights for the next iteration
    sum_W = sum(W(t+1,:));
    for i = 1:m
        W(t+1,i) = W(t+1,i)/sum_W;
    end

    % Incrementing loop counter
    t = t + 1;
end

% The final hypothesis is calculated and tested on the test set
% simultaneously.

%% Testing SMOTEBoost
n = size(TEST,1);   % Total number of instances in the test set

CSVtoARFF(TEST,'test','test');
test = 'test.arff';
test_reader = javaObject('java.io.FileReader', test);
test = javaObject('weka.core.Instances', test_reader);
test.setClassIndex(test.numAttributes() - 1);

% Normalizing B
sum_B = sum(B);
for i = 1:size(B,2)
    B(i) = B(i)/sum_B;
end

prediction = zeros(n,2);

for i = 1:n
    % Calculating the total weight of the class labels from all the models
    % produced during boosting
    wt_zero = 0;
    wt_one = 0;
    for j = 1:size(H,2)
        p = H{j}.classifyInstance(test.instance(i-1));
        if p==1
            wt_one = wt_one + B(j);
        else
            wt_zero = wt_zero + B(j);
        end
    end

    if (wt_one > wt_zero)
        prediction(i,:) = [1 wt_one];
    else
        prediction(i,:) = [0 wt_one];
    end
end
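For reference, the weight update implemented above is the AdaBoost.M1 rule with pseudo-loss $\varepsilon_t$ (the variable loss in the code):

$$\beta_t = \frac{\varepsilon_t}{1-\varepsilon_t},\qquad W_{t+1}(i) = \frac{W_t(i)\,\beta_t^{\,\mathbb{1}[h_t(x_i)=y_i]}}{Z_t},\qquad B_t = \ln\frac{1}{\beta_t},$$

where $Z_t$ renormalizes the weights to sum to one. Correctly classified instances are down-weighted by $\beta_t < 1$, and the final prediction picks the class with the larger total $\sum_t B_t$ over the hypotheses voting for it.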
function r = CSVtoARFF (data, relation, type)
% csv to arff file converter

% load the csv data
[rows cols] = size(data);

% open the arff file for writing
farff = fopen(strcat(type,'.arff'), 'w');

% print the relation part of the header
fprintf(farff, '@relation %s\n', relation);

% Reading from the ARFF header template; its first line (the template's
% own @relation line) is skipped because the line above replaces it
fid = fopen('ARFFheader.txt','r');
tline = fgets(fid);   % skip the template's @relation line
tline = fgets(fid);
while ischar(tline)
    fprintf(farff,'%s',tline);
    tline = fgets(fid);
end
fclose(fid);

% Converting the data
for i = 1 : rows
    % print the attribute values for the data point
    for j = 1 : cols - 1
        if data(i,j) ~= -1   % check if it is a missing value
            fprintf(farff, '%d,', data(i,j));
        else
            fprintf(farff, '?,');
        end
    end
    % print the label for the data point
    fprintf(farff, '%d\n', data(i,end));
end

% close the file
fclose(farff);
r = 0;
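CSVtoARFF expects an ARFFheader.txt template in the working directory: its first line is discarded (the function prints its own @relation line) and the rest is copied verbatim, so the template must declare every attribute. A hypothetical template matching the three-feature example above:

@relation placeholder
@attribute f1 numeric
@attribute f2 numeric
@attribute f3 numeric
@attribute class {0,1}
@data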
function model = ClassifierTrain(data,type)
% Training the classifier that would do the sample selection

javaaddpath('weka.jar');

CSVtoARFF(data,'train','train');
train_file = 'train.arff';
reader = javaObject('java.io.FileReader', train_file);
train = javaObject('weka.core.Instances', reader);
train.setClassIndex(train.numAttributes() - 1);

% options = javaObject('java.lang.String');
switch type
    case 'svm'
        model = javaObject('weka.classifiers.functions.SMO');
        kernel = javaObject('weka.classifiers.functions.supportVector.RBFKernel');
        model.setKernel(kernel);
    case 'tree'
        model = javaObject('weka.classifiers.trees.J48');
        % options = weka.core.Utils.splitOptions('-C 0.2');
        % model.setOptions(options);
    case 'knn'
        model = javaObject('weka.classifiers.lazy.IBk');
        model.setKNN(5);
    case 'logistic'
        model = javaObject('weka.classifiers.functions.Logistic');
end
model.buildClassifier(train);
function prediction = ClassifierPredict(data,model)
% Predicting the labels of the test instances
% Input: data = test data
%        model = the trained model
% Output: prediction = predicted labels

javaaddpath('weka.jar');

CSVtoARFF(data,'test','test');
test_file = 'test.arff';
reader = javaObject('java.io.FileReader', test_file);
test = javaObject('weka.core.Instances', reader);
test.setClassIndex(test.numAttributes() - 1);

prediction = [];
for i = 0 : size(data,1) - 1
    p = model.classifyInstance(test.instance(i));
    prediction = [prediction; p];
end
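A usage sketch for the two helpers, assuming train_data and test_data are laid out as in the demo script above:

model  = ClassifierTrain(train_data, 'tree');   % build a J48 decision tree
labels = ClassifierPredict(test_data, model);   % column vector of predicted 0/1 labels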