
How to do cross-validation training in Python

2016-04-01 10:32
We use 4-fold cross-validation training as the example.
(1) Given a dataset data and a label set label

The number of samples is

sampNum = len(data)


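(2) Split all of the given examples into 4 groups (folds)

The number of examples per fold is

foldNum = sampNum // 4

When sampNum is not divisible by 4, some folds receive one extra example.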
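As a rough sketch of step (2), the fold boundaries can be worked out by hand. The helper name `fold_ranges` is hypothetical and not part of the original post. It assumes contiguous, unshuffled folds and follows the sizing rule documented for scikit-learn's `KFold`: the first `n % k` folds get one extra sample.

```python
def fold_ranges(samp_num, k):
    """Return (start, stop) index ranges for k contiguous folds.

    Hypothetical helper for illustration only: the first samp_num % k
    folds receive one extra sample so that every sample is used once.
    """
    base, extra = divmod(samp_num, k)
    ranges = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


print(fold_ranges(6, 4))  # [(0, 2), (2, 4), (4, 5), (5, 6)]
print(fold_ranges(8, 4))  # [(0, 2), (2, 4), (4, 6), (6, 8)]
```

Each range marks the validation part of one fold; all remaining indices form that fold's training set.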


(3) Generate the training set and validation set for each fold

See Section 3.1 of the scikit-learn user guide: Cross-validation

import numpy as np
from sklearn.model_selection import KFold  # replaces the removed sklearn.cross_validation module

# dataset
data = np.array([[1, 3], [2, 4], [3.1, 3], [4, 5], [5.0, 0.3], [4.1, 3.1]])
label = np.array([0, 1, 1, 1, 0, 0])
sampNum = len(data)

# 4-fold (3 parts for training, 1 part for validation)
kf = KFold(n_splits=4)
for iFold, (train_index, val_index) in enumerate(kf.split(data)):
    # X_train, y_train: training set of fold iFold; X_val, y_val: its validation set
    X_train, X_val = data[train_index], data[val_index]
    y_train, y_val = label[train_index], label[val_index]
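The loop above only produces the splits. A minimal sketch of the actual training step might look like the following. The choice of `KNeighborsClassifier` with `n_neighbors=1` is an assumption made purely for illustration, since the original post does not name a model. Any estimator with `fit` and `score` would fit the same pattern.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

data = np.array([[1, 3], [2, 4], [3.1, 3], [4, 5], [5.0, 0.3], [4.1, 3.1]])
label = np.array([0, 1, 1, 1, 0, 0])

kf = KFold(n_splits=4)
scores = []
for train_index, val_index in kf.split(data):
    # A fresh model per fold, so no information leaks between folds.
    clf = KNeighborsClassifier(n_neighbors=1)  # illustrative model choice
    clf.fit(data[train_index], label[train_index])
    # score() returns the accuracy on the held-out validation fold.
    scores.append(clf.score(data[val_index], label[val_index]))

mean_score = float(np.mean(scores))
print("per-fold accuracy:", scores)
print("mean accuracy:", mean_score)
```

Averaging the per-fold scores gives the cross-validation estimate. With only six samples the number itself carries little meaning, so this is best read as a template.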


  

The given dataset is as follows:

The index set of all samples is:

0 1 2 3 4 5 6 7

The training-set and validation-set indices for each iFold (4 in total) are:

iFold = 0 (the training set contains 6 examples and the validation set contains 2 examples)



iFold = 1



iFold = 2



iFold = 3



The training set and validation set for each iFold are then:

X_train, X_val, y_train, y_val = data[train_index], data[val_index], label[train_index], label[val_index]
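To see the per-fold indices concretely, here is a minimal sketch that prints them. It assumes a dataset of 8 samples, to match the index set 0 to 7 listed above, and an unshuffled `KFold`. Both are assumptions, not something the original post states outright.

```python
import numpy as np
from sklearn.model_selection import KFold

n_samples = 8  # assumption: matches the index set 0..7 listed above
indices = np.arange(n_samples)

# Collect (train, validation) index lists for each of the 4 folds.
splits = [
    (train.tolist(), val.tolist())
    for train, val in KFold(n_splits=4).split(indices)
]

for i_fold, (train, val) in enumerate(splits):
    print("iFold =", i_fold, "train:", train, "validation:", val)
```

Without shuffling, each validation fold is a contiguous block of two indices, and the other six indices form the training set.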


  