Splitting data into training, test, and validation sets
2018-01-08 19:14
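Split the data into training, test, and validation sets:

import os
import codecs
import random

random.seed(1229)  # fixed seed so the splits are reproducible

# data[0] holds the negative samples, data[1] the positive ones.
# Each line is prefixed with its label ('0 ' or '1 ').
data = []
with codecs.open('neg.txt', 'r', encoding='utf-8', errors='ignore') as fdata:
    now = fdata.readlines()
    data.append(['0 ' + item for item in now])
with codecs.open('pos.txt', 'r', encoding='utf-8', errors='ignore') as fdata:
    now = fdata.readlines()
    data.append(['1 ' + item for item in now])


def get_test(data, n, x):
    # The x-th of n contiguous slices is the test fold.
    st, ed = len(data) * x // n, len(data) * (x + 1) // n
    return data[st:ed]


def get_train(data, n, x):
    # Everything outside the x-th slice.
    st, ed = len(data) * x // n, len(data) * (x + 1) // n
    return data[:st] + data[ed:]


for i in range(10):
    # Split each class separately so every fold keeps the class ratio.
    train_ori = [get_train(item, 10, i) for item in data]
    test_ori = [get_test(item, 10, i) for item in data]
    train = []
    dev = []
    test = []
    for j in range(2):
        # Hold out the last 10% of the shuffled non-test data as the dev set.
        random.shuffle(train_ori[j])
        x = len(train_ori[j]) * 9 // 10
        train += train_ori[j][:x]
        dev += train_ori[j][x:]
        test += test_ori[j]
    random.shuffle(train)
    random.shuffle(dev)
    random.shuffle(test)
    # Create the fold directory portably instead of shelling out to mkdir.
    out_dir = 'mr%s' % i
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    # Write as UTF-8 and close each file explicitly.
    for name, lines in (('train', train), ('dev', dev), ('test', test)):
        with codecs.open('%s/%s.txt' % (out_dir, name), 'w', encoding='utf-8') as fout:
            fout.writelines(lines)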
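The same splitting logic can be tried on in-memory lists, without neg.txt and pos.txt, to check the fold sizes before writing any files. The following is only a minimal sketch: the function names `split_fold` and `make_folds` are hypothetical and do not appear in the original script, and a local `random.Random` instance is used in place of the global seed.

```python
import random


def split_fold(samples, n_folds, fold, rng):
    """Split one class's samples into (train, dev, test) for a given fold.

    Hypothetical helper: the test set is the fold-th contiguous slice,
    and 10% of the shuffled remainder is held out as the dev set.
    """
    st = len(samples) * fold // n_folds
    ed = len(samples) * (fold + 1) // n_folds
    test = samples[st:ed]
    rest = samples[:st] + samples[ed:]
    rng.shuffle(rest)
    cut = len(rest) * 9 // 10
    return rest[:cut], rest[cut:], test


def make_folds(classes, n_folds=10, seed=1229):
    """Build (train, dev, test) for every fold, splitting each class separately."""
    rng = random.Random(seed)
    folds = []
    for fold in range(n_folds):
        train, dev, test = [], [], []
        for samples in classes:
            tr, dv, te = split_fold(samples, n_folds, fold, rng)
            train += tr
            dev += dv
            test += te
        # Mix the classes together inside each split.
        rng.shuffle(train)
        rng.shuffle(dev)
        rng.shuffle(test)
        folds.append((train, dev, test))
    return folds


if __name__ == "__main__":
    # Toy data standing in for the labelled lines of neg.txt and pos.txt.
    neg = ['0 neg-%d\n' % k for k in range(50)]
    pos = ['1 pos-%d\n' % k for k in range(30)]
    for i, (train, dev, test) in enumerate(make_folds([neg, pos])):
        print(i, len(train), len(dev), len(test))
```

Because each class is sliced on its own, every sample lands in exactly one test fold, and the train, dev, and test sets of a fold never overlap.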