
Java: Splitting Data into a Training Set and a Test Set

2019-08-20 17:40
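Copyright notice: this is an original article by the blogger, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice. Article link: https://blog.csdn.net/qq_41982466/article/details/99865084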

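Use Java to split text data into a training set and a test set.
For example, given 1,000 lines of text data in total, the first 800 lines become the training set and the remaining 200 lines become the test set.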
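The split sizes follow directly from the test ratio. A minimal sketch of that arithmetic is shown here, assuming the same truncating cast the listing uses; the class `SplitSizes` and its method names are hypothetical and not part of the original post.

```java
// Hypothetical helper, not part of the original post: computes the split sizes only.
class SplitSizes {
    /** Number of leading lines assigned to the training set (fraction is truncated). */
    static int trainSize(long total, double testPercent) {
        return (int) (total * (1 - testPercent));
    }

    /** Remaining lines, which form the test set. */
    static int testSize(long total, double testPercent) {
        return (int) (total - trainSize(total, testPercent));
    }

    public static void main(String[] args) {
        // 1,000 lines with a 0.2 test ratio
        System.out.println(trainSize(1000, 0.2) + " / " + testSize(1000, 0.2)); // prints "800 / 200"
    }
}
```

Because the training size is truncated, any leftover line from rounding ends up in the test set.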

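import java.io.*;

public class data_split {
    /** Fraction of the data held out as the test set */
    public static double test_percent = 0.2;

    public static void main(String[] args) throws Exception {
        // The three paths were left undefined in the original; here they are read from the command line.
        if (args.length != 3) {
            System.err.println("Usage: java data_split <source_path> <train_path> <test_path>");
            return;
        }
        String source_path = args[0];
        String train_path = args[1];
        String test_path = args[2];
        splitData(source_path, train_path, test_path);
    }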
    /**
     * Writes the training set and the test set.
     * @param source_path path of the full data file, one sample per line
     * @param train_path  output path of the training set
     * @param test_path   output path of the test set
     * @throws Exception if a file cannot be read or written
     */
    public static void splitData(String source_path, String train_path, String test_path) throws Exception {
        long allNum = getLineNumber(source_path);
        // Number of leading lines that go to the training set
        int end = (int) (allNum * (1 - test_percent));
        // Open the files; try-with-resources flushes and closes all of them, even on error.
        // The outputs are overwritten (not appended to), so rerunning does not duplicate lines.
        try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(source_path), "UTF-8"));
             PrintWriter out = new PrintWriter(new OutputStreamWriter(new FileOutputStream(train_path), "UTF-8"));
             PrintWriter out2 = new PrintWriter(new OutputStreamWriter(new FileOutputStream(test_path), "UTF-8"))) {
            String line;
            int count = 0;
            // Write the training set: the first `end` lines (none if end is 0)
            while (count < end && (line = br.readLine()) != null) {
                out.println(line);
                count++;
            }
            // Write the test set: every remaining line
            while ((line = br.readLine()) != null) {
                out2.println(line);
            }
        }
    }
    /**
     * Counts the lines in a file.
     * @param path path of the file
     * @return the number of lines, or 0 if the file is missing or cannot be read
     */
    public static long getLineNumber(String path) {
        File file = new File(path);
        if (!file.exists()) {
            return 0;
        }
        // Count with readLine() so that a trailing newline is not counted as an extra line.
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
            long lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        } catch (IOException e) {
            e.printStackTrace();
            return 0;
        }
    }
}
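
The listing above splits by position, so the training set is always the leading part of the file. If the source file happens to be ordered (for example, grouped by label), shuffling before splitting may give a more representative test set. The following is a minimal in-memory sketch of that variation, not the original post's method; the class `ShuffledSplit`, its `split` method, and the seed value are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical variation, not the original post's method: shuffle before splitting.
class ShuffledSplit {
    /** Returns [train, test] after shuffling a copy of the lines with a fixed seed. */
    static List<List<String>> split(List<String> lines, double testPercent, long seed) {
        List<String> shuffled = new ArrayList<>(lines);       // leave the input untouched
        Collections.shuffle(shuffled, new Random(seed));      // fixed seed keeps the split reproducible
        int end = (int) (shuffled.size() * (1 - testPercent));
        List<List<String>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, end)));               // training set
        parts.add(new ArrayList<>(shuffled.subList(end, shuffled.size()))); // test set
        return parts;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            lines.add("sample " + i);
        }
        List<List<String>> parts = split(lines, 0.2, 42L);
        System.out.println("train=" + parts.get(0).size() + ", test=" + parts.get(1).size());
    }
}
```

This sketch holds every line in memory, which is fine for a file of about 1,000 lines; for very large files the streaming approach of the original listing uses far less memory.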

That's all!
