您的位置:首页 > 编程语言 > Python开发

Dataset:利用Python将已有mnist数据集通过移动像素上下左右的方法来扩大数据集为初始数据集的5倍—Jason niu

2018-01-18 12:54 1161 查看
from __future__ import print_function

import cPickle
import gzip
import os.path
import random

import numpy as np

print("Expanding the MNIST training set")

if os.path.exists("../data/mnist_expanded.pkl.gz"):
print("The expanded training set already exists.  Exiting.")
else:
f = gzip.open("../data/mnist.pkl.gz", 'rb')
training_data, validation_data, test_data = cPickle.load(f)
f.close()
expanded_training_pairs = []
j = 0
for x, y in zip(training_data[0], training_data[1]):
expanded_training_pairs.append((x, y))
image = np.reshape(x, (-1, 28))
j += 1
if j % 1000 == 0: print("Expanding image number", j)

for d, axis, index_position, index in [
(1,  0, "first", 0),
(-1, 0, "first", 27),
(1,  1, "last",  0),
(-1, 1, "last",  27)]:
new_img = np.roll(image, d, axis)
if index_position == "first":
new_img[index, :] = np.zeros(28)
else:
new_img[:, index] = np.zeros(28)
expanded_training_pairs.append((np.reshape(new_img, 784), y))
random.shuffle(expanded_training_pairs)
expanded_training_data = [list(d) for d in zip(*expanded_training_pairs)]
print("Saving expanded data. This may take a few minutes.")
f = gzip.open("../data/mnist_expanded.pkl.gz", "w")
cPickle.dump((expanded_training_data, validation_data, test_data), f)
f.close()

内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: