Writing a Python Script to Download and Decompress the MNIST Dataset (1)

2017-06-11 20:57
[UpdateTime: 20170611]

I. Purpose of this article

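MNIST is to machine learning and deep learning what cout<<"hello world" is to programming (quoted from the TensorFlow tutorial). I have recently started learning deep learning, without neglecting machine learning, and have come across many MNIST-based examples. The main contribution of this article is a Python program that automatically downloads and decompresses MNIST. I am organizing and sharing it here, and will keep updating it as my studies progress.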

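For the packages this article relies on, see the import statements at the top of the script. Because several deep learning frameworks had been installed before this experiment, the relevant packages were already present on my system. If you run into a problem, install the missing package as the error message suggests (pip install xxx).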

The principle is simple. The dataset is downloaded with the following code (the urllib module):

filepath, _ = urllib.request.urlretrieve(SOURCE_URL + filename, filepath)
statinfo = os.stat(filepath)
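
The two lines above can be wrapped into a small reusable helper. The following is only a sketch under stated assumptions: it targets Python 3 (so it imports urllib.request directly instead of going through six), and the name download_if_missing is hypothetical, not part of the original script.

```python
import os
import urllib.request


def download_if_missing(url, dest_path):
    """Download url to dest_path unless the file already exists.

    Hypothetical helper (not in the original script).
    Returns the size of the local file in bytes.
    """
    if not os.path.exists(dest_path):
        # urlretrieve returns (local_filename, headers); only the path is used.
        dest_path, _ = urllib.request.urlretrieve(url, dest_path)
    return os.stat(dest_path).st_size
```

Called as download_if_missing(SOURCE_URL + filename, filepath), it skips the network request when the file is already on disk and reports the file size either way.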


The dataset is then decompressed with the following code (gzip):

cmd = ['gzip', '-d', target_path]
print('Unzip ', target_path)
subprocess.call(cmd)
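
Calling the external gzip program assumes that it is installed and on the PATH. As an alternative, the same step can be sketched with Python's standard gzip module. This is not the approach the script below takes, and the helper name gunzip_file and its remove_source parameter are hypothetical.

```python
import gzip
import os
import shutil


def gunzip_file(gz_path, remove_source=True):
    """Decompress gz_path next to itself and return the output path.

    Hypothetical helper (not in the original script).
    """
    if not gz_path.endswith('.gz'):
        raise ValueError('expected a .gz file: %s' % gz_path)
    out_path = gz_path[:-len('.gz')]
    with gzip.open(gz_path, 'rb') as src, open(out_path, 'wb') as dst:
        # Stream the data in chunks instead of reading it all into memory.
        shutil.copyfileobj(src, dst)
    if remove_source:
        # Mirror `gzip -d`, which removes the compressed file afterwards.
        os.remove(gz_path)
    return out_path
```

This variant needs no external command, so it should also work on systems without the gzip tool.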


II. Environment

1. Ubuntu environment: http://blog.csdn.net/houchaoqun_xmu/article/details/72453187

2. Anaconda2: http://blog.csdn.net/houchaoqun_xmu/article/details/72461592

III. Code

# Copyright 20170611 . All Rights Reserved.
# Prerequisites:
# Python 2.7
# gzip, subprocess, numpy
#
# ==============================================================================
"""Functions for downloading and unzipping MNIST data."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import gzip
import subprocess
import os
import numpy
from six.moves import urllib

def maybe_download(filename, data_dir, SOURCE_URL):
    """Download the data from Yann's website, unless it's already here."""
    filepath = os.path.join(data_dir, filename)
    if not os.path.exists(filepath):
        filepath, _ = urllib.request.urlretrieve(SOURCE_URL + filename, filepath)
        statinfo = os.stat(filepath)
        print('Successfully downloaded', filename, statinfo.st_size, 'bytes.')

def check_file(data_dir):
    """Return True if data_dir exists; otherwise create it and return False."""
    if os.path.exists(data_dir):
        return True
    else:
        os.mkdir(data_dir)
        return False

def uzip_data(target_path):
    # Decompress one MNIST file with the system gzip tool.
    cmd = ['gzip', '-d', target_path]
    print('Unzip ', target_path)
    subprocess.call(cmd)

def read_data_sets(data_dir):
    if check_file(data_dir):
        print(data_dir)
        print('dir mnist already exists.')

        # Delete the existing mnist directory and recreate it empty.
        cmd = ['rm', '-rf', data_dir]
        print('delete the dir', data_dir)
        subprocess.call(cmd)
        os.mkdir(data_dir)

    SOURCE_URL = 'http://yann.lecun.com/exdb/mnist/'
    data_keys = ['train-images-idx3-ubyte.gz', 'train-labels-idx1-ubyte.gz',
                 't10k-images-idx3-ubyte.gz', 't10k-labels-idx1-ubyte.gz']
    for key in data_keys:
        if os.path.isfile(os.path.join(data_dir, key)):
            print("[warning...]", key, "already exists.")
        else:
            maybe_download(key, data_dir, SOURCE_URL)

    # Unzip the mnist data.
    uziped_data_keys = ['train-images-idx3-ubyte', 'train-labels-idx1-ubyte',
                        't10k-images-idx3-ubyte', 't10k-labels-idx1-ubyte']
    for key in uziped_data_keys:
        if os.path.isfile(os.path.join(data_dir, key)):
            print("[warning...]", key, "already exists.")
        else:
            # Pass the compressed file (with its .gz suffix) to gzip explicitly.
            target_path = os.path.join(data_dir, key + '.gz')
            uzip_data(target_path)

if __name__ == '__main__':
    print("===== running - input_data() script =====")
    read_data_sets("./mnist")
    print("============= =============")
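
The script clears an existing data directory by calling rm -rf, which assumes a Unix-like shell. As a sketch of a portable alternative, the same reset can be done with the standard library alone. The name reset_dir is hypothetical and does not appear in the script above.

```python
import os
import shutil


def reset_dir(data_dir):
    """Delete data_dir if it exists, then create it empty.

    Hypothetical helper (not in the original script).
    Returns True if a previous directory was removed.
    """
    existed = os.path.isdir(data_dir)
    if existed:
        # Recursively remove the directory and everything inside it.
        shutil.rmtree(data_dir)
    os.makedirs(data_dir)
    return existed
```

A single call to reset_dir(data_dir) could stand in for the check_file call plus the rm -rf block at the top of read_data_sets.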


Open a terminal and run the following command:

python get_mnist.py

The result is shown below:

[Screenshot: terminal output of get_mnist.py]

Code download: http://download.csdn.net/detail/houchaoqun_xmu/9867456
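
After the script finishes, one way to confirm that the decompressed files are intact is to read the header of each IDX file. The sketch below assumes the IDX layout described on the MNIST page, in which a file begins with a big-endian 32-bit magic number followed by a 32-bit item count. The helper name read_idx_header is hypothetical.

```python
import struct


def read_idx_header(path):
    """Return (magic, item_count) from the first 8 bytes of an IDX file.

    Hypothetical helper (not in the original script).
    """
    with open(path, 'rb') as f:
        header = f.read(8)
    if len(header) < 8:
        raise ValueError('file too short to be an IDX file: %s' % path)
    # Two big-endian unsigned 32-bit integers: magic number, then item count.
    return struct.unpack('>II', header)
```

For the MNIST files, the image files should report magic number 2051 and the label files 2049, with 60000 items in the training files and 10000 in the test files.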

IV. References

Activation-Visualization-Histogram: https://github.com/shaohua0116/Activation-Visualization-Histogram

MNIST for machine learning beginners: http://wiki.jikexueyuan.com/project/tensorflow-zh/tutorials/mnist_beginners.html

Reading MNIST with Python: http://blog.csdn.net/mmmwhy/article/details/62891092

Python code for downloading the MNIST handwritten digit dataset with TensorFlow: http://download.csdn.net/detail/yhhyhhyhhyhh/9738704

Batch-processing MNIST code (tensorflow_GPU): http://download.csdn.net/detail/houchaoqun_xmu/9851221