
Training and Testing a LeNet Model on MNIST with TensorFlow

2018-03-10 09:12

1. The MNIST dataset: download and overview

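The MNIST dataset is split into two parts: a training set of 60,000 examples (mnist.train) and a test set of 10,000 examples (mnist.test). Each MNIST example consists of two parts: an image of a handwritten digit and its corresponding label. The training images are mnist.train.images and the training labels are mnist.train.labels. Each image is 28x28 pixels; flattening this array gives a vector of length 28x28 = 784. mnist.train.images is therefore a tensor of shape [60000, 784], where the first dimension indexes the image and the second indexes the pixels within each image. (The input_data loader from the TensorFlow tutorials holds out 5,000 training images as a validation set by default; in that case mnist.train contains 55,000 examples.) Each element of the tensor is the intensity of one pixel of one image, a value between 0 and 1.

The labels are digits from 0 to 9 that say which digit the image shows. Here the labels are represented as one-hot vectors: a vector that is 1 in exactly one position and 0 everywhere else. The digit n becomes a 10-dimensional vector whose only 1 is at index n (counting from 0); for example, label 0 is [1,0,0,0,0,0,0,0,0,0]. mnist.train.labels is therefore a [60000, 10] matrix.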
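As a small illustration of the two representations described above, the following sketch flattens a 28x28 image into a 784-element vector and builds a one-hot label. It uses plain Python lists rather than the tensors the loader returns, and the helper names (one_hot, flatten, decode) are made up for this example.

```python
def one_hot(label, num_classes=10):
    """Return a one-hot list with a 1.0 at index `label`."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range: %r" % (label,))
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def flatten(image):
    """Flatten a 2-D list of pixel rows into one row-major vector."""
    return [pixel for row in image for pixel in row]


def decode(vec):
    """Recover the digit from a one-hot (or score) vector."""
    return vec.index(max(vec))


# A blank 28x28 "image" stands in for real MNIST pixel data.
blank_image = [[0.0] * 28 for _ in range(28)]
flat = flatten(blank_image)
label_vec = one_hot(3)

print(len(flat))
print(label_vec)
print(decode(label_vec))
```

Taking the index of the largest entry is the same operation that tf.argmax(y_, 1) performs on a batch of labels later in the article.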

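2. Implementation

2.1 TensorFlow environment

If the cluster does not already have the tensorflow module installed, it can be supplied through the cacheArchive parameter as follows:

- Package the TensorFlow library. Its dependencies can either be installed on the cluster nodes beforehand or bundled into the same archive. For example: tar -zcvf tensorflow.tgz ./*

- Upload the archive to HDFS, for example to /tmp/tensorflow.tgz

- Add the cacheArchive parameter to the XLearning submit script, for example: --cacheArchive /tmp/tensorflow.tgz#tensorflow

- In the script executed by launch-cmd, set the environment variable: export PYTHONPATH=./:$PYTHONPATH

Installing the TensorFlow dependencies: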

yum install numpy python-devel python-wheel


2.2 Training the model

Change to the example directory:

cd /var/lib/ambari-server/resources/stacks/CRH/5.1/services/XLEARNING/xlearning-1.2/examples/tfmnist
export XLEARNING_HOME=/var/lib/ambari-server/resources/stacks/CRH/5.1/services/XLEARNING/xlearning-1.2


Run the script run.sh:

#!/bin/sh
$XLEARNING_HOME/bin/xl-submit \
    --app-type "tensorflow" \
    --app-name "tf-mnist" \
    --input /tmp/data/tfmnist/MNIST_data#data \
    --output /tmp/tfmnist_model#model \
    --files demo.py,input_data.py,demo.sh \
    --cacheArchive /tmp/tensorflow.tgz#tensorflow \
    --launch-cmd "sh demo.sh" \
    --worker-memory 2G \
    --worker-num 2 \
    --worker-cores 3 \
    --ps-memory 2G \
    --ps-num 1 \
    --ps-cores 2 \
    --queue default


The demo.sh script:

export PYTHONPATH=./:$PYTHONPATH
python demo.py --data_path=./data --save_path=./model --log_dir=./eventLog


The demo.py code:

import argparse
import sys
import os
import json
import numpy as np
import time

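sys.path.append(os.getcwd())
import input_data
# NOTE: run.sh maps the HDFS input to ./data (also passed as --data_path), while this
# call reads from 'MNIST_data'. Point it at the mapped directory if the data is not found.
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)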

import tensorflow as tf

FLAGS = None

def main(_):
    # cluster specification (read from environment variables)
    FLAGS.task_index = int(os.environ["TF_INDEX"])
    FLAGS.job_name = os.environ["TF_ROLE"]
    cluster_def = json.loads(os.environ["TF_CLUSTER_DEF"])
    cluster = tf.train.ClusterSpec(cluster_def)

    print("ClusterSpec:", cluster_def)
    print("current task id:", FLAGS.task_index, " role:", FLAGS.job_name)

    gpu_options = tf.GPUOptions(allow_growth=True)
    server = tf.train.Server(cluster, job_name=FLAGS.job_name, task_index=FLAGS.task_index,
                             config=tf.ConfigProto(gpu_options=gpu_options, allow_soft_placement=True))

    if FLAGS.job_name == "ps":
        # parameter servers only host variables and wait for the workers
        server.join()
    elif FLAGS.job_name == "worker":
        # set the train parameters
        with tf.device(tf.train.replica_device_setter(worker_device=("/job:worker/task:%d" % (FLAGS.task_index)),
                                                      cluster=cluster)):
            global_step = tf.get_variable('global_step', [],
                                          initializer=tf.constant_initializer(0), trainable=False)
            x = tf.placeholder(tf.float32, shape=[None, 784])
            y_ = tf.placeholder(tf.float32, shape=[None, 10])

            # Softmax-regression baseline. Its loss, train_step and accuracy are
            # overwritten by the CNN below. W and b are still created so that the
            # automatically generated variable names match those in the
            # recognition script of section 2.4.
            W = tf.Variable(tf.zeros([784, 10]))
            b = tf.Variable(tf.zeros([10]))

            y = tf.matmul(x, W) + b
            cross_entropy = tf.reduce_mean(
                tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))

            train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
            correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))

            accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

            def weight_variable(shape):
                initial = tf.truncated_normal(shape, stddev=0.1)
                return tf.Variable(initial)

            def bias_variable(shape):
                initial = tf.constant(0.1, shape=shape)
                return tf.Variable(initial)

            def conv2d(x, W):
                return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

            def max_pool_2x2(x):
                return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                                      strides=[1, 2, 2, 1], padding='SAME')

            # first convolution + pooling layer: 28x28x1 -> 14x14x32
            W_conv1 = weight_variable([5, 5, 1, 32])
            b_conv1 = bias_variable([32])
            x_image = tf.reshape(x, [-1, 28, 28, 1])
            h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
            h_pool1 = max_pool_2x2(h_conv1)

            # second convolution + pooling layer: 14x14x32 -> 7x7x64
            W_conv2 = weight_variable([5, 5, 32, 64])
            b_conv2 = bias_variable([64])
            h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
            h_pool2 = max_pool_2x2(h_conv2)

            # fully connected layer with dropout
            W_fc1 = weight_variable([7 * 7 * 64, 1024])
            b_fc1 = bias_variable([1024])

            h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
            h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
            keep_prob = tf.placeholder(tf.float32)
            h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

            # output layer
            W_fc2 = weight_variable([1024, 10])
            b_fc2 = bias_variable([10])

            y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
            cross_entropy = tf.reduce_mean(
                tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
            train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
            correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
            accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
            init_op = tf.global_variables_initializer()
            saver = tf.train.Saver()  # defaults to saving all variables

        sv = tf.train.Supervisor(is_chief=(FLAGS.task_index == 0), global_step=global_step, init_op=init_op)
        with sv.prepare_or_wait_for_session(server.target,
                                            config=tf.ConfigProto(gpu_options=gpu_options,
                                                                  allow_soft_placement=True,
                                                                  log_device_placement=True)) as sess:
            # perform training cycles
            start_time = time.time()
            if (FLAGS.task_index == 0):
                train_writer = tf.summary.FileWriter(FLAGS.log_dir, sess.graph)

            sess.run(init_op)
            for i in range(20000):
                batch = mnist.train.next_batch(50)
                elapsed_time = time.time() - start_time
                start_time = time.time()
                if i % 100 == 0:
                    train_accuracy = accuracy.eval(feed_dict={
                        x: batch[0], y_: batch[1], keep_prob: 1.0})
                    print("step %d, training accuracy %g, Time: %3.2fms" % (i, train_accuracy, float(elapsed_time*1000)))
                train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
                # divide by a float so the progress is not truncated to 0 under Python 2
                sys.stderr.write("reporter progress:%0.4f\n" % (i / 20000.0))
            print("test accuracy %g" % accuracy.eval(feed_dict={
                x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
            print("Train Completed.")
            if (FLAGS.task_index == 0):
                train_writer.close()
                print("saving model...")
                saver.save(sess, FLAGS.save_path+"/model.ckpt")
        print("done")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.register("type", "bool", lambda v: v.lower() == "true")
    # Flags for defining the tf.train.ClusterSpec
    parser.add_argument(
        "--job_name",
        type=str,
        default="",
        help="One of 'ps', 'worker'"
    )
    # Flags for defining the tf.train.Server
    parser.add_argument(
        "--task_index",
        type=int,
        default=0,
        help="Index of task within the job"
    )
    # Flags for defining the parameter of data path
    parser.add_argument(
        "--data_path",
        type=str,
        default="",
        help="The path for train file"
    )
    parser.add_argument(
        "--save_path",
        type=str,
        default="",
        help="The save path for model"
    )
    parser.add_argument(
        "--log_dir",
        type=str,
        default="",
        help="The log path for model"
    )

    FLAGS, unparsed = parser.parse_known_args()
    tf.app.run(main=main)
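demo.py takes its role and the cluster layout from the TF_ROLE, TF_INDEX and TF_CLUSTER_DEF environment variables. The sketch below reproduces only that parsing step, without TensorFlow, to show what each task learns about itself. The host names, ports and the helper name describe_task are made up for illustration, and the exact JSON that XLearning writes may differ.

```python
import json


def describe_task(environ):
    """Parse the cluster variables the way demo.py does and summarize this task."""
    cluster_def = json.loads(environ["TF_CLUSTER_DEF"])
    role = environ["TF_ROLE"]
    index = int(environ["TF_INDEX"])
    return {
        "role": role,
        "index": index,
        "address": cluster_def[role][index],
        # demo.py treats worker 0 as the chief that writes summaries and saves the model.
        "is_chief": role == "worker" and index == 0,
        "num_workers": len(cluster_def.get("worker", [])),
    }


# Hypothetical environment for the second worker of a 1-ps / 2-worker job,
# matching the --ps-num 1 and --worker-num 2 settings in run.sh.
example_env = {
    "TF_CLUSTER_DEF": json.dumps({
        "ps": ["node1:2222"],
        "worker": ["node2:2222", "node3:2222"],
    }),
    "TF_ROLE": "worker",
    "TF_INDEX": "1",
}

task = describe_task(example_env)
print(task["role"], task["index"], task["address"], task["is_chief"])
```

In the real script the parsed dictionary is handed to tf.train.ClusterSpec, and the role decides whether the process blocks in server.join() or builds the model and trains.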


Note: the saver section stores the trained weights and biases so that the evaluation program can load them again.
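The first fully connected layer in demo.py expects 7 * 7 * 64 inputs. That number follows from the layer settings: with padding='SAME' the spatial output size is the input size divided by the stride, rounded up, so the stride-1 convolutions keep the size and each 2x2 pooling with stride 2 halves it. A short worked calculation (the helper name same_padding_output is only for this example):

```python
import math


def same_padding_output(size, stride):
    """Spatial output size of a conv or pooling layer with padding='SAME'."""
    return int(math.ceil(size / float(stride)))


size = 28  # MNIST images are 28x28
for layer in ("conv1", "pool1", "conv2", "pool2"):
    stride = 2 if layer.startswith("pool") else 1
    size = same_padding_output(size, stride)
    print(layer, size)

# The second convolution produces 64 channels.
flat_features = size * size * 64
print(flat_features)
```

The final value is the first dimension of W_fc1 and the width used when reshaping h_pool2 into h_pool2_flat.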

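2.3 Preparing a test image and preprocessing it with OpenCV

With the network trained, the next step is to test it. Prepare an image, preprocess it with OpenCV, and feed it to the evaluation program to see whether the digit is recognized correctly.

OpenCV is used to shrink the image to 28x28 pixels, convert it to grayscale, and binarize it.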
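To make the grayscale and binarization steps concrete, here is a minimal pure-Python sketch of what they compute on a tiny made-up image. It is not the OpenCV implementation: resizing is left out, the helper names are hypothetical, and the grayscale weights are the standard 0.299/0.587/0.114 luma coefficients. The threshold follows the inverse binary rule used in the C++ program: pixels above the threshold become 0, all others become the maximum value.

```python
def to_gray(pixel):
    """Luma of an (R, G, B) pixel using the 0.299/0.587/0.114 weights."""
    r, g, b = pixel
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))


def threshold_binary_inv(gray, thresh=100, max_value=255):
    """Inverse binary threshold: values above `thresh` become 0, the rest `max_value`."""
    return [[0 if value > thresh else max_value for value in row] for row in gray]


# A tiny 2x3 "image": dark ink (low values) on light paper (high values).
rgb = [
    [(250, 250, 250), (20, 20, 20), (250, 250, 250)],
    [(240, 240, 240), (30, 30, 30), (90, 110, 100)],
]
gray = [[to_gray(p) for p in row] for row in rgb]
binary = threshold_binary_inv(gray)
print(gray)
print(binary)
```

After this step the dark strokes are white (255) on a black (0) background, which is the polarity of the original MNIST images.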

(1) The stdafx.h file

Add the OpenCV headers:

#include <opencv2/highgui/highgui.hpp>
#include <opencv2/opencv.hpp>
#include <opencv2/gpu/gpu.hpp>
#include <opencv2/core/core.hpp>
#include <opencv/cv.h>
#include <opencv/cxcore.h>
#include <opencv/highgui.h>


(2) The TF_ImgPreProcess.cpp file

#include "stdafx.h"

#include <opencv2/core/core.hpp>
#include <opencv2/core/opengl_interop.hpp>
#include <opencv2/gpu/gpu.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/contrib/contrib.hpp>
using namespace std;
using namespace cv;

int _tmain(int argc, _TCHAR* argv[])
{
    // Load the source image as a 3-channel colour image (OpenCV stores it in BGR order).
    IplImage* img = cvLoadImage("E:\\png\\5.png", 1);
    if (img == NULL)
    {
        return -1;  // the image could not be read
    }
    IplImage* copyImg = cvCreateImage(cvGetSize(img), IPL_DEPTH_8U, 3);
    cvCopyImage(img, copyImg);
    IplImage* ResImg = cvCreateImage(cvSize(28, 28), IPL_DEPTH_8U, 1);
    IplImage* TmpImg = cvCreateImage(cvGetSize(ResImg), IPL_DEPTH_8U, 3);

    // Shrink to 28x28, convert to grayscale, then apply an inverse binary threshold.
    cvResize(copyImg, TmpImg, CV_INTER_LINEAR);
    cvCvtColor(TmpImg, ResImg, CV_BGR2GRAY);
    cvThreshold(ResImg, ResImg, 100, 255, CV_THRESH_BINARY_INV);

    cvSaveImage("E:\\png\\result\\1.png", ResImg);
    cvWaitKey(0);

    // Release the image buffers.
    cvReleaseImage(&TmpImg);
    cvReleaseImage(&ResImg);
    cvReleaseImage(&copyImg);
    cvReleaseImage(&img);

    return 0;
}


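2.4 Feeding the image to the network for recognition

Install the OpenCV package in the environment:

yum install opencv-python -y


A forward-propagation program is used here. The class selected from the output of the final softmax layer is the recognition result.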
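The last step of that forward pass can be illustrated without TensorFlow: softmax turns the ten output scores into probabilities, and the index of the largest probability is the predicted digit. The scores below are made up, and the helper names are only for this sketch.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_digit(logits):
    """Return (digit, probability) for the most likely class."""
    probs = softmax(logits)
    digit = max(range(len(probs)), key=probs.__getitem__)
    return digit, probs[digit]


# Made-up scores for the ten digit classes; index 5 has the largest score.
example_logits = [0.1, -1.2, 0.3, 0.0, 1.1, 4.0, -0.5, 0.2, 0.9, -2.0]
digit, prob = predict_digit(example_logits)
print(digit, round(prob, 3))
```

Because softmax preserves the ordering of the scores, taking the argmax of the raw scores would give the same digit; the softmax only matters when the probabilities themselves are of interest.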

The program is as follows:

```python
from PIL import Image, ImageFilter
import tensorflow as tf
import cv2


def imageprepare():
    """
    This function returns the pixel values.
    The input is a png file location.
    """
    file_name = '/data/sxl/MNIST_recognize/p_num2.png'  # path to your own image
    # In a terminal, 'mogrify -format png *.jpg' converts jpg files to png.
    im = Image.open(file_name).convert('L')

    im.save("/data/sxl/MNIST_recognize/sample.png")
    tv = list(im.getdata())  # get pixel values

    # Normalize pixels to the range 0..1: 0 is pure white, 1 is pure black.
    tva = [(255 - x) * 1.0 / 255.0 for x in tv]
    return tva


# The rest of the script returns the predicted integer.
# Its input is the list of pixel values from imageprepare().

# Define the model (same as when creating the model file)
result = imageprepare()
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))


def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)


def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')


def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')


W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
x_image = tf.reshape(x, [-1, 28, 28, 1])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

init_op = tf.global_variables_initializer()

# Load the model2.ckpt file and use the model to predict the integer.
# The integer is returned as a list.
# Based on the documentation at
# https://www.tensorflow.org/versions/master/how_tos/variables/index.html
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(init_op)
    # restore the model parameters saved earlier
    saver.restore(sess, "/data/sxl/MNIST_recognize/form/model2.ckpt")
    # print("Model restored.")

    prediction = tf.argmax(y_conv, 1)
    # keep_prob is 1.0 so that dropout is disabled during inference
    predint = prediction.eval(feed_dict={x: [result], keep_prob: 1.0}, session=sess)
    print(h_conv2)

    print('recognize result:')
    print(predint[0])
```

Input image:
![](/upload/images/20180309//f8c775df-a50b-4278-a2aa-ef51653938a1.png)
Result:
![](/upload/images/20180309//be8605a5-d009-4156-b1d7-d423d35797de.png)
Notes:
A TensorFlow model is saved as follows:
```python
saver = tf.train.Saver()
with tf.Session() as sess:
    # initialize the variables before saving them
    sess.run(tf.global_variables_initializer())
    saver.save(sess, "checkpoint/model.ckpt", global_step=1)
```

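After this runs, the saved model consists of three files with the extensions .data, .meta and .index. Because global_step=1 is passed to saver.save, the step number is appended to the checkpoint prefix:
model.ckpt-1.data-00000-of-00001
model.ckpt-1.index
model.ckpt-1.meta
The meta file stores the graph structure, including the GraphDef, the SaverDef, and so on.
The index file is a string-to-string table whose keys are tensor names and whose values are serialized BundleEntryProto messages.
The data file stores the values of all the model's variables.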
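The file names follow a simple pattern, sketched below under the assumption that the default (V2) checkpoint format is used with a single data shard. The helper checkpoint_files is hypothetical and only builds the names; it does not read or write anything.

```python
def checkpoint_files(prefix, global_step=None, num_shards=1):
    """File names expected for one checkpoint written with the given prefix."""
    if global_step is not None:
        # saver.save appends "-<step>" to the prefix when global_step is given
        prefix = "%s-%d" % (prefix, global_step)
    names = [prefix + ".index", prefix + ".meta"]
    for shard in range(num_shards):
        names.append("%s.data-%05d-of-%05d" % (prefix, shard, num_shards))
    return names


print(checkpoint_files("checkpoint/model.ckpt"))
print(checkpoint_files("checkpoint/model.ckpt", global_step=1))
```

The prefix, including any step suffix, is what gets passed to saver.restore; none of the three file extensions is part of it.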
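The model is loaded as follows:
```python
with tf.Session() as sess:
    # pass the checkpoint prefix written by saver.save, including the step suffix
    saver.restore(sess, "checkpoint/model.ckpt-1")
```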

