Improving MNIST Handwritten-Digit Test Accuracy with a Fully Connected Network in Keras
Using a fully connected (dense) network in Keras, the test accuracy reaches 98.5% after repeated tuning.
1. Original code
import tensorflow as tf
import tensorflow.contrib.keras as keras
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

# MNIST handwritten, 60k 28*28 grayscale images of the 10 digits,
# along with a test set of 10k images
# http://yann.lecun.com/exdb/mnist/

# load data
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
print('Training set: {} and Training Targets: {}'.format(x_train.shape, y_train.shape))
print('Test set: {} and test targets: {}'.format(x_test.shape, y_test.shape))
print('First training data: {}. \n Its size is: {}'.format(x_train[0], x_train[0].shape))

# show first 16 images
for i in range(16):
    plt.subplot(4, 4, i+1)
    plt.imshow(x_train[i], cmap='Greys_r')
plt.show()

# set seeds
np.random.seed(9987)
tf.set_random_seed(9987)

# generate one-hot labels
y_train_onehot = keras.utils.to_categorical(y_train)
print('First 10 labels: ', y_train[:10])
print('First 10 one-hot labels: ', y_train_onehot[:10])

# preprocessing: reshape images to be row vectors
x_train_1 = np.reshape(x_train, [x_train.shape[0], x_train.shape[1] * x_train.shape[2]])
x_test_1 = np.reshape(x_test, [x_test.shape[0], x_test.shape[1] * x_test.shape[2]])
plt.imshow(np.reshape(x_train_1[0], [28, 28]), cmap='Greys_r')
plt.show()

# implement a feedforward NN: 2 hidden layers, each with 50 hidden units and tanh activation,
# and one output layer with 10 units for the 10 classes
model = keras.models.Sequential()
model.add(keras.layers.Dense(
    units=50,
    input_dim=x_train_1.shape[1],
    kernel_initializer='glorot_uniform',
    bias_initializer='zeros',
    activation='tanh'))
model.add(keras.layers.Dense(
    units=50,
    input_dim=50,
    kernel_initializer='glorot_uniform',
    bias_initializer='zeros',
    activation='tanh'))
model.add(keras.layers.Dense(
    units=y_train_onehot.shape[1],
    input_dim=50,
    kernel_initializer='glorot_uniform',
    bias_initializer='zeros',
    activation='softmax'))

sgd_optimizer = keras.optimizers.SGD(lr=0.001, decay=1e-7, momentum=.9)
model.compile(optimizer=sgd_optimizer, loss='categorical_crossentropy')
history = model.fit(x_train_1, y_train_onehot,
                    batch_size=64, epochs=50,
                    verbose=1, validation_split=0.1)

# predict the class labels
y_train_pred = model.predict_classes(x_train_1, verbose=0)
correct_preds = np.sum(y_train == y_train_pred, axis=0)
train_acc = correct_preds / y_train.shape[0]
print('Training accuracy: %.2f%%' % (train_acc * 100))

y_test_pred = model.predict_classes(x_test_1, verbose=0)
correct_preds = np.sum(y_test == y_test_pred, axis=0)
test_acc = correct_preds / y_test.shape[0]
print('Test accuracy: %.2f%%' % (test_acc * 100))

print('end')
The network above is built from fully connected layers with the structure 784-50-50-10 (two hidden dense layers plus the output layer). Running this code gives a test accuracy of 92.75%.
Below I discuss ways to improve the test accuracy.
1. Changing the activation function
The original code uses the tanh function; common activation functions include sigmoid, relu, and tanh. Trying each in turn, I got the following results:
With the activation of the two hidden dense layers changed:
- tanh: 91.49%
- sigmoid: 95.04%
- relu: 95.77%
relu yields the highest test accuracy, so all subsequent experiments use relu.
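As a minimal sketch of this change (same 784-50-50-10 structure as the original code, with only the activation argument swapped):

# assumes the imports from the original code above
model = keras.models.Sequential()
model.add(keras.layers.Dense(
    units=50, input_dim=784,
    kernel_initializer='glorot_uniform', bias_initializer='zeros',
    activation='relu'))   # was 'tanh'
model.add(keras.layers.Dense(
    units=50,
    kernel_initializer='glorot_uniform', bias_initializer='zeros',
    activation='relu'))   # was 'tanh'
model.add(keras.layers.Dense(
    units=10, activation='softmax'))  # output activation stays softmax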
2. Changing the batch size
With two relu hidden layers:
- batch_size=64: 91.49%
- batch_size=128: 93.07%
- batch_size=200: 94.20%
- batch_size=256: 96.30%
- batch_size=300: 77.84%
We find that the batch size should be neither too large nor too small; a moderate value works best. In this code I use batch_size=256, as in the sketch below.
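The batch size is just the batch_size argument of model.fit (a sketch reusing the variables from the original code):

history = model.fit(x_train_1, y_train_onehot,
                    batch_size=256,   # moderate value that worked best above
                    epochs=50, verbose=1,
                    validation_split=0.1)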
3. Adding more layers
Adding dense layers (all activations relu, Adam optimizer, batch_size=256, 50 hidden units per layer):
- 1 dense layer: 86.96%
- 2 dense layers: 96.2%
- 3 dense layers: 96.5%
- 4 dense layers: 96.46%
- 5 dense layers: 96.5%
It seems that adding layers alone, without changing the number of hidden units, does not change the test accuracy much. A parameterized sketch of this experiment is shown below.
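Here build_model is a hypothetical helper, not code from the original post; it is only a sketch of how the layer-count experiment can be run in a loop:

# assumes the imports and the x_train_1 / y_train_onehot arrays from the original code
def build_model(n_hidden, units=50):
    # n_hidden relu layers of equal width, followed by the softmax output
    model = keras.models.Sequential()
    model.add(keras.layers.Dense(units=units, input_dim=784, activation='relu'))
    for _ in range(n_hidden - 1):
        model.add(keras.layers.Dense(units=units, activation='relu'))
    model.add(keras.layers.Dense(units=10, activation='softmax'))
    model.compile(optimizer=keras.optimizers.Adam(lr=0.001),
                  loss='categorical_crossentropy')
    return model

for n_hidden in range(1, 6):
    model = build_model(n_hidden)
    model.fit(x_train_1, y_train_onehot, batch_size=256,
              epochs=50, verbose=0, validation_split=0.1)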
4. Adding dropout (0.05)
- two dense layers: 96.2% without dropout, 96.92% with dropout
- three dense layers: 96.5% without dropout, 96.56% with dropout
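In Keras this means inserting a Dropout layer after each hidden Dense layer (a sketch for the two-layer case):

# assumes the imports from the original code above
model = keras.models.Sequential()
model.add(keras.layers.Dense(units=50, input_dim=784, activation='relu'))
model.add(keras.layers.Dropout(0.05))  # randomly zero 5% of activations during training
model.add(keras.layers.Dense(units=50, activation='relu'))
model.add(keras.layers.Dropout(0.05))
model.add(keras.layers.Dense(units=10, activation='softmax'))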
5. Increasing the number of epochs
With epochs=60 the result is better than with epochs=50.
6. Normalizing the input data (scaling pixel values to [0, 1])
- Before normalization: two layers (with dropout 0.05): 96.92%
- After normalization: two layers (with dropout 0.05): 97.33%
x_train = x_train.astype('float32') / 255.  # scale pixel values from [0, 255] to [0, 1]
x_test = x_test.astype('float32') / 255.
7. Changing the optimizer
- Adam, 2 relu layers: 97.41%
- SGD, 2 relu layers: 81.85%
- RMSprop, 2 relu layers: 97.30%
- Adagrad, 2 relu layers: 97.04%
SGD does not work well with the relu layers here, so I use the Adam optimizer:
Adam_optimizer = keras.optimizers.Adam(lr=0.001)
model.compile(optimizer=Adam_optimizer, loss='categorical_crossentropy')
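For reference, the other optimizers can be swapped in the same way. This is only a sketch; the learning rates shown are the usual Keras defaults, not values confirmed in the experiments above:

sgd_optimizer = keras.optimizers.SGD(lr=0.01)
rmsprop_optimizer = keras.optimizers.RMSprop(lr=0.001)
adagrad_optimizer = keras.optimizers.Adagrad(lr=0.01)
model.compile(optimizer=rmsprop_optimizer, loss='categorical_crossentropy')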
8. Changing the number of units
This is by far the most important step. I came across an article online by mmc2015 arguing that the closer the size of the last hidden layer is to the size of the output layer, the better the results tend to be. So I made the hidden layer sizes decrease step by step, with the last hidden layer set to 16 units, close to the 10 output units.
My network structure is 784-512-256-128-64-32-16-10.
My Python code is shown below.
import tensorflow as tf
import tensorflow.contrib.keras as keras
from PIL import Image
from keras import regularizers
import numpy as np
import matplotlib.pyplot as plt

# MNIST handwritten, 60k 28*28 grayscale images of the 10 digits,
# along with a test set of 10k images
# http://yann.lecun.com/exdb/mnist/

# load data
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
print('Training set: {} and Training Targets: {}'.format(x_train.shape, y_train.shape))
print('Test set: {} and test targets: {}'.format(x_test.shape, y_test.shape))
print('First training data: {}. \n Its size is: {}'.format(x_train[0], x_train[0].shape))

# show first 16 images
for i in range(16):
    plt.subplot(4, 4, i+1)
    plt.imshow(x_train[i], cmap='Greys_r')
plt.show()

# set seeds
np.random.seed(9987)
tf.set_random_seed(9987)

# normalize input data: scale pixel values to [0, 1]
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.

# generate one-hot labels
y_train_onehot = keras.utils.to_categorical(y_train)
print('First 10 labels: ', y_train[:10])
print('First 10 one-hot labels: ', y_train_onehot[:10])
print('y_train_onehot.shape[1]', y_train_onehot.shape[1])

# preprocessing: reshape images to be row vectors
x_train_1 = np.reshape(x_train, [x_train.shape[0], x_train.shape[1] * x_train.shape[2]])
x_test_1 = np.reshape(x_test, [x_test.shape[0], x_test.shape[1] * x_test.shape[2]])
plt.imshow(np.reshape(x_train_1[0], [28, 28]), cmap='Greys_r')
plt.show()

# feedforward NN with shrinking hidden layers: 784-512-256-128-64-32-16-10,
# relu activations, dropout after each hidden layer, softmax output
model = keras.models.Sequential()
model.add(keras.layers.Dense(
    units=512, input_dim=784,
    kernel_initializer='glorot_uniform', bias_initializer='zeros',
    activation='relu'))
model.add(keras.layers.Dropout(0.05))
model.add(keras.layers.Dense(
    units=256, input_dim=512,
    kernel_initializer='glorot_uniform', bias_initializer='zeros',
    activation='relu'))
model.add(keras.layers.Dropout(0.05))
model.add(keras.layers.Dense(
    units=128, input_dim=256,
    kernel_initializer='glorot_uniform', bias_initializer='zeros',
    activation='relu'))
model.add(keras.layers.Dropout(0.05))
model.add(keras.layers.Dense(
    units=64, input_dim=128,
    kernel_initializer='glorot_uniform', bias_initializer='zeros',
    activation='relu'))
model.add(keras.layers.Dropout(0.05))
model.add(keras.layers.Dense(
    units=32, input_dim=64,
    kernel_initializer='glorot_uniform', bias_initializer='zeros',
    activation='relu'))
model.add(keras.layers.Dropout(0.05))
model.add(keras.layers.Dense(
    units=16, input_dim=32,
    kernel_initializer='glorot_uniform', bias_initializer='zeros',
    activation='relu'))
model.add(keras.layers.Dropout(0.05))
model.add(keras.layers.Dense(
    units=y_train_onehot.shape[1], input_dim=16,
    kernel_initializer='glorot_uniform', bias_initializer='zeros',
    activation='softmax'))

Adam_optimizer = keras.optimizers.Adam(lr=0.001)
model.compile(optimizer=Adam_optimizer, loss='categorical_crossentropy')
history = model.fit(x_train_1, y_train_onehot,
                    batch_size=256, epochs=350,
                    verbose=1, validation_split=0.1)

# predict the class labels
y_train_pred = model.predict_classes(x_train_1, verbose=0)
correct_preds = np.sum(y_train == y_train_pred, axis=0)
train_acc = correct_preds / y_train.shape[0]
print('Training accuracy: %.2f%%' % (train_acc * 100))

y_test_pred = model.predict_classes(x_test_1, verbose=0)
correct_preds = np.sum(y_test == y_test_pred, axis=0)
test_acc = correct_preds / y_test.shape[0]
print('Test accuracy: %.2f%%' % (test_acc * 100))

print('end')
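The shrinking 512-256-128-64-32-16 stack above can also be written more compactly as a loop. This is an equivalent sketch, not the code actually run:

# assumes the imports from the code above
model = keras.models.Sequential()
model.add(keras.layers.Dense(units=512, input_dim=784,
                             kernel_initializer='glorot_uniform',
                             bias_initializer='zeros', activation='relu'))
model.add(keras.layers.Dropout(0.05))
for units in [256, 128, 64, 32, 16]:
    model.add(keras.layers.Dense(units=units,
                                 kernel_initializer='glorot_uniform',
                                 bias_initializer='zeros', activation='relu'))
    model.add(keras.layers.Dropout(0.05))
model.add(keras.layers.Dense(units=10, activation='softmax'))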
Increasing the number of epochs keeps improving the final result: with epochs=50 the test accuracy is 98.2%, with epochs=100 it is 98.4%, and with epochs=150 it is 98.6%. With even more epochs, the test accuracy might eventually reach 99%.