
Improving MNIST Handwritten-Digit Test Accuracy with a Fully Connected Network in Keras

2018-04-04 10:16
After repeated tuning, a fully connected (Dense) network in Keras reached a test accuracy of 98.5% on MNIST handwritten digits.
1. Original code
import tensorflow as tf
import tensorflow.contrib.keras as keras
from PIL import Image

import numpy as np
import matplotlib.pyplot as plt

# MNIST handwritten, 60k 28*28 grayscale images of the 10 digits,
# along with a test set of 10k images
# http://yann.lecun.com/exdb/mnist/

# load data
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
print('Training set: {} and Training Targets: {}'.format(x_train.shape, y_train.shape))
print('Test set: {} and test targets: {}'.format(x_test.shape, y_test.shape))
print('First training data: {}. \n Its size is: {}'.format(x_train[0], x_train[0].shape))

#show first 16 images
for i in range(16):
    plt.subplot(4, 4, i+1)
    plt.imshow(x_train[i], cmap='Greys_r')
plt.show()

# set seeds
np.random.seed(9987)
tf.set_random_seed(9987)

#generate one-hot labels
y_train_onehot = keras.utils.to_categorical(y_train)
print('First 10 labels: ', y_train[:10])
print('First 10 one-hot labels: ', y_train_onehot[:10])

# preprocessing data
#1 reshape images to be row vectors
x_train_1 = np.reshape(x_train, [x_train.shape[0], x_train.shape[1] * x_train.shape[2]])
x_test_1 = np.reshape(x_test, [x_test.shape[0], x_test.shape[1] * x_test.shape[2]])
plt.imshow(np.reshape(x_train_1[0], [28, 28]), cmap = 'Greys_r')
plt.show()

# implement a feedforward NN: 2 hidden layers, each with 50 hidden units and tanh activation,
# and one output layer with 10 units for the 10 classes
model = keras.models.Sequential()
model.add(keras.layers.Dense(
    units=50,
    input_dim=x_train_1.shape[1],
    kernel_initializer='glorot_uniform',
    bias_initializer='zeros',
    activation='tanh'))
model.add(keras.layers.Dense(
    units=50,
    input_dim=50,
    kernel_initializer='glorot_uniform',
    bias_initializer='zeros',
    activation='tanh'))

model.add(keras.layers.Dense(
    units=y_train_onehot.shape[1],
    input_dim=50,
    kernel_initializer='glorot_uniform',
    bias_initializer='zeros',
    activation='softmax'))
sgd_optimizer = keras.optimizers.SGD(lr=0.001, decay=1e-7, momentum=.9)
model.compile(optimizer=sgd_optimizer, loss='categorical_crossentropy')

history = model.fit(x_train_1, y_train_onehot,
                    batch_size=64, epochs=50,
                    verbose=1,
                    validation_split=0.1)

#predict the class labels
y_train_pred = model.predict_classes(x_train_1, verbose=0)
correct_preds = np.sum(y_train == y_train_pred, axis=0)
train_acc = correct_preds / y_train.shape[0]
print('Training accuracy: %.2f%%' % (train_acc * 100))

y_test_pred = model.predict_classes(x_test_1, verbose=0)
correct_preds = np.sum(y_test == y_test_pred, axis=0)
test_acc = correct_preds / y_test.shape[0]
print('Test accuracy: %.2f%%' % (test_acc * 100))

print('end')


The network above stacks three fully connected (Dense) layers in a 784-50-50-10 structure. Running this code gives a test accuracy of 92.75%.
Below I discuss ways to increase the test accuracy.
1. Change the activation function
The original code uses tanh; other common activations are sigmoid and relu. Trying each in turn in the two hidden Dense layers gave the following results:

tanh: 91.49%
sigmoid: 95.04%
relu: 95.77%

relu gives the highest test accuracy, so all subsequent experiments use relu.
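Only the activation argument of the hidden Dense layers needs to change; a minimal sketch (initializers omitted for brevity):

# Same 784-50-50-10 network as above, with relu in place of tanh
model = keras.models.Sequential()
model.add(keras.layers.Dense(units=50, input_dim=784, activation='relu'))
model.add(keras.layers.Dense(units=50, activation='relu'))
model.add(keras.layers.Dense(units=10, activation='softmax'))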

2. Change the batch size

With the two relu hidden layers from step 1:

batch_size=64: 91.49%
batch_size=128: 93.07%
batch_size=200: 94.20%
batch_size=256: 96.30%
batch_size=300: 77.84%

We find that batch_size should be neither too large nor too small; a moderate value works best. In this code I will use batch_size=256.
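A sketch of how such a sweep can be run (my reconstruction, not the original script; it reuses x_train_1, y_train_onehot, x_test_1, and y_test from the code above):

for bs in [64, 128, 200, 256, 300]:
    # fresh model with fresh weights for each batch size
    m = keras.models.Sequential()
    m.add(keras.layers.Dense(units=50, input_dim=784, activation='relu'))
    m.add(keras.layers.Dense(units=50, activation='relu'))
    m.add(keras.layers.Dense(units=10, activation='softmax'))
    m.compile(optimizer='adam', loss='categorical_crossentropy')
    m.fit(x_train_1, y_train_onehot, batch_size=bs, epochs=50,
          verbose=0, validation_split=0.1)
    preds = m.predict_classes(x_test_1, verbose=0)
    print('batch_size=%d: test accuracy %.2f%%' % (bs, np.mean(y_test == preds) * 100))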


3. Add more layers

Adding Dense layers (all activations relu, Adam optimizer, batch_size=256, 50 hidden units per layer):

1 dense layer: test accuracy 86.96%
2 dense layers: 96.2%
3 dense layers: 96.5%
4 dense layers: 96.46%
5 dense layers: 96.5%

It seems that adding layers alone, without changing the number of hidden units, does not change the test accuracy much.
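The variants can be built in a loop; a sketch of my assumption of how the n-layer models were constructed (50 relu units per hidden layer):

n_hidden = 3  # number of hidden Dense layers to test
model = keras.models.Sequential()
model.add(keras.layers.Dense(units=50, input_dim=784, activation='relu'))
for _ in range(n_hidden - 1):
    model.add(keras.layers.Dense(units=50, activation='relu'))
model.add(keras.layers.Dense(units=10, activation='softmax'))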


4. Add dropout (rate 0.05)

two dense layers: 96.2% before, 96.92% after adding dropout
three dense layers: 96.5% before, 96.56% after adding dropout
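A Dropout layer goes after each hidden Dense layer, matching the final script later in this post; a minimal sketch for the two-layer network:

model = keras.models.Sequential()
model.add(keras.layers.Dense(units=50, input_dim=784, activation='relu'))
model.add(keras.layers.Dropout(0.05))  # randomly zeroes 5% of activations during training
model.add(keras.layers.Dense(units=50, activation='relu'))
model.add(keras.layers.Dropout(0.05))
model.add(keras.layers.Dense(units=10, activation='softmax'))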

 

5. Increase the number of epochs

With epochs=60 the result is better than with epochs=50.
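Since model.fit returns a History object (the history variable in the first script), the per-epoch training and validation losses can be plotted to judge how many epochs are worthwhile:

plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('categorical cross-entropy')
plt.legend()
plt.show()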


6. Normalize the input data

Before (two layers with dropout 0.05): 96.92%
After (two layers with dropout 0.05): 97.33%

The change is to rescale the raw pixel values from [0, 255] to [0, 1] before training:

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255
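Note that this is input normalization, not a BatchNormalization layer. For completeness, layer-level batch normalization (a sketch only; my final model below does not use it) would be inserted between a Dense layer and its activation:

model = keras.models.Sequential()
model.add(keras.layers.Dense(units=50, input_dim=784))
model.add(keras.layers.BatchNormalization())  # normalizes pre-activations per mini-batch
model.add(keras.layers.Activation('relu'))
model.add(keras.layers.Dense(units=10, activation='softmax'))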

7. Change the optimizer

Adam optimizer on the 2-relu-layer network: 97.41%
SGD optimizer: 81.85%
RMSprop: 97.30%
Adagrad: 97.04%

SGD performs far worse here; it appears plain SGD converges too slowly at this learning rate to be competitive on the relu network.

I will use the Adam optimizer:

Adam_optimizer = keras.optimizers.Adam(lr=0.001)
model.compile(optimizer=Adam_optimizer, loss='categorical_crossentropy')
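The comparison can be run with the same sweep pattern as in step 2, swapping the optimizer passed to model.compile (a sketch; the learning-rate and momentum settings other than Adam's are my assumptions):

for name, opt in [('Adam', keras.optimizers.Adam(lr=0.001)),
                  ('SGD', keras.optimizers.SGD(lr=0.001, momentum=0.9)),
                  ('RMSprop', keras.optimizers.RMSprop(lr=0.001)),
                  ('Adagrad', keras.optimizers.Adagrad(lr=0.001))]:
    m = keras.models.Sequential()
    m.add(keras.layers.Dense(units=50, input_dim=784, activation='relu'))
    m.add(keras.layers.Dense(units=50, activation='relu'))
    m.add(keras.layers.Dense(units=10, activation='softmax'))
    m.compile(optimizer=opt, loss='categorical_crossentropy')
    m.fit(x_train_1, y_train_onehot, batch_size=256, epochs=50,
          verbose=0, validation_split=0.1)
    preds = m.predict_classes(x_test_1, verbose=0)
    print('%s: test accuracy %.2f%%' % (name, np.mean(y_test == preds) * 100))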

8. Change the number of units per layer
This is the most important step. I read an article online by mmc2015 arguing that the closer the unit count of the last hidden layer is to the unit count of the output layer, the better the result tends to be. So I made the hidden layers shrink step by step, with the last hidden layer at 16 units, closest to the 10 outputs.
My network structure is 784-512-256-128-64-32-16-10.
My Python code is shown below:
import tensorflow as tf
import tensorflow.contrib.keras as keras
from PIL import Image
from keras import regularizers
import numpy as np
import matplotlib.pyplot as plt

# MNIST handwritten, 60k 28*28 grayscale images of the 10 digits,
# along with a test set of 10k images
# http://yann.lecun.com/exdb/mnist/

# load data
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
print('Training set: {} and Training Targets: {}'.format(x_train.shape, y_train.shape))
print('Test set: {} and test targets: {}'.format(x_test.shape, y_test.shape))
print('First training data: {}. \n Its size is: {}'.format(x_train[0], x_train[0].shape))

#show first 16 images
for i in range(16):
    plt.subplot(4, 4, i+1)
    plt.imshow(x_train[i], cmap='Greys_r')
plt.show()

# set seeds
np.random.seed(9987)
tf.set_random_seed(9987)
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255

#generate one-hot labels
y_train_onehot = keras.utils.to_categorical(y_train)
print('First 10 labels: ', y_train[:10])
print('First 10 one-hot labels: ', y_train_onehot[:10])
print('y_train_onehot.shape[1]',y_train_onehot.shape[1])

# preprocessing data
#1 reshape images to be row vectors
x_train_1 = np.reshape(x_train, [x_train.shape[0], x_train.shape[1] * x_train.shape[2]])
x_test_1 = np.reshape(x_test, [x_test.shape[0], x_test.shape[1] * x_test.shape[2]])
plt.imshow(np.reshape(x_train_1[0], [28, 28]), cmap = 'Greys_r')
plt.show()

# implement a feedforward NN: six hidden layers (512-256-128-64-32-16 units) with relu
# activations and dropout, and one output layer with 10 units for the 10 classes
model = keras.models.Sequential()

model.add(keras.layers.Dense(
    units=512,
    input_dim=784,
    kernel_initializer='glorot_uniform',
    bias_initializer='zeros',
    activation='relu'))
model.add(keras.layers.Dropout(0.05))

model.add(keras.layers.Dense(
    units=256,
    input_dim=512,
    kernel_initializer='glorot_uniform',
    bias_initializer='zeros',
    activation='relu'))
model.add(keras.layers.Dropout(0.05))

model.add(keras.layers.Dense(
    units=128,
    input_dim=256,
    kernel_initializer='glorot_uniform',
    bias_initializer='zeros',
    activation='relu'))
model.add(keras.layers.Dropout(0.05))

model.add(keras.layers.Dense(
    units=64,
    input_dim=128,
    kernel_initializer='glorot_uniform',
    bias_initializer='zeros',
    activation='relu'))
model.add(keras.layers.Dropout(0.05))

model.add(keras.layers.Dense(
    units=32,
    input_dim=64,
    kernel_initializer='glorot_uniform',
    bias_initializer='zeros',
    activation='relu'))
model.add(keras.layers.Dropout(0.05))

model.add(keras.layers.Dense(
    units=16,
    input_dim=32,
    kernel_initializer='glorot_uniform',
    bias_initializer='zeros',
    activation='relu'))
model.add(keras.layers.Dropout(0.05))

model.add(keras.layers.Dense(
    units=y_train_onehot.shape[1],
    input_dim=16,
    kernel_initializer='glorot_uniform',
    bias_initializer='zeros',
    activation='softmax'))
Adam_optimizer = keras.optimizers.Adam(lr=0.001)
model.compile(optimizer=Adam_optimizer, loss='categorical_crossentropy')

history = model.fit(x_train_1, y_train_onehot,
                    batch_size=256, epochs=350,
                    verbose=1,
                    validation_split=0.1)

#predict the class labels
y_train_pred = model.predict_classes(x_train_1, verbose=0)
correct_preds = np.sum(y_train == y_train_pred, axis=0)
train_acc = correct_preds / y_train.shape[0]
print('Training accuracy: %.2f%%' % (train_acc * 100))

y_test_pred = model.predict_classes(x_test_1, verbose=0)
correct_preds = np.sum(y_test == y_test_pred, axis=0)
test_acc = correct_preds / y_test.shape[0]
print('Test accuracy: %.2f%%' % (test_acc * 100))

print('end')


Raising the number of epochs further keeps improving the final result: with epochs=50 the test accuracy is 98.2%, with epochs=100 it is 98.4%, and with epochs=150 it is 98.6%.
With even more epochs, the test accuracy might eventually reach 99%.
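Instead of guessing the epoch count, one option (a sketch, not something I tested here) is a Keras EarlyStopping callback that halts training once the validation loss stops improving:

# stop training if val_loss has not improved for 10 consecutive epochs
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)
history = model.fit(x_train_1, y_train_onehot,
                    batch_size=256, epochs=350,
                    verbose=1,
                    validation_split=0.1,
                    callbacks=[early_stop])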

