
Deep Learning: Implementing an Improved Handwritten Digit Recognition Algorithm

2018-01-26 16:17
Notes from Peng Liang's course "Deep Learning Advanced: Algorithms and Applications".

Comparing different weight-initialization methods

1. Comparison with 30 neurons in the hidden layer:

(1) The previous method: N(0, 1)

N(0, 1), i.e. the standard normal distribution with mean 0 and variance 1

import mnist_loader
training_data, validation_data, test_data = mnist_loader.load_data_wrapper()
import network2
net = network2.Network([784, 30, 10], cost=network2.CrossEntropyCost)
net.large_weight_initializer()
net.SGD(training_data, 30, 10, 0.1, lmbda=5.0, evaluation_data=validation_data, monitor_evaluation_accuracy=True)


(2) The new method: N(0, 1/sqrt(n_in))

import mnist_loader
training_data, validation_data, test_data = mnist_loader.load_data_wrapper()
import network2
net = network2.Network([784, 30, 10], cost=network2.CrossEntropyCost)
# net.large_weight_initializer()  <-- this call is omitted, so the default N(0, 1/sqrt(n_in)) initializer is used
net.SGD(training_data, 30, 10, 0.1, lmbda=5.0, evaluation_data=validation_data, monitor_evaluation_accuracy=True)




Conclusion:

Both methods reach an accuracy above 96%,

but the new method drives the accuracy up faster (87 vs. 93).

2. Comparison with 100 neurons in the hidden layer:
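The post does not list the calls for this run, so here is a minimal sketch, assuming the same hyperparameters as the 30-neuron experiments with the hidden layer simply widened to 100 neurons:

# Sketch only: 100 hidden neurons, otherwise identical to the 30-neuron runs above.
import mnist_loader
import network2

training_data, validation_data, test_data = mnist_loader.load_data_wrapper()

# (1) old initializer: weights ~ N(0, 1)
net_old = network2.Network([784, 100, 10], cost=network2.CrossEntropyCost)
net_old.large_weight_initializer()
net_old.SGD(training_data, 30, 10, 0.1, lmbda=5.0,
            evaluation_data=validation_data, monitor_evaluation_accuracy=True)

# (2) new (default) initializer: weights ~ N(0, 1/sqrt(n_in))
net_new = network2.Network([784, 100, 10], cost=network2.CrossEntropyCost)
net_new.SGD(training_data, 30, 10, 0.1, lmbda=5.0,
            evaluation_data=validation_data, monitor_evaluation_accuracy=True)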



Conclusion

In this example the new initialization method only speeds up learning; the final performance is the same. In some networks, however, the new weight-initialization method does improve the final accuracy.

Implementing the improved neural network algorithm to recognize handwritten digits:

A recap of the earlier, original version: Network.py

We made improvements in the following areas:

Cost function: cross-entropy

Regularization: L1, L2

Softmax layer

Weight initialization: N(0, 1/sqrt(n_in)) (a small illustration of why this helps follows below)
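A quick aside on the last point: with weights drawn from N(0, 1), the weighted input z = Σ_j w_j x_j + b of a hidden neuron with 784 inputs has a very large spread, so the sigmoid saturates and its gradient almost vanishes; dividing by sqrt(n_in) keeps z on the order of 1. The following standalone snippet is my own illustration (not part of the course code) and just prints the empirical spread under both schemes:

# Illustration (not course code): spread of z = w.x for one hidden neuron with 784 inputs.
import numpy as np

np.random.seed(0)
n_in = 784
x = np.random.rand(n_in)   # a stand-in input vector with activations in [0, 1]

# draw many weight vectors under each scheme and record the resulting z
z_old = [np.dot(np.random.randn(n_in), x) for _ in xrange(1000)]                  # w ~ N(0, 1)
z_new = [np.dot(np.random.randn(n_in) / np.sqrt(n_in), x) for _ in xrange(1000)]  # w ~ N(0, 1/n_in)

print "std of z with N(0,1) weights:            %.2f" % np.std(z_old)   # roughly 16: the sigmoid saturates
print "std of z with N(0,1/sqrt(n_in)) weights: %.2f" % np.std(z_new)   # well below 1: the sigmoid stays responsive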

The old Network.py

#coding=utf-8
# @Author: yangenneng
# @Time: 2018-01-23 16:39
# @Abstract: handwritten digit recognition via (stochastic) gradient descent

import numpy as np
import random

# A neural network class
class Network(object):

    # Constructor.
    # sizes: number of neurons per layer; e.g. net = Network([2, 3, 1]) means
    # 2 neurons in the first layer, 3 in the second, 1 in the third.
    def __init__(self, sizes):
        # number of layers in the network
        self.num_layers = len(sizes)
        # number of neurons in each layer
        self.sizes = sizes
        # np.random.randn(y, 1): samples from the standard normal distribution (mean 0, variance 1);
        # "for y in sizes[1:]" skips the first layer, because biases only exist from the hidden
        # layers onwards -- the input layer has none.
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        # net.weights[1] stores the weights connecting the second and third layers
        # (Python indices start at 0); zip pairs the two iterables element by element.
        self.weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]

    # Feed the input forward through the network and compute the output.
    def feedforward(self, a):
        for b, w in zip(self.biases, self.weights):
            # np.dot(w, a): product of w and a, e.g. w=(w1,w2,w3), a=(a1,a2,a3) => w1*a1+w2*a2+w3*a3
            a = sigmoid(np.dot(w, a) + b)
        return a

    """
    Stochastic gradient descent.
    training_data: the training set, a list of tuples (x, y); x is a 784x1 input, y the true label
    epochs: how many epochs to train
    mini_batch_size: how many examples per mini-batch
    eta: the learning rate
    test_data: the test set
    """
    def SGD(self, training_data, epochs, mini_batch_size, eta, test_data=None):
        # if test_data was supplied, record how many examples it contains
        if test_data: n_test = len(test_data)
        # number of tuples in the training set; each tuple is one training image
        n = len(training_data)
        # train for `epochs` rounds; j is the current epoch
        for j in xrange(epochs):
            # shuffle, i.e. randomly reorder the training set
            random.shuffle(training_data)
            # slice this epoch's data into mini-batches
            mini_batches = [
                # take slices of length mini_batch_size starting at 0, mini_batch_size, ...
                training_data[k:k + mini_batch_size]
                for k in xrange(0, n, mini_batch_size)]
            # update the parameters once per mini-batch
            for mini_batch in mini_batches:
                # update weights and biases; eta is the learning rate
                self.update_mini_batch(mini_batch, eta)
            # if test data was supplied, evaluate the current accuracy
            if test_data:
                # epoch j: number of correctly classified test examples
                print "Epoch {0}: {1} / {2}".format(j, self.evaluate(test_data), n_test)
            else:
                # epoch j finished
                print "Epoch {0} complete".format(j)

    # Update the weights and biases from one mini-batch.
    def update_mini_batch(self, mini_batch, eta):
        # accumulators for the bias gradients
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        # accumulators for the weight gradients
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # x: 784x1 input, y: 10x1 desired output
        for x, y in mini_batch:
            # backpropagation computes the partial derivatives of the cost
            # with respect to the weights and biases
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            # accumulate the gradients over the mini-batch
            nabla_b = [nb + dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw + dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
        # gradient-descent step on the weights and biases
        self.weights = [w - (eta / len(mini_batch)) * nw
                        for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b - (eta / len(mini_batch)) * nb
                       for b, nb in zip(self.biases, nabla_b)]

    # Backpropagation: compute the partial derivatives of the cost function
    # with respect to the biases and weights.
    # x: 784x1
    # y: 10x1
    def backprop(self, x, y):
        # initialize the two gradient lists
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # forward pass, input x: set the input-layer activation a
        activation = x
        activations = [x]  # list of activations, layer by layer
        zs = []  # list of the intermediate z vectors
        for b, w in zip(self.biases, self.weights):
            # the weighted input z is the dot product plus the bias
            z = np.dot(w, activation) + b
            zs.append(z)
            # compute the activation
            activation = sigmoid(z)
            activations.append(activation)

        # backward pass
        # output-layer error: activations[-1] is the last entry of the list,
        # i.e. the activation of the output layer
        delta = self.cost_derivative(activations[-1], y) * \
            sigmoid_prime(zs[-1])
        # gradients for the output layer
        nabla_b[-1] = delta
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())

        # walk backwards from the output layer, always indexing with -l
        for l in xrange(2, self.num_layers):
            # z of layer -l
            z = zs[-l]
            # derivative of the sigmoid
            sp = sigmoid_prime(z)
            # propagate the error one layer back
            delta = np.dot(self.weights[-l + 1].transpose(), delta) * sp
            # gradients for this layer
            nabla_b[-l] = delta
            nabla_w[-l] = np.dot(delta, activations[-l - 1].transpose())
        return (nabla_b, nabla_w)

    # Accuracy check after each epoch of training.
    def evaluate(self, test_data):
        test_results = [(np.argmax(self.feedforward(x)), y)
                        for (x, y) in test_data]
        # count how many predictions equal the true label,
        # i.e. how many digit images were recognized correctly
        return sum(int(x == y) for (x, y) in test_results)

    # Derivative of the (quadratic) cost with respect to the output activations.
    def cost_derivative(self, output_activations, y):
        return (output_activations - y)

# The sigmoid function f(z) = 1 / (1 + e^-z), the neuron's nonlinearity.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# First derivative of the sigmoid function.
def sigmoid_prime(z):
    return sigmoid(z) * (1 - sigmoid(z))

net = Network([2, 3, 1])
# print "net.num_layers:", net.num_layers
# print "\nnet.sizes:", net.sizes
# print "\nnet.biases:", net.biases
# print "\nnet.weights:", net.weights


The new network2.py

"""
network2.py
~~~~~~~~~~~~~~
An improved version of network.py, implementing the stochastic
gradient descent learning algorithm for a feedforward neural network.
Improvements include the addition of the cross-entropy cost function,
regularization, and better initialization of network weights.  Note
that I have focused on making the code simple, easily readable, and
easily modifiable.  It is not optimized, and omits many desirable
features.
"""

#### Libraries
# Standard library
import json
import random
import sys

# Third-party libraries
import numpy as np

#### Define the quadratic and cross-entropy cost functions

class QuadraticCost(object):

@staticmethod
def fn(a, y):
"""Return the cost associated with an output ``a`` and desired output
``y``.
"""
return 0.5*np.linalg.norm(a-y)**2

@staticmethod
def delta(z, a, y):
"""Return the error delta from the output layer."""
return (a-y) * sigmoid_prime(z)

class CrossEntropyCost(object):

@staticmethod
def fn(a, y):
"""Return the cost associated with an output ``a`` and desired output
``y``.  Note that np.nan_to_num is used to ensure numerical
stability.  In particular, if both ``a`` and ``y`` have a 1.0
in the same slot, then the expression (1-y)*np.log(1-a)
returns nan.  The np.nan_to_num ensures that that is converted
to the correct value (0.0).
"""
return np.sum(np.nan_to_num(-y*np.log(a)-(1-y)*np.log(1-a)))

@staticmethod
def delta(z, a, y):
"""Return the error delta from the output layer.  Note that the
parameter ``z`` is not used by the method.  It is included in
the method's parameters in order to make the interface
consistent with the delta method for other cost classes.
"""
return (a-y)

#### Main Network class
class Network(object):

def __init__(self, sizes, cost=CrossEntropyCost):
"""The list ``sizes`` contains the number of neurons in the respective
layers of the network.  For example, if the list was [2, 3, 1]
then it would be a three-layer network, with the first layer
containing 2 neurons, the second layer 3 neurons, and the
third layer 1 neuron.  The biases and weights for the network
are initialized randomly, using
``self.default_weight_initializer`` (see docstring for that
method).
"""
self.num_layers = len(sizes)
self.sizes = sizes
self.default_weight_initializer()
self.cost=cost

def default_weight_initializer(self):
"""Initialize each weight using a Gaussian distribution with mean 0
and standard deviation 1 over the square root of the number of
weights connecting to the same neuron.  Initialize the biases
using a Gaussian distribution with mean 0 and standard
deviation 1.
Note that the first layer is assumed to be an input layer, and
by convention we won't set any biases for those neurons, since
biases are only ever used in computing the outputs from later
layers.
"""
self.biases = [np.random.randn(y, 1) for y in self.sizes[1:]]
self.weights = [np.random.randn(y, x)/np.sqrt(x)
for x, y in zip(self.sizes[:-1], self.sizes[1:])]

def large_weight_initializer(self):
"""Initialize the weights using a Gaussian distribution with mean 0
and standard deviation 1.  Initialize the biases using a
Gaussian distribution with mean 0 and standard deviation 1.
Note that the first layer is assumed to be an input layer, and
by convention we won't set any biases for those neurons, since
biases are only ever used in computing the outputs from later
layers.
This weight and bias initializer uses the same approach as in
Chapter 1, and is included for purposes of comparison.  It
will usually be better to use the default weight initializer
instead.
"""
self.biases = [np.random.randn(y, 1) for y in self.sizes[1:]]
self.weights = [np.random.randn(y, x)
for x, y in zip(self.sizes[:-1], self.sizes[1:])]

def feedforward(self, a):
"""Return the output of the network if ``a`` is input."""
for b, w in zip(self.biases, self.weights):
a = sigmoid(np.dot(w, a)+b)
return a

def SGD(self, training_data, epochs, mini_batch_size, eta,
lmbda = 0.0,
evaluation_data=None,
monitor_evaluation_cost=False,
monitor_evaluation_accuracy=False,
monitor_training_cost=False,
monitor_training_accuracy=False):
"""Train the neural network using mini-batch stochastic gradient
descent.  The ``training_data`` is a list of tuples ``(x, y)``
representing the training inputs and the desired outputs.  The
other non-optional parameters are self-explanatory, as is the
regularization parameter ``lmbda``.  The method also accepts
``evaluation_data``, usually either the validation or test
data.  We can monitor the cost and accuracy on either the
evaluation data or the training data, by setting the
appropriate flags.  The method returns a tuple containing four
lists: the (per-epoch) costs on the evaluation data, the
accuracies on the evaluation data, the costs on the training
data, and the accuracies on the training data.  All values are
evaluated at the end of each training epoch.  So, for example,
if we train for 30 epochs, then the first element of the tuple
will be a 30-element list containing the cost on the
evaluation data at the end of each epoch. Note that the lists
are empty if the corresponding flag is not set.
"""
if evaluation_data: n_data = len(evaluation_data)
n = len(training_data)
evaluation_cost, evaluation_accuracy = [], []
training_cost, training_accuracy = [], []
for j in xrange(epochs):
random.shuffle(training_data)
mini_batches = [
training_data[k:k+mini_batch_size]
for k in xrange(0, n, mini_batch_size)]
for mini_batch in mini_batches:
self.update_mini_batch(
mini_batch, eta, lmbda, len(training_data))
print "Epoch %s training complete" % j
if monitor_training_cost:
cost = self.total_cost(training_data, lmbda)
training_cost.append(cost)
print "Cost on training data: {}".format(cost)
if monitor_training_accuracy:
accuracy = self.accuracy(training_data, convert=True)
training_accuracy.append(accuracy)
print "Accuracy on training data: {} / {}".format(
accuracy, n)
if monitor_evaluation_cost:
cost = self.total_cost(evaluation_data, lmbda, convert=True)
evaluation_cost.append(cost)
print "Cost on evaluation data: {}".format(cost)
if monitor_evaluation_accuracy:
accuracy = self.accuracy(evaluation_data)
evaluation_accuracy.append(accuracy)
print "Accuracy on evaluation data: {} / {}".format(
self.accuracy(evaluation_data), n_data)
print
return evaluation_cost, evaluation_accuracy, \
training_cost, training_accuracy

def update_mini_batch(self, mini_batch, eta, lmbda, n):
"""Update the network's weights and biases by applying gradient
descent using backpropagation to a single mini batch.  The
``mini_batch`` is a list of tuples ``(x, y)``, ``eta`` is the
learning rate, ``lmbda`` is the regularization parameter, and
``n`` is the total size of the training data set.
"""
nabla_b = [np.zeros(b.shape) for b in self.biases]
nabla_w = [np.zeros(w.shape) for w in self.weights]
for x, y in mini_batch:
delta_nabla_b, delta_nabla_w = self.backprop(x, y)
nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
self.weights = [(1-eta*(lmbda/n))*w-(eta/len(mini_batch))*nw
for w, nw in zip(self.weights, nabla_w)]
self.biases = [b-(eta/len(mini_batch))*nb
for b, nb in zip(self.biases, nabla_b)]

def backprop(self, x, y):
"""Return a tuple ``(nabla_b, nabla_w)`` representing the
gradient for the cost function C_x.  ``nabla_b`` and
``nabla_w`` are layer-by-layer lists of numpy arrays, similar
to ``self.biases`` and ``self.weights``."""
nabla_b = [np.zeros(b.shape) for b in self.biases]
nabla_w = [np.zeros(w.shape) for w in self.weights]
# feedforward
activation = x
activations = [x] # list to store all the activations, layer by layer
zs = [] # list to store all the z vectors, layer by layer
for b, w in zip(self.biases, self.weights):
z = np.dot(w, activation)+b
zs.append(z)
activation = sigmoid(z)
activations.append(activation)
# backward pass
delta = (self.cost).delta(zs[-1], activations[-1], y)
nabla_b[-1] = delta
nabla_w[-1] = np.dot(delta, activations[-2].transpose())
# Note that the variable l in the loop below is used a little
# differently to the notation in Chapter 2 of the book.  Here,
# l = 1 means the last layer of neurons, l = 2 is the
# second-last layer, and so on.  It's a renumbering of the
# scheme in the book, used here to take advantage of the fact
# that Python can use negative indices in lists.
for l in xrange(2, self.num_layers):
z = zs[-l]
sp = sigmoid_prime(z)
delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
nabla_b[-l] = delta
nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
return (nabla_b, nabla_w)

def accuracy(self, data, convert=False):
"""Return the number of inputs in ``data`` for which the neural
network outputs the correct result. The neural network's
output is assumed to be the index of whichever neuron in the
final layer has the highest activation.
The flag ``convert`` should be set to False if the data set is
validation or test data (the usual case), and to True if the
data set is the training data. The need for this flag arises
due to differences in the way the results ``y`` are
represented in the different data sets.  In particular, it
flags whether we need to convert between the different
representations.  It may seem strange to use different
representations for the different data sets.  Why not use the
same representation for all three data sets?  It's done for
efficiency reasons -- the program usually evaluates the cost
on the training data and the accuracy on other data sets.
These are different types of computations, and using different
representations speeds things up.  More details on the
representations can be found in
mnist_loader.load_data_wrapper.
"""
if convert:
results = [(np.argmax(self.feedforward(x)), np.argmax(y))
for (x, y) in data]
else:
results = [(np.argmax(self.feedforward(x)), y)
for (x, y) in data]
return sum(int(x == y) for (x, y) in results)

def total_cost(self, data, lmbda, convert=False):
"""Return the total cost for the data set ``data``.  The flag
``convert`` should be set to False if the data set is the
training data (the usual case), and to True if the data set is
the validation or test data.  See comments on the similar (but
reversed) convention for the ``accuracy`` method, above.
"""
cost = 0.0
for x, y in data:
a = self.feedforward(x)
if convert: y = vectorized_result(y)
cost += self.cost.fn(a, y)/len(data)
cost += 0.5*(lmbda/len(data))*sum(
np.linalg.norm(w)**2 for w in self.weights)
return cost

def save(self, filename):
"""Save the neural network to the file ``filename``."""
data = {"sizes": self.sizes,
"weights": [w.tolist() for w in self.weights],
"biases": [b.tolist() for b in self.biases],
"cost": str(self.cost.__name__)}
f = open(filename, "w")
json.dump(data, f)
f.close()

#### Loading a Network
def load(filename):
"""Load a neural network from the file ``filename``.  Returns an
instance of Network.
"""
f = open(filename, "r")
data = json.load(f)
f.close()
cost = getattr(sys.modules[__name__], data["cost"])
net = Network(data["sizes"], cost=cost)
net.weights = [np.array(w) for w in data["weights"]]
net.biases = [np.array(b) for b in data["biases"]]
return net

#### Miscellaneous functions
def vectorized_result(j):
"""Return a 10-dimensional unit vector with a 1.0 in the j'th position
and zeroes elsewhere.  This is used to convert a digit (0...9)
into a corresponding desired output from the neural network.
"""
e = np.zeros((10, 1))
e[j] = 1.0
return e

def sigmoid(z):
"""The sigmoid function."""
return 1.0/(1.0+np.exp(-z))

def sigmoid_prime(z):
"""Derivative of the sigmoid function."""
return sigmoid(z)*(1-sigmoid(z))
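network2.py also defines save() and load(), which the post never exercises. A small usage sketch (the filename is arbitrary):

# Sketch: persisting a trained network with the save/load helpers defined above.
import network2

net = network2.Network([784, 30, 10], cost=network2.CrossEntropyCost)
# ... train with net.SGD(...) as in the test run further below ...
net.save("trained_net.json")   # stores sizes, weights, biases and the cost class name as JSON

net2 = network2.load("trained_net.json")
# net2 behaves exactly like net; for an input column vector x,
# np.argmax(net2.feedforward(x)) is the predicted digit.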


Comparing the two initialization methods:

# The old method
def large_weight_initializer(self):
    self.biases = [np.random.randn(y, 1) for y in self.sizes[1:]]
    self.weights = [np.random.randn(y, x)
                    for x, y in zip(self.sizes[:-1], self.sizes[1:])]

# The default method, i.e. the new one
def default_weight_initializer(self):
    self.biases = [np.random.randn(y, 1) for y in self.sizes[1:]]
    self.weights = [np.random.randn(y, x)/np.sqrt(x)
                    for x, y in zip(self.sizes[:-1], self.sizes[1:])]


The cost function:

The old cost function (the quadratic cost)

class QuadraticCost(object):

    @staticmethod
    def fn(a, y):
        return 0.5*np.linalg.norm(a-y)**2

    @staticmethod
    def delta(z, a, y):
        return (a-y) * sigmoid_prime(z)


The new cost function (cross-entropy)



class CrossEntropyCost(object):

    @staticmethod
    def fn(a, y):
        return np.sum(np.nan_to_num(-y*np.log(a)-(1-y)*np.log(1-a)))

    @staticmethod
    def delta(z, a, y):
        return (a-y)
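The practical difference shows up when an output neuron is badly wrong and saturated: the quadratic cost's delta carries a sigmoid_prime(z) factor that is almost zero there, while the cross-entropy delta does not. A small numeric illustration (my own, reusing the sigmoid helpers defined in network2.py):

# Illustration: output-layer error for a saturated, badly wrong neuron.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z) * (1 - sigmoid(z))

z = 8.0            # strongly positive weighted input
a = sigmoid(z)     # ~0.9997, but the desired output is 0
y = 0.0

delta_quadratic = (a - y) * sigmoid_prime(z)   # what QuadraticCost.delta returns
delta_cross_entropy = (a - y)                  # what CrossEntropyCost.delta returns

print "quadratic delta:     %.6f" % delta_quadratic      # ~0.0003 -> learning crawls
print "cross-entropy delta: %.6f" % delta_cross_entropy  # ~1.0    -> learning stays fast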


Why implement the cost as a class rather than a plain function?

Computing the cost serves two purposes:

1. It measures how well the network's output matches the desired output.

2. When computing the partial derivatives with backpropagation, we also need the error delta at the output layer, which depends on the cost (see the sketch below).
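A short sketch of how both roles hang off the same class (this mirrors what the network2.py code above already does; the toy vectors are mine):

# Both roles of the cost live on one object, so switching costs is a one-argument change.
import numpy as np
import network2

a = np.array([[0.8], [0.2]])    # toy network output
y = np.array([[1.0], [0.0]])    # desired output
z = np.array([[1.4], [-1.4]])   # weighted inputs of the output layer

cost = network2.CrossEntropyCost
print "cost value:  ", cost.fn(a, y)        # role 1: how badly the output misses y
print "output delta:", cost.delta(z, a, y)  # role 2: the error that seeds backpropagation

# Inside Network.backprop this is exactly:
#     delta = (self.cost).delta(zs[-1], activations[-1], y)
# and passing cost=network2.QuadraticCost to Network(...) swaps both behaviours at once.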

Running a test of network2.py

#coding=utf-8
# @Author: yangenneng
# @Time: 2018-01-26 14:10
# @Abstract:

import mnist_loader
training_data, validation_data, test_data = mnist_loader.load_data_wrapper()
import network2
net = network2.Network([784, 30, 10], cost=network2.CrossEntropyCost)
# net.large_weight_initializer()
net.SGD(training_data, 30, 10, 0.5,
        evaluation_data=validation_data, lmbda=5.0,
        monitor_evaluation_cost=True,
        monitor_evaluation_accuracy=True,
        monitor_training_cost=True,
        monitor_training_accuracy=True)


The accuracy peaks at 96.32%, which is higher than the earlier network.py with the same parameters.

F:\python-2.7.13.amd64\Anaconda2-4.2.0-Windows-x86_64\python.exe D:/Python/PyCharm-WorkSpace/DeepLearning_Advanced/GradientDescent/regularization.py
Epoch 0 training complete
Cost on training data: 0.472008148311
Accuracy on training data: 47147 / 50000
Cost on evaluation data: 0.789319047524
Accuracy on evaluation data: 9441 / 10000

Epoch 1 training complete
Cost on training data: 0.453797259886
Accuracy on training data: 47422 / 50000
Cost on evaluation data: 0.868820923816
Accuracy on evaluation data: 9488 / 10000

Epoch 2 training complete
Cost on training data: 0.426263459796
Accuracy on training data: 47697 / 50000
Cost on evaluation data: 0.892428620085
Accuracy on evaluation data: 9518 / 10000

Epoch 3 training complete
Cost on training data: 0.397153218406
Accuracy on training data: 48015 / 50000
Cost on evaluation data: 0.902996376529
Accuracy on evaluation data: 9541 / 10000

Epoch 4 training complete
Cost on training data: 0.404934334043
Accuracy on training data: 48036 / 50000
Cost on evaluation data: 0.925197703694
Accuracy on evaluation data: 9556 / 10000

Epoch 5 training complete
Cost on training data: 0.40452328887
Accuracy on training data: 48058 / 50000
Cost on evaluation data: 0.931829691071
Accuracy on evaluation data: 9552 / 10000

Epoch 6 training complete
Cost on training data: 0.380791629252
Accuracy on training data: 48299 / 50000
Cost on evaluation data: 0.930382092998
Accuracy on evaluation data: 9584 / 10000

Epoch 7 training complete
Cost on training data: 0.42682006464
Accuracy on training data: 47853 / 50000
Cost on evaluation data: 0.973889340064
Accuracy on evaluation data: 9534 / 10000

Epoch 8 training complete
Cost on training data: 0.381931709493
Accuracy on training data: 48297 / 50000
Cost on evaluation data: 0.943710722991
Accuracy on evaluation data: 9604 / 10000

Epoch 9 training complete
Cost on training data: 0.394124099343
Accuracy on training data: 48116 / 50000
Cost on evaluation data: 0.960483778054
Accuracy on evaluation data: 9558 / 10000

Epoch 10 training complete
Cost on training data: 0.377963935828
Accuracy on training data: 48358 / 50000
Cost on evaluation data: 0.947445697355
Accuracy on evaluation data: 9570 / 10000

Epoch 11 training complete
Cost on training data: 0.395500020434
Accuracy on training data: 48210 / 50000
Cost on evaluation data: 0.966927012594
Accuracy on evaluation data: 9567 / 10000

Epoch 12 training complete
Cost on training data: 0.39226235435
Accuracy on training data: 48261 / 50000
Cost on evaluation data: 0.968052489284
Accuracy on evaluation data: 9567 / 10000

Epoch 13 training complete
Cost on training data: 0.368915410634
Accuracy on training data: 48463 / 50000
Cost on evaluation data: 0.950452371475
Accuracy on evaluation data: 9619 / 10000

Epoch 14 training complete
Cost on training data: 0.424900909029
Accuracy on training data: 47966 / 50000
Cost on evaluation data: 0.99555005664
Accuracy on evaluation data: 9521 / 10000

Epoch 15 training complete
Cost on training data: 0.363736970943
Accuracy on training data: 48457 / 50000
Cost on evaluation data: 0.937241548632
Accuracy on evaluation data: 9623 / 10000

Epoch 16 training complete
Cost on training data: 0.430628036017
Accuracy on training data: 47885 / 50000
Cost on evaluation data: 1.00785325697
Accuracy on evaluation data: 9503 / 10000

Epoch 17 training complete
Cost on training data: 0.371276835138
Accuracy on training data: 48419 / 50000
Cost on evaluation data: 0.947896956145
Accuracy on evaluation data: 9610 / 10000

Epoch 18 training complete
Cost on training data: 0.389298019413
Accuracy on training data: 48308 / 50000
Cost on evaluation data: 0.962151955008
Accuracy on evaluation data: 9592 / 10000

Epoch 19 training complete
Cost on training data: 0.369310821264
Accuracy on training data: 48415 / 50000
Cost on evaluation data: 0.948146748962
Accuracy on evaluation data: 9614 / 10000

Epoch 20 training complete
Cost on training data: 0.396054214492
Accuracy on training data: 48253 / 50000
Cost on evaluation data: 0.981207740108
Accuracy on evaluation data: 9588 / 10000

Epoch 21 training complete
Cost on training data: 0.370809138665
Accuracy on training data: 48523 / 50000
Cost on evaluation data: 0.955209722303
Accuracy on evaluation data: 9585 / 10000

Epoch 22 training complete
Cost on training data: 0.393853695598
Accuracy on training data: 48281 / 50000
Cost on evaluation data: 0.975714066174
Accuracy on evaluation data: 9575 / 10000

Epoch 23 training complete
Cost on training data: 0.36334991212
Accuracy on training data: 48452 / 50000
Cost on evaluation data: 0.945567032029
Accuracy on evaluation data: 9612 / 10000

Epoch 24 training complete
Cost on training data: 0.367186324806
Accuracy on training data: 48475 / 50000
Cost on evaluation data: 0.940873719939
Accuracy on evaluation data: 9618 / 10000

Epoch 25 training complete
Cost on training data: 0.362830867428
Accuracy on training data: 48494 / 50000
Cost on evaluation data: 0.936882103762
Accuracy on evaluation data: 9632 / 10000

Epoch 26 training complete
Cost on training data: 0.364483067411
Accuracy on training data: 48488 / 50000
Cost on evaluation data: 0.945528985147
Accuracy on evaluation data: 9607 / 10000

Epoch 27 training complete
Cost on training data: 0.385661385381
Accuracy on training data: 48277 / 50000
Cost on evaluation data: 0.966490614403
Accuracy on evaluation data: 9562 / 10000

Epoch 28 training complete
Cost on training data: 0.361879049238
Accuracy on training data: 48503 / 50000
Cost on evaluation data: 0.946364445193
Accuracy on evaluation data: 9611 / 10000

Epoch 29 training complete
Cost on training data: 0.359157709887
Accuracy on training data: 48529 / 50000
Cost on evaluation data: 0.943491269513
Accuracy on evaluation data: 9625 / 10000

Process finished with exit code 0