
Deep Learning: Learning a Simple Classifier

2014-03-05 07:35

Zero-One Loss

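Our goal is to make as few mistakes as possible, that is, to minimize the number of errors (the zero-one loss):

$$\ell_{0,1} = \sum_{i=1}^{|D|} I_{f(x^{(i)}) \neq y^{(i)}}$$

where $D$ is the dataset, $I$ is the indicator function, and

$$f(x) = \operatorname{argmax}_k P(Y = k \mid x, \theta)$$

In other words, f(x) returns the output with the highest probability for the input under the current parameters theta. We predict f(x) from x; if that value equals y the prediction succeeds, otherwise it counts as an error.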

# zero_one_loss is a Theano variable representing a symbolic
# expression of the zero one loss ; to get the actual value this
# symbolic expression has to be compiled into a Theano function (see
# the Theano tutorial for more details)
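# axis=1 takes the argmax over classes separately for each example (row)
zero_one_loss = T.sum(T.neq(T.argmax(p_y_given_x, axis=1), y))
# T.neq plays the role of the indicator function I: T.neq(a, b) tests
# elementwise whether a and b are not equal.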
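
As a concrete check of the definition, the same count can be sketched in plain NumPy. This is an illustration only: the function name `zero_one_loss` mirrors the Theano variable, and the toy `probs` and `labels` arrays are made up for the example.

```python
import numpy as np

def zero_one_loss(p_y_given_x, y):
    """Count misclassified examples.

    p_y_given_x: (n_examples, n_classes) array of class probabilities.
    y: (n_examples,) array of integer labels.
    """
    predictions = np.argmax(p_y_given_x, axis=1)  # f(x) for every row
    return int(np.sum(predictions != y))          # number of mistakes

# Hypothetical toy data: 3 examples, 2 classes.
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.6, 0.4]])
labels = np.array([0, 1, 1])
errors = zero_one_loss(probs, labels)
```

Here the third example is predicted as class 0 while its label is 1, so `errors` is 1.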

Negative Log-Likelihood Loss

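Because the zero-one loss is not differentiable, optimizing it directly in a large model is very expensive. Instead we maximize the log-likelihood of the classifier on the training data (the likelihood being how probable the observed labels are under the model):

$$\mathcal{L}(\theta, D) = \sum_{i=1}^{|D|} \log P(Y = y^{(i)} \mid x^{(i)}, \theta)$$

This is the same as minimizing the negative log-likelihood (NLL) loss:

$$NLL(\theta, D) = -\sum_{i=1}^{|D|} \log P(Y = y^{(i)} \mid x^{(i)}, \theta)$$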

# NLL is a symbolic variable ; to get the actual value of NLL, this symbolic
# expression has to be compiled into a Theano function (see the Theano
# tutorial for more details)
NLL = -T.sum(T.log(p_y_given_x)[T.arange(y.shape[0]), y])
# note on syntax: T.arange(y.shape[0]) is a vector of integers [0,1,2,...,len(y)-1].
# Indexing a matrix M by the two vectors [0,1,...,K], [a,b,...,k] returns the
# elements M[0,a], M[1,b], ..., M[K,k] as a vector.  Here, we use this
# syntax to retrieve the log-probability of the correct labels, y.
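
The indexing trick in the comment above behaves the same way in NumPy, so it can be tried without compiling anything. The sketch below is an assumption-labeled illustration: `negative_log_likelihood`, `probs` and `labels` are hypothetical names and data, not part of the Theano tutorial code.

```python
import numpy as np

def negative_log_likelihood(p_y_given_x, y):
    """Sum of -log P(correct label) over all examples."""
    log_probs = np.log(p_y_given_x)
    rows = np.arange(y.shape[0])          # [0, 1, ..., len(y)-1]
    return -np.sum(log_probs[rows, y])    # picks log_probs[i, y[i]] for each i

# Hypothetical toy data: 3 examples, 2 classes.
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.6, 0.4]])
labels = np.array([0, 1, 1])

picked = probs[np.arange(labels.shape[0]), labels]  # probability of each correct label
nll = negative_log_likelihood(probs, labels)
```

`picked` holds 0.9, 0.8 and 0.4, the probabilities the model assigns to the correct labels, and `nll` is minus the sum of their logarithms.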

Stochastic Gradient Descent (SGD)

# GRADIENT DESCENT

while True:
    loss = f(params)
    d_loss_wrt_params = ... # compute gradient
    params -= learning_rate * d_loss_wrt_params
    if <stopping condition is met>:
        return params

The above is ordinary gradient descent. The basic flow is: loss -> gradient -> parameter update.
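
To make the loop concrete, here is a minimal runnable sketch on a one-dimensional problem. The objective f(w) = (w - 3)^2, the step-size-based stopping rule, and the names `gradient_descent`, `tol` and `max_steps` are all assumptions chosen for illustration; the pseudocode leaves these details open.

```python
def gradient_descent(grad, params, learning_rate=0.1, tol=1e-8, max_steps=10_000):
    """Repeat: gradient -> parameter update, until the update becomes tiny."""
    for _ in range(max_steps):
        step = learning_rate * grad(params)
        params -= step
        if abs(step) < tol:  # assumed stopping condition
            break
    return params

# Gradient of f(w) = (w - 3)^2 is 2 * (w - 3); the minimum is at w = 3.
w_star = gradient_descent(lambda w: 2.0 * (w - 3.0), params=0.0)
```

Starting from 0, `w_star` ends up very close to 3, the minimizer of the assumed objective.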

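Stochastic gradient descent estimates the gradient from only a few examples at a time rather than from the whole training set. In its simplest form it uses one example per update: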

# STOCHASTIC GRADIENT DESCENT
for (x_i,y_i) in training_set:
                            # imagine an infinite generator
                            # that may repeat examples (if there is only a finite training set)
    loss = f(params, x_i, y_i)
    d_loss_wrt_params = ... # compute gradient
    params -= learning_rate * d_loss_wrt_params
    if <stopping condition is met>:
        return params
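
A small runnable version of the one-example-at-a-time loop, under stated assumptions: the model is a line `w * x + b`, the per-example loss is half the squared error, and training stops after a fixed number of epochs. The function `sgd_linear` and the noise-free data generated from y = 2x + 1 are hypothetical.

```python
import random

def sgd_linear(data, learning_rate=0.05, n_epochs=200, seed=0):
    """Fit y ~ w * x + b, updating after every single example."""
    rng = random.Random(seed)
    data = list(data)
    w, b = 0.0, 0.0
    for _ in range(n_epochs):
        rng.shuffle(data)                    # visit examples in random order
        for x_i, y_i in data:
            err = (w * x_i + b) - y_i        # derivative of 0.5 * err**2 w.r.t. the prediction
            w -= learning_rate * err * x_i   # d loss / d w
            b -= learning_rate * err         # d loss / d b
    return w, b

# Hypothetical noise-free training set drawn from y = 2x + 1, x in [-1, 1].
training_set = [(k / 10.0, 2.0 * (k / 10.0) + 1.0) for k in range(-10, 11)]
w, b = sgd_linear(training_set)
```

Because the data are exactly linear, the fitted `w` and `b` approach 2 and 1.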

Minibatch SGD is identical to SGD except that each update uses several examples instead of one:

for (x_batch,y_batch) in train_batches:
                            # imagine an infinite generator
                            # that may repeat examples
    loss = f(params, x_batch, y_batch)
    d_loss_wrt_params = ... # compute gradient using theano
    params -= learning_rate * d_loss_wrt_params
    if <stopping condition is met>:
        return params
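
The same loop can be sketched in NumPy to show how batches are drawn and averaged. Everything here is an assumption made for illustration: a linear least-squares model, the helper `iterate_minibatches`, a fixed epoch count as the stopping rule, and synthetic data generated from the weights [2, 1].

```python
import numpy as np

def iterate_minibatches(x, y, batch_size, rng):
    """Yield shuffled (x_batch, y_batch) pairs covering the data once."""
    order = rng.permutation(len(x))
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        yield x[idx], y[idx]

def minibatch_sgd(x, y, batch_size=8, learning_rate=0.1, n_epochs=300, seed=0):
    rng = np.random.default_rng(seed)
    params = np.zeros(x.shape[1])
    for _ in range(n_epochs):
        for x_batch, y_batch in iterate_minibatches(x, y, batch_size, rng):
            residual = x_batch @ params - y_batch
            # gradient of the batch-averaged loss 0.5 * residual**2
            grad = x_batch.T @ residual / len(x_batch)
            params -= learning_rate * grad
    return params

# Hypothetical data: one feature plus a constant bias column, true weights [2, 1].
data_rng = np.random.default_rng(1)
features = np.column_stack([data_rng.uniform(-1, 1, size=64), np.ones(64)])
targets = features @ np.array([2.0, 1.0])
fitted = minibatch_sgd(features, targets)
```

Averaging the gradient over a batch reduces the variance of each update compared with single-example SGD, while still being far cheaper than a pass over the full training set.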

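The pseudocode above leaves the gradient computation and the stopping test unspecified. With Theano, the minibatch SGD step can be written as follows: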

# Minibatch Stochastic Gradient Descent
import theano
import theano.tensor as T

# assume loss is a symbolic description of the loss function given
# the symbolic variables params (shared variable), x_batch, y_batch;

# compute gradient of loss with respect to params
d_loss_wrt_params = T.grad(loss, params)

# compile the MSGD step into a theano function
updates = [(params, params - learning_rate * d_loss_wrt_params)]
MSGD = theano.function([x_batch,y_batch], loss, updates=updates)

for (x_batch, y_batch) in train_batches:
    # here x_batch and y_batch are elements of train_batches and
    # therefore numpy arrays; function MSGD also updates the params
    print('Current loss is ', MSGD(x_batch, y_batch))
    if stopping_condition_is_met:
        return params

Regularization

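We want the model to generalize to data it has not seen. To guard against overfitting (which often shows up as parameters growing abnormally large), we regularize. This section covers L1/L2 regularization and early stopping.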

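For our problem, the regularized loss can be defined as:

$$E(\theta, D) = NLL(\theta, D) + \lambda \|\theta\|_p^p$$

where

$$\|\theta\|_p = \left( \sum_{j=1}^{|\theta|} |\theta_j|^p \right)^{1/p}$$

Note that for p = 1 the norm is the sum of absolute values, and for p = 2 it is the square root of the sum of squares.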
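
The two special cases can be checked numerically with a few lines of Python. The helper `p_norm` and the vector (3, -4) are hypothetical, chosen only because the arithmetic is easy to follow.

```python
def p_norm(theta, p):
    """(sum_j |theta_j|**p) ** (1/p) for a flat sequence of parameters."""
    return sum(abs(t) ** p for t in theta) ** (1.0 / p)

theta = [3.0, -4.0]
norm_1 = p_norm(theta, 1)  # |3| + |-4|
norm_2 = p_norm(theta, 2)  # sqrt(3**2 + (-4)**2)
```

For this vector the sum of absolute values is 7 and the square root of the sum of squares is 5.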

# symbolic Theano variable that represents the L1 regularization term
L1  = T.sum(abs(param))

# symbolic Theano variable that represents the squared L2 term
L2_sqr = T.sum(param ** 2)

# the loss
loss = NLL + lambda_1 * L1 + lambda_2 * L2_sqr
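
For a quick numeric sanity check, the same penalty terms can be written in NumPy. This is a sketch under assumptions: the helper names, the weight matrix, and the values of `lambda_1` and `lambda_2` are made up, and `nll` is passed in as a plain number rather than computed from data.

```python
import numpy as np

def l1_penalty(param):
    return np.sum(np.abs(param))   # sum of absolute values

def l2_sqr_penalty(param):
    return np.sum(param ** 2)      # squared L2 norm (no square root)

def regularized_loss(nll, param, lambda_1=0.0, lambda_2=0.0):
    return nll + lambda_1 * l1_penalty(param) + lambda_2 * l2_sqr_penalty(param)

# Hypothetical weights and coefficients.
weights = np.array([[3.0, -4.0],
                    [0.0, 0.0]])
l1 = l1_penalty(weights)
l2_sqr = l2_sqr_penalty(weights)
total = regularized_loss(nll=1.0, param=weights, lambda_1=0.1, lambda_2=0.01)
```

Using the squared L2 term avoids the square root, which keeps the penalty differentiable at zero; larger `lambda_1` or `lambda_2` push the optimizer toward smaller weights.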

Early Stopping

# early-stopping parameters
patience = 5000  # look at this many examples regardless
patience_increase = 2  # wait this much longer when a new best is found
improvement_threshold = 0.995  # a relative improvement of this much is
                               # considered significant
validation_frequency = min(n_train_batches, patience // 2)
                              # go through this many
                              # minibatches before checking the network
                              # on the validation set; in this case we
                              # check every epoch

import copy
import timeit

import numpy

best_params = None
best_validation_loss = numpy.inf
test_score = 0.
start_time = timeit.default_timer()

done_looping = False
epoch = 0
while (epoch < n_epochs) and (not done_looping):
    # Report "1" for first epoch, "n_epochs" for last epoch
    epoch = epoch + 1
    for minibatch_index in range(n_train_batches):

        d_loss_wrt_params = ... # compute gradient
        params -= learning_rate * d_loss_wrt_params # gradient descent

        # iteration number. We want it to start at 0.
        iter = (epoch - 1) * n_train_batches + minibatch_index
        # note that if we do `iter % validation_frequency` it will be
        # true for iter = 0 which we do not want. We want it true for
        # iter = validation_frequency - 1.
        if (iter + 1) % validation_frequency == 0:

            this_validation_loss = ... # compute zero-one loss on validation set

            if this_validation_loss < best_validation_loss:

                # improve patience if loss improvement is good enough
                if this_validation_loss < best_validation_loss * improvement_threshold:

                    patience = max(patience, iter * patience_increase)
                best_params = copy.deepcopy(params)
                best_validation_loss = this_validation_loss

        if patience <= iter:
            done_looping = True
            break

# POSTCONDITION:
# best_params refers to the best out-of-sample parameters observed during the optimization
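
To see how patience interacts with the validation checks, here is a minimal sketch that replays a made-up sequence of validation losses, assuming one validation check per iteration. The function `early_stopping_run` and the loss values are hypothetical; only the patience logic follows the loop above.

```python
def early_stopping_run(validation_losses, patience=4, patience_increase=2,
                       improvement_threshold=0.995):
    """Return (best_iter, best_loss, stop_iter) for a recorded loss sequence."""
    best_loss = float("inf")
    best_iter = None
    it = -1  # stays -1 if the sequence is empty
    for it, loss in enumerate(validation_losses):
        if loss < best_loss:
            # extend patience only when the improvement is significant
            if loss < best_loss * improvement_threshold:
                patience = max(patience, it * patience_increase)
            best_loss, best_iter = loss, it
        if patience <= it:
            break  # ran out of patience
    return best_iter, best_loss, it

# Hypothetical validation curve: improves, then starts to overfit.
losses = [1.0, 0.8, 0.6, 0.5, 0.55, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2]
result = early_stopping_run(losses)
```

With these values the best loss 0.5 is seen at iteration 3, which raises patience to 6, and the run stops at iteration 6 instead of continuing through the rest of the overfitting phase.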
