
Deep Learning: Logistic Regression

Classifying MNIST Digits with Logistic Regression

In this section we show how to use Theano for the most basic classifier: logistic regression. We start with a quick primer of the model, both as a refresher and to fix notation, and show how mathematical expressions are represented as Theano graphs.

Following machine-learning tradition, we begin with MNIST digit classification.

The Model

Logistic regression is a probabilistic, linear classifier. Its parameters are a weight matrix W and a bias vector b. Classification is done by projecting input vectors onto a set of hyperplanes, one per class; the distance from an input to a hyperplane reflects the probability that the input belongs to the corresponding class.

Mathematically, the probability that an input vector x is a member of class i (a value of the stochastic variable Y) can be written as:

P(Y=i|x, W, b) = softmax_i(Wx + b) = \frac{e^{W_i x + b_i}}{\sum_j e^{W_j x + b_j}}

The model's prediction y_pred is the class whose probability is maximal:

y_pred = argmax_i P(Y=i|x, W, b)

The corresponding Theano code is:

# initialize with 0 the weights W as a matrix of shape (n_in, n_out)
self.W = theano.shared(
    value=numpy.zeros(
        (n_in, n_out),
        dtype=theano.config.floatX
    ),
    name='W',
    borrow=True
)
# initialize the biases b as a vector of n_out 0s
self.b = theano.shared(
    value=numpy.zeros(
        (n_out,),
        dtype=theano.config.floatX
    ),
    name='b',
    borrow=True
)

# symbolic expression for computing the matrix of class-membership
# probabilities
# Where:
# W is a matrix where column-k represent the separation hyperplane for
# class-k
# x is a matrix where row-j represents input training sample-j
# b is a vector where element-k represent the free parameter of
# hyperplane-k
self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)

# symbolic description of how to compute prediction as class whose
# probability is maximal
self.y_pred = T.argmax(self.p_y_given_x, axis=1)
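
As a quick cross-check of the two formulas above, here is a plain NumPy sketch on a made-up toy example (the shapes and values are hypothetical and purely illustrative; the tutorial itself works with the symbolic Theano expressions):

import numpy

rng = numpy.random.RandomState(0)
x_toy = rng.rand(4)                    # one input vector with 4 features
W_toy = rng.rand(4, 3)                 # column k is the hyperplane for class k
b_toy = numpy.zeros(3)                 # free parameter of each hyperplane

scores = numpy.dot(x_toy, W_toy) + b_toy                    # W_i x + b_i for each class i
p_y_given_x = numpy.exp(scores) / numpy.exp(scores).sum()   # softmax
y_pred = numpy.argmax(p_y_given_x)                          # class with maximal probability

print(p_y_given_x, y_pred)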


Since the model parameters must maintain a persistent state throughout training, we allocate shared variables for W and b. This declares them as symbolic Theano variables and also initializes their contents. The dot product and softmax operator are then used to compute the vector P(Y|x, W, b). The result p_y_given_x is a symbolic variable of vector type.

To get the actual model prediction, we use the T.argmax operator, which returns the index at which p_y_given_x is maximal, i.e. the class with the highest probability.

The following sections show how to optimize the parameters.

Defining a Loss Function

Learning optimal model parameters involves minimizing a loss function. For multi-class logistic regression, it is very common to use the negative log-likelihood as the loss. We first define the likelihood \mathcal{L} and the loss \ell as:

\mathcal{L}(\theta=\{W,b\}, \mathcal{D}) = \sum_{i=0}^{|\mathcal{D}|} \log P(Y=y^{(i)} | x^{(i)}, W, b)

\ell(\theta=\{W,b\}, \mathcal{D}) = - \mathcal{L}(\theta=\{W,b\}, \mathcal{D})

While entire treatments are devoted to the topic of minimization, gradient descent is by far the simplest method for minimizing arbitrary non-linear loss functions. This tutorial uses minibatch stochastic gradient descent (MSGD); see the Getting Started section of the Deep Learning 0.1 documentation for details.

The following code defines the (symbolic) loss for a given minibatch:

# y.shape[0] is (symbolically) the number of rows in y, i.e.,
# number of examples (call it n) in the minibatch
# T.arange(y.shape[0]) is a symbolic vector which will contain
# [0,1,2,... n-1] T.log(self.p_y_given_x) is a matrix of
# Log-Probabilities (call it LP) with one row per example and
# one column per class LP[T.arange(y.shape[0]),y] is a vector
# v containing [LP[0,y[0]], LP[1,y[1]], LP[2,y[2]], ...,
# LP[n-1,y[n-1]]] and T.mean(LP[T.arange(y.shape[0]),y]) is
# the mean (across minibatch examples) of the elements in v,
# i.e., the mean log-likelihood across the minibatch.
return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])


Note: even though the loss is formally defined as a sum, in practice we use the mean (T.mean), which makes the learning rate less dependent on the minibatch size.
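
As a rough illustration of why (toy numbers, not part of the tutorial code): if every example contributes a gradient of roughly the same magnitude, a summed loss produces gradients that grow with the minibatch size, while an averaged loss keeps them at the per-example scale, so a single learning rate transfers more easily across batch sizes.

import numpy

per_example_grad = numpy.full(600, 0.01)   # pretend each example contributes ~0.01

for batch_size in (60, 600):
    g = per_example_grad[:batch_size]
    # sum-based loss: gradient magnitude scales with the batch size
    # mean-based loss: gradient magnitude stays near the per-example scale
    print(batch_size, g.sum(), g.mean())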

Creating a LogisticRegression Class

We can now put together a LogisticRegression class that encapsulates the basic behaviour of logistic regression.

The following code brings together everything introduced so far:

class LogisticRegression(object):
    """Multi-class Logistic Regression Class

    The logistic regression is fully described by a weight matrix :math:`W`
    and bias vector :math:`b`. Classification is done by projecting data
    points onto a set of hyperplanes, the distance to which is used to
    determine a class membership probability.
    """

    def __init__(self, input, n_in, n_out):
        """ Initialize the parameters of the logistic regression

        :type input: theano.tensor.TensorType
        :param input: symbolic variable that describes the input of the
                      architecture (one minibatch)

        :type n_in: int
        :param n_in: number of input units, the dimension of the space in
                     which the datapoints lie

        :type n_out: int
        :param n_out: number of output units, the dimension of the space in
                      which the labels lie

        """
        # start-snippet-1
        # initialize with 0 the weights W as a matrix of shape (n_in, n_out)
        self.W = theano.shared(
            value=numpy.zeros(
                (n_in, n_out),
                dtype=theano.config.floatX
            ),
            name='W',
            borrow=True
        )
        # initialize the biases b as a vector of n_out 0s
        self.b = theano.shared(
            value=numpy.zeros(
                (n_out,),
                dtype=theano.config.floatX
            ),
            name='b',
            borrow=True
        )

        # symbolic expression for computing the matrix of class-membership
        # probabilities
        # Where:
        # W is a matrix where column-k represent the separation hyperplane for
        # class-k
        # x is a matrix where row-j represents input training sample-j
        # b is a vector where element-k represent the free parameter of
        # hyperplane-k
        self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)

        # symbolic description of how to compute prediction as class whose
        # probability is maximal
        self.y_pred = T.argmax(self.p_y_given_x, axis=1)
        # end-snippet-1

        # parameters of the model
        self.params = [self.W, self.b]

        # keep track of model input
        self.input = input

    def negative_log_likelihood(self, y):
        """Return the mean of the negative log-likelihood of the prediction
        of this model under a given target distribution.

        .. math::

            \frac{1}{|\mathcal{D}|} \mathcal{L} (\theta=\{W,b\}, \mathcal{D}) =
            \frac{1}{|\mathcal{D}|} \sum_{i=0}^{|\mathcal{D}|}
                \log(P(Y=y^{(i)}|x^{(i)}, W,b)) \\
            \ell (\theta=\{W,b\}, \mathcal{D})

        :type y: theano.tensor.TensorType
        :param y: corresponds to a vector that gives for each example the
                  correct label

        Note: we use the mean instead of the sum so that
              the learning rate is less dependent on the batch size
        """
        # start-snippet-2
        # y.shape[0] is (symbolically) the number of rows in y, i.e.,
        # number of examples (call it n) in the minibatch
        # T.arange(y.shape[0]) is a symbolic vector which will contain
        # [0,1,2,... n-1] T.log(self.p_y_given_x) is a matrix of
        # Log-Probabilities (call it LP) with one row per example and
        # one column per class LP[T.arange(y.shape[0]),y] is a vector
        # v containing [LP[0,y[0]], LP[1,y[1]], LP[2,y[2]], ...,
        # LP[n-1,y[n-1]]] and T.mean(LP[T.arange(y.shape[0]),y]) is
        # the mean (across minibatch examples) of the elements in v,
        # i.e., the mean log-likelihood across the minibatch.
        return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])
        # end-snippet-2

    def errors(self, y):
        """Return a float representing the number of errors in the minibatch
        over the total number of examples of the minibatch ; zero one
        loss over the size of the minibatch

        :type y: theano.tensor.TensorType
        :param y: corresponds to a vector that gives for each example the
                  correct label
        """

        # check if y has same dimension of y_pred
        if y.ndim != self.y_pred.ndim:
            raise TypeError(
                'y should have the same shape as self.y_pred',
                ('y', y.type, 'y_pred', self.y_pred.type)
            )
        # check if y is of the correct datatype
        if y.dtype.startswith('int'):
            # the T.neq operator returns a vector of 0s and 1s, where 1
            # represents a mistake in prediction
            return T.mean(T.neq(self.y_pred, y))
        else:
            raise NotImplementedError()


We instantiate this class as follows:

# generate symbolic variables for input (x and y represent a
# minibatch)
x = T.matrix('x')  # data, presented as rasterized images
y = T.ivector('y')  # labels, presented as 1D vector of [int] labels

# construct the logistic regression class
# Each MNIST image has size 28*28
classifier = LogisticRegression(input=x, n_in=28 * 28, n_out=10)


We start by allocating symbolic variables for the training inputs x and their corresponding classes y. Note that x and y are defined outside the scope of the LogisticRegression object. Since the class needs the input in order to build its graph, it is passed as a parameter of the __init__ function. This is useful when you want to connect several instances of this class to build a deep network: the output of one layer can be passed as the input of the layer above it. Finally, we define the cost variable by calling the classifier.negative_log_likelihood method:

# the cost we minimize during training is the negative log likelihood of
# the model in symbolic format
cost = classifier.negative_log_likelihood(y)


Note that the symbolic variable classifier was defined in terms of x at initialization, so x remains an implicit symbolic input to the definition of cost.
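
One way to make this dependency explicit (a hypothetical check, not part of the training pipeline that follows) is to compile cost into a function that takes x and y directly:

# a minimal sketch: evaluating cost requires concrete values for both x and y,
# because x entered the graph through the classifier's `input` argument
check_cost = theano.function(inputs=[x, y], outputs=cost)

# hypothetical all-zero minibatch of 5 rasterized 28*28 images and 5 labels
toy_x = numpy.zeros((5, 28 * 28), dtype=theano.config.floatX)
toy_y = numpy.zeros(5, dtype='int32')
print(check_cost(toy_x, toy_y))   # mean negative log-likelihood of this toy batch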

Learning the Model

In most programming languages (C/C++, Matlab, Python), implementing MSGD usually starts with manually deriving the expressions for the gradient of the loss with respect to the parameters. For complex models this can be quite involved, especially once numerical stability issues are taken into account.

With Theano this work is greatly simplified: it performs automatic differentiation and applies certain mathematical transformations to improve numerical stability.

Computing the gradients of the cost is then as simple as:

g_W = T.grad(cost=cost, wrt=classifier.W)
g_b = T.grad(cost=cost, wrt=classifier.b)


g_W and g_b are symbolic variables that can be used as part of a computation graph. The function train_model, which performs one step of gradient descent, can then be defined as follows:

# specify how to update the parameters of the model as a list of
# (variable, update expression) pairs.
updates = [(classifier.W, classifier.W - learning_rate * g_W),
           (classifier.b, classifier.b - learning_rate * g_b)]

# compiling a Theano function `train_model` that returns the cost, but in
# the same time updates the parameter of the model based on the rules
# defined in `updates`
train_model = theano.function(
    inputs=[index],
    outputs=cost,
    updates=updates,
    givens={
        x: train_set_x[index * batch_size: (index + 1) * batch_size],
        y: train_set_y[index * batch_size: (index + 1) * batch_size]
    }
)


updates is a list of pairs. In each pair, the first element is the symbolic variable to be updated in this step, and the second element is the symbolic expression for computing its new value. Similarly, givens is a dictionary whose keys are symbolic variables and whose values specify their replacements during the step. train_model is thus defined such that:

its input is the minibatch index index, which together with the batch size defines x and the corresponding labels y;

its return value is the cost/loss associated with the x, y defined by that index.

Every time the function is called, x and y are replaced with the slices of the training set specified by index. Then the cost associated with that minibatch is evaluated and the operations defined by the updates list are applied.

Each call to train_model(index) therefore computes and returns the cost of that minibatch while also performing a step of MSGD. The entire learning algorithm thus consists of looping over all examples in the training set, one minibatch at a time, repeatedly calling train_model, as sketched below.
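
A minimal sketch of that outer loop (using n_epochs and n_train_batches as they are computed in the full listing below; the real loop additionally interleaves validation and early stopping):

for epoch in range(n_epochs):
    for minibatch_index in range(n_train_batches):
        # one MSGD step on the slice of the training set selected by the index
        minibatch_avg_cost = train_model(minibatch_index)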

Testing the Model

As explained earlier when discussing how to learn a classifier, when testing the model we are interested in the number of misclassified examples. The LogisticRegression class therefore has an extra method that builds the symbolic graph for retrieving the number of misclassified examples in each minibatch.

The code is as follows:

def errors(self, y):
    """Return a float representing the number of errors in the minibatch
    over the total number of examples of the minibatch ; zero one
    loss over the size of the minibatch

    :type y: theano.tensor.TensorType
    :param y: corresponds to a vector that gives for each example the
              correct label
    """

    # check if y has same dimension of y_pred
    if y.ndim != self.y_pred.ndim:
        raise TypeError(
            'y should have the same shape as self.y_pred',
            ('y', y.type, 'y_pred', self.y_pred.type)
        )
    # check if y is of the correct datatype
    if y.dtype.startswith('int'):
        # the T.neq operator returns a vector of 0s and 1s, where 1
        # represents a mistake in prediction
        return T.mean(T.neq(self.y_pred, y))
    else:
        raise NotImplementedError()


We then create the functions test_model and validate_model; as we will see shortly, validate_model is the key to our early-stopping mechanism (sketched after the code below). Both take a minibatch index and compute, for that minibatch, the number of misclassified examples; the only difference is that one draws its minibatches from the test set and the other from the validation set.

# compiling a Theano function that computes the mistakes that are made by
# the model on a minibatch
test_model = theano.function(
    inputs=[index],
    outputs=classifier.errors(y),
    givens={
        x: test_set_x[index * batch_size: (index + 1) * batch_size],
        y: test_set_y[index * batch_size: (index + 1) * batch_size]
    }
)

validate_model = theano.function(
    inputs=[index],
    outputs=classifier.errors(y),
    givens={
        x: valid_set_x[index * batch_size: (index + 1) * batch_size],
        y: valid_set_y[index * batch_size: (index + 1) * batch_size]
    }
)
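
The early stopping that validate_model makes possible works roughly as in the following sketch, a simplified excerpt of the full listing below (patience, improvement_threshold, iter and the other names are taken from that listing):

if this_validation_loss < best_validation_loss:
    # a sufficiently large relative improvement extends the patience budget
    if this_validation_loss < best_validation_loss * improvement_threshold:
        patience = max(patience, iter * patience_increase)
    best_validation_loss = this_validation_loss

if patience <= iter:
    done_looping = True   # no recent improvement on the validation set: stop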


The Complete Code for Training on MNIST with Theano

"""
ThistutorialintroduceslogisticregressionusingTheanoandstochastic
gradientdescent.

Logisticregressionisaprobabilistic,linearclassifier.Itisparametrized
byaweightmatrix:math:`W`andabiasvector:math:`b`.Classificationis
donebyprojectingdatapointsontoasetofhyperplanes,thedistanceto
whichisusedtodetermineaclassmembershipprobability.

Mathematically,thiscanbewrittenas:

..math::
P(Y=i|x,W,b)&=softmax_i(Wx+b)\\
&=\frac{e^{W_ix+b_i}}{\sum_je^{W_jx+b_j}}

Theoutputofthemodelorpredictionisthendonebytakingtheargmaxof
thevectorwhosei'thelementisP(Y=i|x).

..math::

y_{pred}=argmax_iP(Y=i|x,W,b)

Thistutorialpresentsastochasticgradientdescentoptimizationmethod
suitableforlargedatasets.

References:

-textbooks:"PatternRecognitionandMachineLearning"-
ChristopherM.Bishop,section4.3.2

"""

from__future__importprint_function

__docformat__='restructedtexten'

importsix.moves.cPickleaspickle
importgzip
importos
importsys
importtimeit

importnumpy

importtheano
importtheano.tensorasT

class LogisticRegression(object):
    """Multi-class Logistic Regression Class

    The logistic regression is fully described by a weight matrix :math:`W`
    and bias vector :math:`b`. Classification is done by projecting data
    points onto a set of hyperplanes, the distance to which is used to
    determine a class membership probability.
    """

    def __init__(self, input, n_in, n_out):
        """ Initialize the parameters of the logistic regression

        :type input: theano.tensor.TensorType
        :param input: symbolic variable that describes the input of the
                      architecture (one minibatch)

        :type n_in: int
        :param n_in: number of input units, the dimension of the space in
                     which the datapoints lie

        :type n_out: int
        :param n_out: number of output units, the dimension of the space in
                      which the labels lie

        """
        # start-snippet-1
        # initialize with 0 the weights W as a matrix of shape (n_in, n_out)
        self.W = theano.shared(
            value=numpy.zeros(
                (n_in, n_out),
                dtype=theano.config.floatX
            ),
            name='W',
            borrow=True
        )
        # initialize the biases b as a vector of n_out 0s
        self.b = theano.shared(
            value=numpy.zeros(
                (n_out,),
                dtype=theano.config.floatX
            ),
            name='b',
            borrow=True
        )

        # symbolic expression for computing the matrix of class-membership
        # probabilities
        # Where:
        # W is a matrix where column-k represent the separation hyperplane for
        # class-k
        # x is a matrix where row-j represents input training sample-j
        # b is a vector where element-k represent the free parameter of
        # hyperplane-k
        self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)

        # symbolic description of how to compute prediction as class whose
        # probability is maximal
        self.y_pred = T.argmax(self.p_y_given_x, axis=1)
        # end-snippet-1

        # parameters of the model
        self.params = [self.W, self.b]

        # keep track of model input
        self.input = input

    def negative_log_likelihood(self, y):
        """Return the mean of the negative log-likelihood of the prediction
        of this model under a given target distribution.

        .. math::

            \frac{1}{|\mathcal{D}|} \mathcal{L} (\theta=\{W,b\}, \mathcal{D}) =
            \frac{1}{|\mathcal{D}|} \sum_{i=0}^{|\mathcal{D}|}
                \log(P(Y=y^{(i)}|x^{(i)}, W,b)) \\
            \ell (\theta=\{W,b\}, \mathcal{D})

        :type y: theano.tensor.TensorType
        :param y: corresponds to a vector that gives for each example the
                  correct label

        Note: we use the mean instead of the sum so that
              the learning rate is less dependent on the batch size
        """
        # start-snippet-2
        # y.shape[0] is (symbolically) the number of rows in y, i.e.,
        # number of examples (call it n) in the minibatch
        # T.arange(y.shape[0]) is a symbolic vector which will contain
        # [0,1,2,... n-1] T.log(self.p_y_given_x) is a matrix of
        # Log-Probabilities (call it LP) with one row per example and
        # one column per class LP[T.arange(y.shape[0]),y] is a vector
        # v containing [LP[0,y[0]], LP[1,y[1]], LP[2,y[2]], ...,
        # LP[n-1,y[n-1]]] and T.mean(LP[T.arange(y.shape[0]),y]) is
        # the mean (across minibatch examples) of the elements in v,
        # i.e., the mean log-likelihood across the minibatch.
        return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])
        # end-snippet-2

    def errors(self, y):
        """Return a float representing the number of errors in the minibatch
        over the total number of examples of the minibatch ; zero one
        loss over the size of the minibatch

        :type y: theano.tensor.TensorType
        :param y: corresponds to a vector that gives for each example the
                  correct label
        """

        # check if y has same dimension of y_pred
        if y.ndim != self.y_pred.ndim:
            raise TypeError(
                'y should have the same shape as self.y_pred',
                ('y', y.type, 'y_pred', self.y_pred.type)
            )
        # check if y is of the correct datatype
        if y.dtype.startswith('int'):
            # the T.neq operator returns a vector of 0s and 1s, where 1
            # represents a mistake in prediction
            return T.mean(T.neq(self.y_pred, y))
        else:
            raise NotImplementedError()

def load_data(dataset):
    ''' Loads the dataset

    :type dataset: string
    :param dataset: the path to the dataset (here MNIST)
    '''

    #############
    # LOAD DATA #
    #############

    # Download the MNIST dataset if it is not present
    data_dir, data_file = os.path.split(dataset)
    if data_dir == "" and not os.path.isfile(dataset):
        # Check if dataset is in the data directory.
        new_path = os.path.join(
            os.path.split(__file__)[0],
            "..",
            "data",
            dataset
        )
        if os.path.isfile(new_path) or data_file == 'mnist.pkl.gz':
            dataset = new_path

    if (not os.path.isfile(dataset)) and data_file == 'mnist.pkl.gz':
        from six.moves import urllib
        origin = (
            'http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz'
        )
        print('Downloading data from %s' % origin)
        urllib.request.urlretrieve(origin, dataset)

    print('... loading data')

    # Load the dataset
    with gzip.open(dataset, 'rb') as f:
        try:
            train_set, valid_set, test_set = pickle.load(f, encoding='latin1')
        except:
            train_set, valid_set, test_set = pickle.load(f)
    # train_set, valid_set, test_set format: tuple(input, target)
    # input is a numpy.ndarray of 2 dimensions (a matrix)
    # where each row corresponds to an example. target is a
    # numpy.ndarray of 1 dimension (vector) that has the same length as
    # the number of rows in the input. It should give the target
    # to the example with the same index in the input.

    def shared_dataset(data_xy, borrow=True):
        """ Function that loads the dataset into shared variables

        The reason we store our dataset in shared variables is to allow
        Theano to copy it into the GPU memory (when code is run on GPU).
        Since copying data into the GPU is slow, copying a minibatch everytime
        is needed (the default behaviour if the data is not in a shared
        variable) would lead to a large decrease in performance.
        """
        data_x, data_y = data_xy
        shared_x = theano.shared(numpy.asarray(data_x,
                                               dtype=theano.config.floatX),
                                 borrow=borrow)
        shared_y = theano.shared(numpy.asarray(data_y,
                                               dtype=theano.config.floatX),
                                 borrow=borrow)
        # When storing data on the GPU it has to be stored as floats
        # therefore we will store the labels as ``floatX`` as well
        # (``shared_y`` does exactly that). But during our computations
        # we need them as ints (we use labels as index, and if they are
        # floats it doesn't make sense) therefore instead of returning
        # ``shared_y`` we will have to cast it to int. This little hack
        # lets ous get around this issue
        return shared_x, T.cast(shared_y, 'int32')

    test_set_x, test_set_y = shared_dataset(test_set)
    valid_set_x, valid_set_y = shared_dataset(valid_set)
    train_set_x, train_set_y = shared_dataset(train_set)

    rval = [(train_set_x, train_set_y), (valid_set_x, valid_set_y),
            (test_set_x, test_set_y)]
    return rval

def sgd_optimization_mnist(learning_rate=0.13, n_epochs=1000,
                           dataset='mnist.pkl.gz',
                           batch_size=600):
    """
    Demonstrate stochastic gradient descent optimization of a log-linear
    model

    This is demonstrated on MNIST.

    :type learning_rate: float
    :param learning_rate: learning rate used (factor for the stochastic
                          gradient)

    :type n_epochs: int
    :param n_epochs: maximal number of epochs to run the optimizer

    :type dataset: string
    :param dataset: the path of the MNIST dataset file from
                 http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz
    """
    datasets = load_data(dataset)

    train_set_x, train_set_y = datasets[0]
    valid_set_x, valid_set_y = datasets[1]
    test_set_x, test_set_y = datasets[2]

    # compute number of minibatches for training, validation and testing
    n_train_batches = train_set_x.get_value(borrow=True).shape[0] // batch_size
    n_valid_batches = valid_set_x.get_value(borrow=True).shape[0] // batch_size
    n_test_batches = test_set_x.get_value(borrow=True).shape[0] // batch_size

    ######################
    # BUILD ACTUAL MODEL #
    ######################
    print('... building the model')

    # allocate symbolic variables for the data
    index = T.lscalar()  # index to a [mini]batch

    # generate symbolic variables for input (x and y represent a
    # minibatch)
    x = T.matrix('x')  # data, presented as rasterized images
    y = T.ivector('y')  # labels, presented as 1D vector of [int] labels

    # construct the logistic regression class
    # Each MNIST image has size 28*28
    classifier = LogisticRegression(input=x, n_in=28 * 28, n_out=10)

    # the cost we minimize during training is the negative log likelihood of
    # the model in symbolic format
    cost = classifier.negative_log_likelihood(y)

    # compiling a Theano function that computes the mistakes that are made by
    # the model on a minibatch
    test_model = theano.function(
        inputs=[index],
        outputs=classifier.errors(y),
        givens={
            x: test_set_x[index * batch_size: (index + 1) * batch_size],
            y: test_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

    validate_model = theano.function(
        inputs=[index],
        outputs=classifier.errors(y),
        givens={
            x: valid_set_x[index * batch_size: (index + 1) * batch_size],
            y: valid_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

    # compute the gradient of cost with respect to theta = (W,b)
    g_W = T.grad(cost=cost, wrt=classifier.W)
    g_b = T.grad(cost=cost, wrt=classifier.b)

    # start-snippet-3
    # specify how to update the parameters of the model as a list of
    # (variable, update expression) pairs.
    updates = [(classifier.W, classifier.W - learning_rate * g_W),
               (classifier.b, classifier.b - learning_rate * g_b)]

    # compiling a Theano function `train_model` that returns the cost, but in
    # the same time updates the parameter of the model based on the rules
    # defined in `updates`
    train_model = theano.function(
        inputs=[index],
        outputs=cost,
        updates=updates,
        givens={
            x: train_set_x[index * batch_size: (index + 1) * batch_size],
            y: train_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )
    # end-snippet-3

    ###############
    # TRAIN MODEL #
    ###############
    print('... training the model')
    # early-stopping parameters
    patience = 5000  # look as this many examples regardless
    patience_increase = 2  # wait this much longer when a new best is
                           # found
    improvement_threshold = 0.995  # a relative improvement of this much is
                                   # considered significant
    validation_frequency = min(n_train_batches, patience // 2)
                                  # go through this many
                                  # minibatche before checking the network
                                  # on the validation set; in this case we
                                  # check every epoch

    best_validation_loss = numpy.inf
    test_score = 0.
    start_time = timeit.default_timer()

    done_looping = False
    epoch = 0
    while (epoch < n_epochs) and (not done_looping):
        epoch = epoch + 1
        for minibatch_index in range(n_train_batches):

            minibatch_avg_cost = train_model(minibatch_index)
            # iteration number
            iter = (epoch - 1) * n_train_batches + minibatch_index

            if (iter + 1) % validation_frequency == 0:
                # compute zero-one loss on validation set
                validation_losses = [validate_model(i)
                                     for i in range(n_valid_batches)]
                this_validation_loss = numpy.mean(validation_losses)

                print(
                    'epoch %i, minibatch %i/%i, validation error %f %%' %
                    (
                        epoch,
                        minibatch_index + 1,
                        n_train_batches,
                        this_validation_loss * 100.
                    )
                )

                # if we got the best validation score until now
                if this_validation_loss < best_validation_loss:
                    # improve patience if loss improvement is good enough
                    if this_validation_loss < best_validation_loss * \
                       improvement_threshold:
                        patience = max(patience, iter * patience_increase)

                    best_validation_loss = this_validation_loss
                    # test it on the test set

                    test_losses = [test_model(i)
                                   for i in range(n_test_batches)]
                    test_score = numpy.mean(test_losses)

                    print(
                        (
                            'epoch %i, minibatch %i/%i, test error of'
                            ' best model %f %%'
                        ) %
                        (
                            epoch,
                            minibatch_index + 1,
                            n_train_batches,
                            test_score * 100.
                        )
                    )

                    # save the best model
                    with open('best_model.pkl', 'wb') as f:
                        pickle.dump(classifier, f)

            if patience <= iter:
                done_looping = True
                break

    end_time = timeit.default_timer()
    print(
        (
            'Optimization complete with best validation score of %f %%,'
            ' with test performance %f %%'
        )
        % (best_validation_loss * 100., test_score * 100.)
    )
    print('The code run for %d epochs, with %f epochs/sec' % (
        epoch, 1. * epoch / (end_time - start_time)))
    print(('The code for file ' +
           os.path.split(__file__)[1] +
           ' ran for %.1fs' % ((end_time - start_time))), file=sys.stderr)

def predict():
    """
    An example of how to load a trained model and use it
    to predict labels.
    """

    # load the saved model (opened in binary mode so unpickling also works
    # under Python 3)
    classifier = pickle.load(open('best_model.pkl', 'rb'))

    # compile a predictor function
    predict_model = theano.function(
        inputs=[classifier.input],
        outputs=classifier.y_pred)

    # We can test it on some examples from the test set
    dataset = 'mnist.pkl.gz'
    datasets = load_data(dataset)
    test_set_x, test_set_y = datasets[2]
    test_set_x = test_set_x.get_value()

    predicted_values = predict_model(test_set_x[:10])
    print("Predicted values for the first 10 examples in test set:")
    print(predicted_values)


if __name__ == '__main__':
    sgd_optimization_mnist()


The user can classify MNIST digits with SGD logistic regression by typing, from within the Deep Learning Tutorials folder:

python code/logistic_sgd.py


The output should look something like:

...
epoch 72, minibatch 83/83, validation error 7.510417 %
epoch 72, minibatch 83/83, test error of best model 7.510417 %
epoch 73, minibatch 83/83, validation error 7.500000 %
epoch 73, minibatch 83/83, test error of best model 7.489583 %
Optimization complete with best validation score of 7.500000 %, with test performance 7.489583 %
The code run for 74 epochs, with 1.936983 epochs/sec


Prediction Using a Trained Model

sgd_optimization_mnist saves an improved model each time the validation error decreases. We can reload that model and use it to predict labels for new data; an example predict function is shown below:

def predict():
    """
    An example of how to load a trained model and use it
    to predict labels.
    """

    # load the saved model (opened in binary mode so unpickling also works
    # under Python 3)
    classifier = pickle.load(open('best_model.pkl', 'rb'))

    # compile a predictor function
    predict_model = theano.function(
        inputs=[classifier.input],
        outputs=classifier.y_pred)

    # We can test it on some examples from the test set
    dataset = 'mnist.pkl.gz'
    datasets = load_data(dataset)
    test_set_x, test_set_y = datasets[2]
    test_set_x = test_set_x.get_value()

    predicted_values = predict_model(test_set_x[:10])
    print("Predicted values for the first 10 examples in test set:")
    print(predicted_values)
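
Once a training run has written best_model.pkl to the working directory, the function can be invoked, for example, as follows (assuming it is run from inside the code/ folder so that the logistic_sgd module, named after code/logistic_sgd.py above, is importable; this invocation is only an illustration):

from logistic_sgd import predict   # hypothetical import path, taken from the run command above

predict()   # prints the predicted classes for the first 10 test-set images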
