
Coursera | Andrew Ng (01-week-4-4.4)—Getting your matrix dimensions right

This series only adds personal study notes and supplementary derivations on top of the original course material; if there are any mistakes, corrections and feedback are welcome. After working through Andrew Ng's course, I organized it into text to make review easier. Since I have been studying English, this series is primarily in English, and I recommend readers also rely mainly on the English, with Chinese as support, to lay the groundwork for reading academic papers in related fields later on. - ZJ

Coursera course | deeplearning.ai | 网易云课堂 (NetEase Cloud Classroom)

Please credit the author and source when reposting: ZJ, WeChat public account 「SelfImprovementLab」

Zhihu: https://zhuanlan.zhihu.com/c_147249273

CSDN: http://blog.csdn.net/junjun_zhao/article/details/79030792

4.4 Getting your matrix dimensions right (核对矩阵的维数)

(Subtitle source: 网易云课堂 / NetEase Cloud Classroom)



When implementing a deep neural network, one of the debugging tools I often use to check the correctness of my code is to pull out a piece of paper and work through the dimensions of the matrices I'm working with. So let me show you how to do that, since I hope this will make it easier for you to implement your deep net as well. Here capital L is equal to 5: not counting the input layer, there are five layers, so four hidden layers and one output layer. If you implement forward propagation, the first step will be z[1] = W[1] x + b[1], where x is the vector of input features. Let's ignore the bias terms b for now and focus on the parameters W. The first hidden layer has three hidden units, and counting from the input these are layer 0, layer 1, layer 2, layer 3, layer 4, and layer 5.



Using the notation from the previous video, n[1], the number of hidden units in layer 1, is equal to 3. Here n[2] = 5, n[3] = 4, n[4] = 2, and n[5] = 1. So far we've only seen neural networks with a single output unit; in later courses we'll also talk about neural networks with multiple output units. Finally, for the input layer, n[0] = n_x = 2. Now let's think about the dimensions of z, W, and x. z[1] is the vector of activations for this first hidden layer, so it is going to be 3 by 1, a 3-dimensional vector; I'm going to write it as an (n[1], 1)-dimensional vector, that is, an n[1] by 1 matrix, which is 3 by 1 in this case.

Key points:

$$z^{[1]} = W^{[1]} \cdot x + b^{[1]}$$
$$(3,1) \leftarrow (3,2) \cdot (2,1)$$
$$(n^{[1]},1) \leftarrow (n^{[1]},n^{[0]}) \cdot (n^{[0]},1)$$
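To make this concrete, here is a minimal NumPy sketch of this first-layer computation for a single example, using the layer sizes from the video; the variable names (`W1`, `b1`, etc.) are just for illustration and are not taken from the course code:

```python
import numpy as np

n0, n1 = 2, 3                  # n[0] = 2 input features, n[1] = 3 hidden units

x  = np.random.randn(n0, 1)    # input x: (n[0], 1) = (2, 1)
W1 = np.random.randn(n1, n0)   # W[1]:    (n[1], n[0]) = (3, 2)
b1 = np.zeros((n1, 1))         # b[1]:    (n[1], 1) = (3, 1)

z1 = W1 @ x + b1               # (3, 2) @ (2, 1) + (3, 1) -> (3, 1)
assert z1.shape == (n1, 1)
```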



Now how about the input features x? We have 2 input features, so in this example x is 2 by 1, but more generally it will be n[0] by 1. So what we need is for the matrix W[1] to be something that, when we multiply it by an n[0] by 1 vector, gives us an n[1] by 1 vector. In other words, a 3-dimensional vector equals something times a 2-dimensional vector, so by the rules of matrix multiplication this has to be a 3 by 2 matrix, because a 3 by 2 matrix times a 2 by 1 vector gives you a 3 by 1 vector. More generally, W[1] is going to be an n[1] by n[0] dimensional matrix, so the dimensions of W[1] have to be (n[1], n[0]), and in general the dimensions of W[l] must be (n[l], n[l-1]).



So, for example, the dimensions of W[2] will have to be 5 by 3, that is, (n[2], n[1]), because we're going to compute z[2] as W[2] times a[1] (again, let's ignore the bias for now). Here a[1] is 3 by 1 and we need z[2] to be 5 by 1, so W[2] had better be 5 by 3. Similarly, the dimension of W[3] is really (dimension of the next layer, dimension of the previous layer), so it is going to be 4 by 5; W[4] is going to be 2 by 4, and W[5] is going to be 1 by 2. So the general formula to check is that when you're implementing the matrix for layer l, the dimension of that matrix will be (n[l], n[l-1]).

Key points:

$$W^{[1]}: (n^{[1]}, n^{[0]})$$
$$W^{[2]}: (5,3) = (n^{[2]}, n^{[1]})$$
$$z^{[2]} = W^{[2]} \cdot a^{[1]} + b^{[2]}$$
$$(5,1) \leftarrow (5,3) \cdot (3,1)$$

$$W^{[3]}: (4,5) \qquad W^{[4]}: (2,4) \qquad W^{[5]}: (1,2)$$

Summary:

$$W^{[l]}: (n^{[l]}, n^{[l-1]})$$
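As a quick way to see this rule in action, the short sketch below prints the expected shape of every W[l] for the layer sizes in this example; the `layer_dims` list is an assumed convention, not part of the lecture:

```python
layer_dims = [2, 3, 5, 4, 2, 1]          # n[0] .. n[5]

# W[l] must have shape (n[l], n[l-1]) for l = 1 .. L
for l in range(1, len(layer_dims)):
    print(f"W[{l}]: ({layer_dims[l]}, {layer_dims[l - 1]})")
# prints (3, 2), (5, 3), (4, 5), (2, 4), (1, 2)
```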

Now let's think about the dimensions of the vectors b. z[1] is a 3 by 1 vector, so you have to add a 3 by 1 vector to another 3 by 1 vector in order to get a 3 by 1 vector as the output. In the next layer, W[2] a[1] is going to be 5 by 1, so b[2] has to be another 5 by 1 vector in order for the sum of these two quantities, the ones I've drawn boxes around, to itself be a 5 by 1 vector. The more general rule is that in the example on the left, b[1] is n[1] by 1, that's 3 by 1, and in the second example it is n[2] by 1; so in the general case b[l] should be (n[l], 1) dimensional.

Key points:

$$b^{[1]}: (3,1) = (n^{[1]},1) \qquad b^{[2]}: (5,1) = (n^{[2]},1)$$

Summary:

$$b^{[l]}: (n^{[l]}, 1)$$
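Putting the two rules together, here is a hedged sketch of a parameter-initialization helper that creates every W[l] and b[l] with the right shape and asserts the rules explicitly; the function name, the 0.01 scaling, and the dictionary keys are assumptions for illustration, not the course's reference implementation:

```python
import numpy as np

def initialize_parameters(layer_dims):
    """layer_dims = [n[0], n[1], ..., n[L]]"""
    params = {}
    for l in range(1, len(layer_dims)):
        params["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
        params["b" + str(l)] = np.zeros((layer_dims[l], 1))
        # Dimension checks from this lecture:
        assert params["W" + str(l)].shape == (layer_dims[l], layer_dims[l - 1])
        assert params["b" + str(l)].shape == (layer_dims[l], 1)
    return params

params = initialize_parameters([2, 3, 5, 4, 2, 1])
print(params["W2"].shape)   # (5, 3)
```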



So hopefully these two equations help you double-check that the dimensions of your matrices W, as well as of your vectors b, are correct. And of course, if you're implementing back propagation, the dimensions of dW should be the same as the dimensions of W, and db should have the same dimensions as b. The other key set of quantities whose dimensions you should check are z, x, and a[l], which we didn't talk much about here; but because a[l] = g[l](z[l]) is applied element-wise, z and a should have the same dimensions in these types of networks.

Key points:

$$dW^{[l]}: (n^{[l]}, n^{[l-1]}) \qquad db^{[l]}: (n^{[l]}, 1)$$

$a^{[l]} = g^{[l]}(z^{[l]})$ is applied element-wise, so $z^{[l]}$ and $a^{[l]}$ have the same dimensions.
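During back propagation the same checks apply to the gradients. Here is a tiny sketch of the kind of assertion that catches shape bugs early, assuming `params` and `grads` are dictionaries keyed "W1", "dW1", and so on, as in the earlier sketch (the function name is hypothetical):

```python
def check_gradient_shapes(params, grads, L):
    """dW[l] must match W[l] and db[l] must match b[l], for l = 1 .. L."""
    for l in range(1, L + 1):
        assert grads["dW" + str(l)].shape == params["W" + str(l)].shape
        assert grads["db" + str(l)].shape == params["b" + str(l)].shape
```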

Now let's see what happens when you have a vectorized implementation that looks at multiple examples at a time. Even for a vectorized implementation, the dimensions of W, b, dW, and db will stay the same, but the dimensions of Z, A, as well as X, will change a bit. Previously we had z[1] = W[1] x + b[1], where z[1] was (n[1], 1), W[1] was (n[1], n[0]), x was (n[0], 1), and b[1] was (n[1], 1).



In a vectorized implementation you would have Z[1] = W[1] X + b[1], where Z[1] is now obtained by taking the vectors z[1] for the individual examples, z[1](1), z[1](2), up to z[1](m), and stacking them horizontally as columns. So the dimension of Z[1], instead of being (n[1], 1), ends up being (n[1], m), where m is the size of your training set. The dimension of W[1] stays the same, still (n[1], n[0]), and X, instead of being (n[0], 1), is now all your training examples stacked horizontally, so it is (n[0], m). Notice that when you take an n[1] by n[0] matrix and multiply it by an n[0] by m matrix, you get an n[1] by m dimensional matrix, as expected. The final detail is that b[1] is still (n[1], 1), but when you add it, through Python broadcasting it gets duplicated into an (n[1], m) matrix and added element-wise.

Key points:

$$z^{[1]} = W^{[1]} \cdot x + b^{[1]}$$
$$(n^{[1]},1) \leftarrow (n^{[1]},n^{[0]}) \cdot (n^{[0]},1) + (n^{[1]},1)$$

Vectorized:

$$Z^{[1]} = W^{[1]} \cdot X + b^{[1]}$$
$$(n^{[1]},m) \leftarrow (n^{[1]},n^{[0]}) \cdot (n^{[0]},m) + (n^{[1]},1)$$

where $b^{[1]}$ has shape $(n^{[1]},1)$ and is broadcast by Python to $(n^{[1]},m)$.
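A minimal NumPy sketch of this vectorized first layer, assuming m = 10 training examples stacked as columns, shows the broadcasting of b[1] (the names are again just for illustration):

```python
import numpy as np

n0, n1, m = 2, 3, 10

X  = np.random.randn(n0, m)    # X:    (n[0], m), examples stacked as columns
W1 = np.random.randn(n1, n0)   # W[1]: (n[1], n[0])
b1 = np.zeros((n1, 1))         # b[1]: (n[1], 1), broadcast across the m columns

Z1 = W1 @ X + b1               # (3, 2) @ (2, 10) + (3, 1) -> (3, 10)
assert Z1.shape == (n1, m)
```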



So on the previous slide we talked about the dimensions of W, b, dW, and db. What we see here is that whereas z[l] as well as a[l] are of dimension (n[l], 1), capital Z[l] as well as capital A[l] are (n[l], m). A special case of this is when l = 0: A[0], which is just your training set input features X, is (n[0], m), as expected. And of course, when you're implementing back propagation, as we'll see later, you end up computing dZ as well as dA, and these will have the same dimensions as Z and A.



Key points:

$$z^{[l]}, a^{[l]}: (n^{[l]}, 1)$$
Vectorized:
$$Z^{[l]}, A^{[l]}: (n^{[l]}, m)$$
When $l = 0$: $$A^{[0]} = X: (n^{[0]}, m)$$
$$dZ^{[l]}, dA^{[l]}: (n^{[l]}, m)$$
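To close the loop, here is a self-contained sketch of a full vectorized forward pass that asserts every Z[l] and A[l] has shape (n[l], m); the choice of ReLU for the hidden layers and sigmoid for the output, as well as the parameter-dictionary layout, are assumptions for illustration only:

```python
import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

layer_dims = [2, 3, 5, 4, 2, 1]              # n[0] .. n[5]
L = len(layer_dims) - 1
m = 10                                       # number of training examples (assumed)

# Parameters with the shapes derived in this lecture
params = {}
for l in range(1, L + 1):
    params["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
    params["b" + str(l)] = np.zeros((layer_dims[l], 1))

X = np.random.randn(layer_dims[0], m)        # A[0] = X, shape (n[0], m)
A = X
for l in range(1, L + 1):
    Z = params["W" + str(l)] @ A + params["b" + str(l)]
    A = sigmoid(Z) if l == L else relu(Z)    # element-wise, so A keeps Z's shape
    assert Z.shape == (layer_dims[l], m)     # Z[l]: (n[l], m)
    assert A.shape == (layer_dims[l], m)     # A[l]: (n[l], m)

print(A.shape)                               # (1, 10)
```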

So I hope the exercise we went through helps clarify the dimensions of the various matrices you'll be working with when you implement forward and back propagation for a deep neural network. As long as you work through your code and make sure that all the matrix dimensions are consistent, that will usually go some way toward eliminating some causes of possible bugs; keeping the dimensions of these various matrices and vectors straight certainly helps me get my code right. So we've now seen some of the mechanics of how to do forward propagation in a neural network. But why are deep neural networks so effective, and why do they do better than shallow representations? Let's spend a few minutes in the next video to discuss that.

