
Coursera | Andrew Ng (01-week-2-2.13) - Vectorizing Logistic Regression

2017-12-29 14:48
This series only adds personal study notes and supplementary derivations for selected points on top of the original course; if there are any errors, corrections are welcome. After studying Andrew Ng's course, I organized it into text to make review and look-up easier. Since I have been studying English, the series is primarily in English, and readers are likewise encouraged to rely mainly on the English with Chinese as a supplement, as preparation for reading academic papers in related fields later on. - ZJ

Coursera course | deeplearning.ai | NetEase Cloud Classroom

Please credit the author and source when reposting: ZJ, WeChat official account "SelfImprovementLab"

Zhihu: https://zhuanlan.zhihu.com/c_147249273

CSDN: http://blog.csdn.net/JUNJUN_ZHAO/article/details/78931540

Vectorizing Logistic Regression

(Subtitle source: NetEase Cloud Classroom)



We have talked about how vectorization lets you speed up your code significantly. In this video, we'll talk about how you can vectorize the implementation of logistic regression, so that you can process an entire training set, that is, implement a single iteration of gradient descent with respect to an entire training set, without using even a single explicit for loop. I'm super excited about this technique, and when we talk about neural networks later, we'll again be able to do without explicit for loops. Let's get started. Let's first examine the forward propagation steps of logistic regression. If you have m training examples, then to make a prediction on the first example, you need to compute $z^{(1)} = w^T x^{(1)} + b$ using this familiar formula, then compute the activation $a^{(1)} = \sigma(z^{(1)})$, which gives you $\hat{y}^{(1)}$ for the first example.

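For reference, here is a minimal sketch of this per-example computation as an explicit loop, with hypothetical toy shapes and variable names rather than the course's assignment code:

import numpy as np

def sigmoid(z):
    # element-wise logistic function
    return 1.0 / (1.0 + np.exp(-z))

n_x, m = 3, 5                        # hypothetical dimensions
w, b = np.random.randn(n_x, 1), 0.1  # weights (n_x, 1) and a scalar bias
X = np.random.randn(n_x, m)          # one training example per column

# explicit loop over the m examples: the version that vectorization will replace
a = np.zeros((1, m))
for i in range(m):
    z_i = np.dot(w[:, 0], X[:, i]) + b   # z^(i) = w^T x^(i) + b  (a scalar)
    a[0, i] = sigmoid(z_i)               # a^(i) = sigmoid(z^(i)) = y-hat^(i)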



Then to make a prediction on the second training example, you need to compute the same quantities for $x^{(2)}$; to make a prediction on the third example, you compute them for $x^{(3)}$, and so on. You might need to do this m times if you have m training examples. So it turns out that in order to carry out the forward propagation step, that is, to compute these predictions on all m training examples, there is a way to do so without needing an explicit for loop.


Let's see how you can do it. First, remember that we defined the matrix capital X to be your training inputs, stacked together in different columns like this. Writing its shape in Python numpy notation, X is an $n_x \times m$ matrix. Now, the first thing I want to do is show how you can compute $z^{(1)}$, $z^{(2)}$, $z^{(3)}$ and so on, all in one step, in fact, with one line of code.

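As a concrete illustration with made-up dimensions, the m column vectors $x^{(i)}$ are stacked side by side to form this $n_x \times m$ matrix:

import numpy as np

n_x, m = 3, 5                                           # hypothetical dimensions
examples = [np.random.randn(n_x, 1) for _ in range(m)]  # m column vectors x^(i)
X = np.hstack(examples)                                 # stack the columns horizontally
print(X.shape)                                          # (3, 5), i.e. (n_x, m)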

So I'm going to construct a 1 by m matrix, which is really a row vector, and compute $z^{(1)}$, $z^{(2)}$, and so on, down to $z^{(m)}$, all at the same time. It turns out that this can be expressed as $w^T$ times the capital matrix X, plus the row vector $[b\ \ b\ \cdots\ b]$, where this $[b\ \ b\ \cdots\ b]$ term is a 1 by m vector, or 1 by m matrix, that is, an m-dimensional row vector.




So hopefully you are comfortable with matrix multiplication. $w^T$ is a row vector, and the columns of X are $x^{(1)}, x^{(2)}, \dots, x^{(m)}$, so this first term evaluates to $w^T x^{(1)}$, $w^T x^{(2)}$, and so on, up to $w^T x^{(m)}$. Then we add the second term $[b\ \ b\ \cdots\ b]$, which ends up adding b to each element, so you end up with another 1 by m row vector: that's the first element, that's the second element, and so on, up to the m-th element.


And if you refer to the definitions above, this first element is exactly the definition of $z^{(1)}$, the second element is exactly the definition of $z^{(2)}$, and so on. So just as capital X was obtained by taking your training examples and stacking them next to each other, stacking them horizontally, I'm going to define capital Z to be what you get when you take the lowercase z's and stack them horizontally. When you stack the lowercase x's corresponding to the different training examples horizontally, you get the variable capital X; in the same way, when you take the lowercase z variables and stack them horizontally, you get the variable capital Z.

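Written out column by column, the construction of capital Z described above is:

$Z = w^T X + [\,b\ \ b\ \cdots\ b\,] = [\,w^T x^{(1)} + b \quad w^T x^{(2)} + b \quad \cdots \quad w^T x^{(m)} + b\,] = [\,z^{(1)} \quad z^{(2)} \quad \cdots \quad z^{(m)}\,]$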

And it turns out that in order to implement this, the numpy command is Z = np.dot(w.T, X) + b, that is, w transpose times X, and then plus b. Now, there is a subtlety in Python, which is that here b is a real number, or if you want, a 1 by 1 matrix, just a normal real number. But when you add this vector to this real number, Python automatically takes the real number b and expands it out into a 1 by m row vector. So in case this operation seems a little bit mysterious, this is called broadcasting in Python, and you don't have to worry about it for now; we'll talk about it some more in the next video.

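A small sketch with made-up toy shapes, illustrating that adding the scalar b gives the same result as adding an explicit 1 by m row of b's:

import numpy as np

n_x, m = 3, 5                                         # hypothetical dimensions
w = np.random.randn(n_x, 1)
X = np.random.randn(n_x, m)
b = 0.5                                               # a plain Python float

Z_broadcast = np.dot(w.T, X) + b                      # b is broadcast across the (1, m) result
Z_explicit = np.dot(w.T, X) + b * np.ones((1, m))     # same result with an explicit row of b's
print(np.allclose(Z_broadcast, Z_explicit))           # True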



But the takeaway is that with just this one line of code, you can calculate capital Z, and capital Z is going to be a 1 by m matrix that contains all of the lowercase z's, from $z^{(1)}$ through $z^{(m)}$. So that was Z; how about the values a? What we'd like to do next is find a way to compute $a^{(1)}$, $a^{(2)}$, and so on up to $a^{(m)}$, all at the same time. Just as stacking the lowercase x's resulted in capital X, and horizontally stacking the lowercase z's resulted in capital Z, stacking the lowercase a's is going to result in a new variable, which we are going to define as capital A.


And in the programming assignment, you'll see how to implement a vector-valued sigmoid function, so that the sigmoid function takes this capital Z as input and very efficiently outputs capital A. You'll see the details of that in the programming assignment. So just to recap, what we've seen on this slide is that instead of needing to loop over the m training examples to compute lowercase z and lowercase a one at a time, you can implement this one line of code to compute all the z's at the same time.

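The assignment's own implementation is not reproduced here, but a typical vector-valued sigmoid is just numpy's element-wise exponential applied to the whole matrix at once:

import numpy as np

def sigmoid(Z):
    # element-wise sigmoid; Z may be a scalar or a numpy array of any shape
    return 1.0 / (1.0 + np.exp(-Z))

Z = np.array([[0.0, 2.0, -1.0]])   # a toy (1, 3) matrix of z values
A = sigmoid(Z)                     # same shape as Z; each entry is an activation a^(i)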

And then this one line of code, with an appropriate implementation of lowercase sigma, computes all the lowercase a's at the same time. So this is how you implement a vectorized version of forward propagation for all m training examples at the same time. To summarize, you've just seen how you can use vectorization to very efficiently compute all of the activations, all the lowercase a's, at the same time. Next, it turns out you can also use vectorization very efficiently to compute the backward propagation, that is, to compute the gradients. Let's see how you can do that in the next video.


Key takeaways:

The linear outputs $Z$ for all m examples can be written in matrix form:

$Z = w^T X + b$

Z = np.dot(w.T, X) + b
A = sigmoid(Z)
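These two lines assume that w, b, X, and a sigmoid function already exist; a minimal self-contained sketch with made-up toy data might look like:

import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

n_x, m = 3, 5                  # hypothetical dimensions
X = np.random.randn(n_x, m)    # inputs stacked as columns
w = np.random.randn(n_x, 1)    # weight column vector
b = 0.0                        # scalar bias, broadcast by numpy

Z = np.dot(w.T, X) + b         # shape (1, m)
A = sigmoid(Z)                 # shape (1, m)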


Vectorizing the gradient computation for logistic regression

For the m examples, $dZ$ has dimension $(1, m)$ and is given by:

$dZ = A - Y$

$db$ can be expressed as:

$db = \frac{1}{m}\sum_{i=1}^{m} dz^{(i)}$

db = 1/m * np.sum(dZ)


$dw$ can be expressed as:

$dw = \frac{1}{m} X \cdot dZ^T$

dw = 1/m * np.dot(X, dZ.T)
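Putting the forward and backward steps together, one full vectorized gradient-descent iteration might look like the following sketch, with hypothetical toy data and a made-up learning rate alpha:

import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

n_x, m = 3, 5                          # hypothetical dimensions
X = np.random.randn(n_x, m)            # inputs, one example per column
Y = np.random.randint(0, 2, (1, m))    # binary labels
w = np.zeros((n_x, 1))
b = 0.0
alpha = 0.01                           # hypothetical learning rate

# forward propagation
Z = np.dot(w.T, X) + b                 # (1, m)
A = sigmoid(Z)                         # (1, m)

# backward propagation: gradients of the cost with respect to w and b
dZ = A - Y                             # (1, m)
dw = 1/m * np.dot(X, dZ.T)             # (n_x, 1)
db = 1/m * np.sum(dZ)                  # scalar

# gradient-descent update
w = w - alpha * dw
b = b - alpha * db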



PS: You are welcome to scan the QR code and follow the official account "SelfImprovementLab", which focuses on deep learning, machine learning, and artificial intelligence, and also runs occasional check-in groups on early rising, reading, exercise, English, and other topics.
