
Coursera | Andrew Ng (02-week3-3.11)—TensorFlow

2018-01-23 11:15
This series only adds personal study notes, supplementary derivations, and the like on top of the original course; corrections and feedback are welcome. Having taken Andrew Ng's course, I organized it into text for easier review later. Since I have been studying English, the series is primarily in English, and I suggest readers rely mainly on the English as well, with the Chinese as an aid, to prepare for reading academic papers in this field later on. - ZJ

Coursera course | deeplearning.ai | NetEase Cloud Classroom (网易云课堂)

Please credit the author and source when reposting: ZJ, WeChat official account 「SelfImprovementLab」

Zhihu: https://zhuanlan.zhihu.com/c_147249273

CSDN: http://blog.csdn.net/junjun_zhao/article/details/79122965

3.11 TensorFlow

(Subtitle source: NetEase Cloud Classroom)



Welcome to the last video for this week. There are many great deep learning programming frameworks. One of them is TensorFlow. I'm excited to help you start to learn to use TensorFlow. What I want to do in this video is show you the basic structure of a TensorFlow program, and then leave you to learn more details and practice them yourself in this week's programming exercise. This week's programming exercise will take some time to do, so please be sure to leave some extra time for it. As a motivating problem, let's say that you have some cost function J that you want to minimize. For this example, I'm going to use this very simple cost function
J(w) = w^2 - 10w + 25
. So that's the cost function. You might notice that this function is actually (w - 5)^2: if you expand out this quadratic, you get the expression above, and so the value of w that minimizes it is w = 5. But let's say we didn't know that, and you just have this function. Let's see how you can implement something in TensorFlow to minimize it. A very similar program structure can be used to train neural networks, where you have some complicated cost function J(w, b) depending on all the parameters of your neural network. Similarly, you'll be able to use TensorFlow to automatically find values of w and b that minimize the cost function. But let's start with the simpler example on the left.



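Before walking through the TensorFlow version, it may help to see what the framework is about to automate. Below is a minimal hand-rolled sketch of the same minimization in plain Python (my addition, not from the lecture), with the derivative dJ/dw = 2w - 10 worked out by hand:

w = 0.0
alpha = 0.01                # same learning rate as the TensorFlow code below
for _ in range(1000):
    grad = 2 * w - 10       # dJ/dw for J(w) = w^2 - 10w + 25
    w = w - alpha * grad    # gradient descent update
print(w)                    # converges very close to the optimum w = 5

TensorFlow's contribution is that you only write down the cost; the grad line is derived for you.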

So I'm running Python in my Jupyter notebook, and to start up TensorFlow, you
import numpy as np
and it's idiomatic to use
import tensorflow as tf
. Next, let me define the parameter w. In TensorFlow, you're going to use
tf.Variable
to define a parameter, with
dtype=tf.float32
. And then let's define the cost function. Remember, the cost function was w^2 - 10w + 25. So let me use
tf.add
. I'm going to have w squared plus
tf.multiply
. The second term is -10 times w, and then I'm going to add 25, so let me put another tf.add over there. That defines the cost J that we had. Then I'm going to write
train = tf.train.GradientDescentOptimizer
. Let's use a learning rate of 0.01, and the goal is to minimize the cost. Finally, the following few lines are quite idiomatic:
init = tf.global_variables_initializer()
and then
session = tf.Session()
. That starts a TensorFlow session. session.run(init) initializes the global variables, and for TensorFlow to evaluate a variable, we use session.run(w). We haven't done anything yet. The lines above initialize w to zero and define a cost function. We define train to be our learning algorithm, which uses a GradientDescentOptimizer to minimize the cost function. But we haven't actually run the learning algorithm yet. With session.run(w) we evaluate w, and let me print session.run(w). If we run that, it evaluates w to 0, because we haven't run anything yet.


import numpy as np
import tensorflow as tf

w = tf.Variable(0.0, dtype=tf.float32)
cost = tf.add(tf.add(w**2, tf.multiply(-10., w)), 25)
train = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

init = tf.global_variables_initializer()
session = tf.Session()
session.run(init)
print(session.run(w))
# 0.0
# Note: in cost = tf.add(tf.add(w**2, tf.multiply(-10., w)), 25), the trailing dot in -10. makes it a float; without the dot it would be an int. Pay attention to variable types.
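As an aside, the code above uses the TensorFlow 1.x API from the course. On TensorFlow 2.x, where sessions and global variable initializers were removed, a rough equivalent might look like the following sketch (my adaptation, not part of the lecture):

import tensorflow as tf    # assumes TensorFlow 2.x

w = tf.Variable(0.0, dtype=tf.float32)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

for _ in range(1000):
    # GradientTape records the forward computation so TensorFlow
    # can differentiate the cost with respect to w
    with tf.GradientTape() as tape:
        cost = w**2 - 10.0 * w + 25.0
    grads = tape.gradient(cost, [w])
    optimizer.apply_gradients(zip(grads, [w]))

print(w.numpy())           # close to the optimum w = 5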


Now, let's do session.run(train). What this does is run one step of gradient descent. Then let's evaluate the value of w after one step of gradient descent and print it. After the one step of gradient descent, w is now 0.1. Let's now run 1,000 iterations of gradient descent by calling session.run(train) in a loop, and then print(session.run(w)). This runs 1,000 iterations of gradient descent, and at the end w ends up being 4.99999. Remember, we said that we're minimizing (w - 5)^2, so the optimum value of w is 5, and it got very close to that.


session.run(train)
print(session.run(w))
# 0.1 (may vary slightly between runs: 0.099999994, 0.199999)

for i in range(1000):
    session.run(train)
print(session.run(w))

#4.9999886
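As a quick sanity check on the first printed value (my arithmetic, not from the lecture): the derivative of the cost is dJ/dw = 2w - 10, which is -10 at w = 0, so a single gradient descent step gives w = 0 - 0.01 × (-10) = 0.1, exactly the value printed after one run of train.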


So I hope this gives you a sense of the broad structure of a TensorFlow program. As you do the programming exercise and play with more TensorFlow code yourself, some of the functions I'm using here will become more familiar. A few things to notice: w is the parameter we're trying to optimize, so we declare it as a variable. And notice that all we had to do was define a cost function using these add and multiply functions and so on. TensorFlow knows automatically how to take derivatives with respect to add and multiply, as well as other functions, which is why you only have to implement basically forward prop, and it can figure out how to do the backprop or the gradient computation, because that's already built into the add, multiply, and squaring functions. By the way, in case this notation seems really ugly, TensorFlow has actually overloaded the usual operators for plus, minus, and so on, so you can also write the cost in this nicer format, comment the old line out, rerun it, and get the same result. Once w is declared to be a TensorFlow variable, the squaring, multiplication, adding, and subtraction operations are overloaded, so you don't need to use the uglier syntax I had above.


# cost = tf.add(tf.add(w**2, tf.multiply(-10., w)), 25)

cost = w**2 - 10*w + 25   # overloaded +, - and ** give the same cost with nicer syntax


Now, there's just one more feature of TensorFlow I want to show you. This example minimized a fixed function of w. What if the function you want to minimize is a function of your training set? Whenever you have some training data x and you're training a neural network, the training data x can change. So how do you get training data into a TensorFlow program? I'm going to define x, which plays the role of the training data (really the training data has both x and y, but we only have x in this example). We define x with a placeholder, of type float32, and let's make it a [3,1] array. Whereas the cost previously had fixed coefficients in front of the three terms of this quadratic, 1*w^2 - 10*w + 25, we can turn these numbers 1, -10, and 25 into data. So I'm going to replace the cost with cost = x[0][0]*w**2 + x[1][0]*w + x[2][0]. Now x becomes something like data that controls the coefficients of this quadratic function, and the placeholder function tells TensorFlow that x is something you will provide values for later. So let's define another array, coefficients = np.array([[1.], [-10.], [25.]]). That's the data we're going to plug into x. Finally, we need a way to get this array of coefficients into the variable x, and the syntax for that goes on the training step: the values for x need to be provided, so I'm going to set feed_dict = {x: coefficients} here, and I'm going to copy and paste that down there as well. All right, hopefully I didn't make any syntax errors; let's try re-running this, and hopefully we get the same result as before.


coefficients = np.array([[1.], [-10.], [25.]])

w = tf.Variable(0.0, dtype=tf.float32)
x = tf.placeholder(tf.float32, [3,1])
# cost = tf.add(tf.add(w**2, tf.multiply(-10., w)), 25)
# cost  = w**2 - 10*w + 25
cost = x[0][0]*w**2 + x[1][0]*w + x[2][0]
train = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

init = tf.global_variables_initializer()
session = tf.Session()
session.run(init)
print(session.run(w))

session.run(train, feed_dict={x: coefficients})
print(session.run(w))

for i in range(1000):
    session.run(train, feed_dict={x: coefficients})
print(session.run(w))


And now, if you want to change the coefficients of this quadratic function, let's say you take this [-10.] and change it to [-20.], and let's change the [25.] to [100.]. The cost is now the function (w - 10)^2, and if I re-run this, hopefully I find that the value minimizing (w - 10)^2 is w = 10. Let's see, cool, great: we get w very close to 10 after running 1,000 iterations of gradient descent. So what you'll see more of when you do the programming exercise is that a placeholder in TensorFlow is a variable whose value you assign later. This is a convenient way to get your training data into the cost function, and the way you do it is with this syntax: when you're running a training iteration, use the feed_dict to set x to be equal to the coefficients. And if you're doing mini-batch gradient descent, where on each iteration you need to plug in a different mini-batch, then on different iterations you use the feed_dict to feed in different subsets of your training set, different mini-batches, into wherever your cost function is expecting to see data (see the sketch after the code block below).


coefficients = np.array([[1.], [-20.], [100.]])   # now J(w) = w^2 - 20w + 100 = (w - 10)^2

w = tf.Variable(0.0, dtype=tf.float32)
x = tf.placeholder(tf.float32, [3,1])
# cost = tf.add(tf.add(w**2, tf.multiply(-10., w)), 25)
# cost  = w**2 - 10*w + 25
cost = x[0][0]*w**2 + x[1][0]*w + x[2][0]
train = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

init = tf.global_variables_initializer()
session = tf.Session()
session.run(init)
print(session.run(w))

session.run(train, feed_dict={x: coefficients})
print(session.run(w))
# 0.19999999

for i in range(1000):
    session.run(train, feed_dict={x: coefficients})
print(session.run(w))
# 9.999977
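To make the mini-batch idea concrete, here is a hypothetical sketch; X_train, y_train, the y placeholder, batch_size, and num_epochs are all illustrative assumptions, not part of the lecture code. Each call to session.run(train, ...) feeds a different slice of the training set:

batch_size = 64
num_batches = X_train.shape[0] // batch_size   # X_train, y_train assumed to exist

for epoch in range(num_epochs):
    for j in range(num_batches):
        start = j * batch_size
        # feed a different mini-batch into the placeholders on each iteration
        session.run(train, feed_dict={x: X_train[start:start + batch_size],
                                      y: y_train[start:start + batch_size]})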


So hopefully this gives you a sense of what TensorFlow can do. The thing that makes it so powerful is that all you need to do is specify how to compute the cost function, and then it takes derivatives, and it can apply a gradient optimizer, an Adam optimizer, or some other optimizer with pretty much just one or two lines of code. Here's the code again; I've cleaned it up a little bit. In case some of these functions or variables seem a bit mysterious, they will become more familiar after you've practiced with them a couple of times in the programming exercise. One last thing I want to mention: these three lines of code are quite idiomatic in TensorFlow, and some programmers will use this alternative format instead, which does basically the same thing. Set session to tf.Session() to start the session, then use the session to run init, and then use the session to evaluate, say, w, and then print the result. This with construction is used in a number of TensorFlow programs as well. It means more or less the same thing as the version on the left, but the with command in Python is a little better at cleaning up in case there's an error or exception while executing the inner loop. So you'll see this in the programming exercise as well.




with tf.Session() as session:
    session.run(init)
    print(session.run(w))
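Putting it together, the same training loop under the with construction might look like this sketch (reusing init, train, x, and coefficients from the code above):

with tf.Session() as session:
    session.run(init)
    for i in range(1000):
        session.run(train, feed_dict={x: coefficients})
    print(session.run(w))   # ~9.999977 with coefficients [[1.], [-20.], [100.]]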


So what is this code really doing? Let's focus on this equation. The heart of a TensorFlow program is something that computes a cost, and TensorFlow then automatically figures out the derivatives and how to minimize that cost. What this equation, or this line of code, is doing is allowing TensorFlow to construct a computation graph. The computation graph does the following: it takes x[0][0], it takes w, squares w, and multiplies x[0][0] by w^2, so you have x[0][0]*w**2, and so on. Eventually this gets built up to compute x[0][0]*w**2 + x[1][0]*w and so on, and finally you get the cost function; the last term to be added is x[2][0]. I won't write out all the formulas for the cost. The nice thing about TensorFlow is that, by implementing basically forward propagation through this computation graph that computes the cost, TensorFlow has already built in all the necessary backward functions. Remember how training a deep neural network has a set of forward functions and a set of backward functions: programming frameworks like TensorFlow have already built in the necessary backward functions. That is why, by using the built-in functions to compute the forward function, it can automatically do the backward functions as well, implementing back propagation through even very complicated functions and computing derivatives for you. So that's why you don't need to explicitly implement backprop.




This is one of the things that makes programming frameworks help you become really efficient. If you look at the TensorFlow documentation, I just want to point out that it uses a slightly different notation than I did for drawing the computation graph. It uses x[0][0] and w, but rather than writing the value, like w^2, the TensorFlow documentation tends to just write the operation. So this would be a squaring operation, then these two get combined in a multiplication operation, and so on; the final node, I guess, would be an addition operation that adds x[2][0] to find the final value. For the purposes of this class, I thought this notation for the computation graph would be easier for you to understand, but if you look at the computation graphs in the TensorFlow documentation, you'll see this alternative convention where the nodes are labeled with the operations rather than with the values. Both representations describe basically the same computation graph. And there are a lot of things you can do with just one line of code in these programming frameworks. For example, if you don't want to use gradient descent but instead the Adam optimizer, by changing just this one line of code you can very quickly swap in a better optimization algorithm. All modern deep learning programming frameworks support things like this, and it makes it really easy to code up even pretty complex neural networks.
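For instance, swapping in Adam is exactly that kind of one-line change (a sketch; the 0.01 learning rate is carried over from the example, while tf.train.AdamOptimizer defaults to 0.001):

# train = tf.train.GradientDescentOptimizer(0.01).minimize(cost)
train = tf.train.AdamOptimizer(0.01).minimize(cost)   # one-line optimizer swap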




So I hope this is helpful for giving you a sense of the typical structure of a TensorFlow program. To recap the material from this week: you saw how to systematically organize the hyperparameter search process; we also talked about batch normalization and how you can use it to speed up training of your neural networks; and finally, we talked about programming frameworks for deep learning. There are many great programming frameworks, and we had this last video focusing on TensorFlow. With that, I hope you enjoy this week's programming exercise, and that it helps you gain even more familiarity with these ideas.


PS: You are welcome to follow the WeChat official account 「SelfImprovementLab」, which focuses on deep learning, machine learning, and artificial intelligence, with occasional group check-in activities on early rising, reading, exercise, English, and more.
