Coursera | Andrew Ng (02-week-2-2.3): Exponentially Weighted Averages

2018-01-18 19:46
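This series adds personal study notes and supplementary derivations to selected topics from the original course. If you find mistakes, corrections are welcome. After working through Andrew Ng's course, I organized it into text to make it easier to look up and review. Because I am also studying English, the series is written mainly in English, and I suggest readers likewise treat the English as primary and the Chinese as a supplement, which helps prepare for reading academic papers in the field later on. - ZJ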

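Coursera course | deeplearning.ai | NetEase Cloud Classroom (网易云课堂)

When reposting, please credit the author and source: ZJ, WeChat official account 「SelfImprovementLab」

Zhihu: https://zhuanlan.zhihu.com/c_147249273

CSDN: http://blog.csdn.net/JUNJUN_ZHAO/article/details/79098947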

2.3 Exponentially weighted averages (指数加权平均)

(Subtitle source: NetEase Cloud Classroom)



I want to show you a few optimization algorithms that are faster than gradient descent. In order to understand those algorithms, you need to be able to use something called exponentially weighted averages, also called exponentially weighted moving averages in statistics. Let's first talk about that, and then we'll use this to build up to more sophisticated optimization algorithms. So, even though I now live in the United States, I was born in London. So, for this example I got the daily temperature in London from last year. On January 1, the temperature was 40 degrees Fahrenheit. Now, I know most of the world uses the Celsius system, but I live in the United States, which uses Fahrenheit. So that's four degrees Celsius. And on January 2, it was nine degrees Celsius, and so on. And then about halfway through the year (a year has 365 days, so that would be around day number 180, sometime in late May, I guess) it was 60 degrees Fahrenheit, which is 15 degrees Celsius, and so on. So it starts to get warmer towards summer, and it was colder in January.



我想向你展示几个优化算法,它们比梯度下降法快要理解这些算法,你需要用到指数加权平均,在统计中也叫作指数加权移动平均,我们首先讲这个,然后再来讲更加复杂的优化算法,虽然现在我生活在美国,实际上我生于英国伦敦,比如我这儿有去年伦敦的每日温度,所以 1月1号 温度是 40 华氏度,我知道世界上大部分地区使用摄氏度,但是美国使用华氏度,相当于 4 摄氏度,在 1 月 2 号是 9摄氏度等等,在年中的时候,一年 365 天 年中就是说,大概 180 天的样子 也就是 5 月末,温度是 60 华氏度 也就是 15 摄氏度等等,夏季温度转暖 然后冬季降温。

So, if you plot the data, you end up with this, where day one is sometime in January, this is the beginning of summer, and that's the end of the year, in late December. So, this would be January 1, this is the middle of the year approaching summer, and this would be the data from the end of the year. This data looks a little bit noisy, and if you want to compute the trends, the local average or a moving average of the temperature, here's what you can do. Let's initialize $v_0 = 0$. And then, on every day, we're going to take a weighted average: 0.9 times whatever the previous value is, plus 0.1 times that day's temperature. So $v_1 = 0.9 v_0 + 0.1 \theta_1$, where $\theta_1$ is the temperature from the first day. And on the second day, we're again going to take a weighted average: 0.9 times the previous value plus 0.1 times today's temperature, $v_2 = 0.9 v_1 + 0.1 \theta_2$. Then $v_3 = 0.9 v_2 + 0.1 \theta_3$, and so on. And the more general formula is that v on a given day is 0.9 times v from the previous day, plus 0.1 times the temperature of that day: $v_t = 0.9 v_{t-1} + 0.1 \theta_t$. So, if you compute this and plot it in red, this is what you get. You get a moving average, what's called an exponentially weighted average, of the daily temperature.



你用数据作图 可以得到以下结果,起始日在 1 月份,这里是夏季初,这里是年末 相当于 12 月末,这里是 1 月 1 号,年中接近夏季的时候,随后就是年末的数据,看起来有些杂乱 如果要计算趋势的话,也就是温度的局部平均值 或者说移动平均值,你要做的是,首先使 V0 等于 0,每天 需要使用 0.9 的加权数之前的数值,加上当日温度的 0.1,所以这里是第一天的温度值,第二天 又可以获得一个加权平均数,0.9 乘以之前的值加上当日的温度的 0.1 以此类推,第二天值加上第三日数据的 0.1 如此往下,大体公式就是某天的 V 等于前一天 V 值的0.9,加上当日温度的 0.1,如此计算 然后用红线作图的话,便得到这样的结果,你得到了移动平均值,每日温度的指数加权平均值。
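The recurrence described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the course: the function name `exponentially_weighted_average` and the sample readings are assumptions made here for the example, and θ_t is written as `theta`.

```python
def exponentially_weighted_average(temperatures, beta=0.9):
    """Return [v_1, ..., v_T] for v_t = beta * v_{t-1} + (1 - beta) * theta_t."""
    v = 0.0  # v_0 = 0, as in the lecture
    averages = []
    for theta in temperatures:
        # Weight beta on the previous average, (1 - beta) on today's reading.
        v = beta * v + (1.0 - beta) * theta
        averages.append(v)
    return averages


# Hypothetical readings in degrees Fahrenheit, for illustration only.
sample_temperatures = [40.0, 49.0, 45.0]
print(exponentially_weighted_average(sample_temperatures))
```

With these three made-up readings the averages come out to roughly 4.0, 8.5 and 12.15. They start well below the readings themselves because v_0 is initialized to 0.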

So, let's look at the equation we had from the previous slide. It was $v_t$ equals (previously we had 0.9; we'll now turn that into a parameter β) β times $v_{t-1}$, plus (previously it was 0.1; I'm going to turn that into 1 − β) (1 − β) times $\theta_t$. That is, $v_t = \beta v_{t-1} + (1-\beta)\theta_t$. So previously you had β = 0.9. It turns out that, for reasons we are going to go into later, when you compute this you can think of $v_t$ as approximately averaging over something like 1/(1 − β) days' temperature. So, for example, when β equals 0.9, you could think of this as averaging over the last 10 days' temperature. And that was the red line.



看一下上一张幻灯片里的公式,Vt等于,之前我们采用的是 0.9,我们把这个常数变成 β , β 乘上V(t−1)加上,之前是 0.1 现在是(1- β )乘以第 t 天的数据,所以之前 β 等于 0.9,由于以后我们要考虑的原因,在计算时可视Vt为,大概是 1/(1- β ) 的每日温度,如果 β 是 0.9 你会想,这是十天的平均值,也就是红线部分。
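One common way to motivate the 1/(1 − β) rule of thumb is to unroll the recurrence. The sketch below is a supplementary derivation, not part of the transcript; it assumes v_0 = 0 as above, and the approximation is closest when β is near 1.

```latex
% Unrolling v_t = \beta v_{t-1} + (1-\beta)\theta_t with v_0 = 0:
\begin{align*}
v_t &= (1-\beta)\,\theta_t + \beta(1-\beta)\,\theta_{t-1} + \beta^{2}(1-\beta)\,\theta_{t-2} + \dots \\
    &= (1-\beta)\sum_{k=0}^{t-1} \beta^{k}\,\theta_{t-k}
\end{align*}
% The reading from k days ago carries weight (1-\beta)\beta^{k}.
% After k = 1/(1-\beta) days that weight has decayed by about 1/e:
\[
\beta^{1/(1-\beta)} \approx \frac{1}{e} \approx 0.37
\]
% \beta = 0.9:  0.9^{10} \approx 0.35
% \beta = 0.98: 0.98^{50} \approx 0.36
```

So readings older than about 1/(1 − β) days contribute comparatively little, which is why the average behaves roughly like a window of that length.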

Now, let's try something else. Let's set β to be very close to one, say 0.98. Then 1/(1 − 0.98) is equal to 50. So think of this as averaging over roughly the last 50 days' temperature. And if you plot that, you get this green line. So, notice a couple of things with this very high value of β. The plot you get is much smoother, because you're now averaging over more days of temperature. So the curve is less wavy, smoother. But on the flip side, the curve has now shifted further to the right, because you're now averaging over a much larger window of temperatures. And by averaging over a larger window, this exponentially weighted average formula adapts more slowly when the temperature changes. So there's just a bit more latency. And the reason for that is that when β is 0.98, it's giving a lot of weight to the previous value and a much smaller weight, just 0.02, to whatever you're seeing right now. So when the temperature goes up or down, this exponentially weighted average just adapts more slowly when β is so large.



我们来试试别的,将 β 设置成接近 1 的一个值,比如 0.98 ,如果计算1/(1- 0.98 ),答案是 50 ,这就是粗略平均了一下,过去 50 天的温度,这时作图可以得到绿线,这个高值 β 要注意几点,你得到的曲线要平坦一些 原因在于,你多平均了几天的温度,所以这个曲线,波动更小 更加平坦,缺点是曲线进一步右移,因为现在平均的温度值更多,要平均更多的值,指数加权平均公式,在温度变化时 适应地更缓慢一些,所以会出现一定延迟,因为当 β 等于 0.98 相当于,给前一天地值加了太多权重,只有 0.02 的权重给了当日的值,所以温度变化时,温度上下起伏,当 β 较大时,指数加权平均值适应地更慢一些。
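The latency described above can be made concrete with a small experiment. The sketch below is an illustration under assumptions made here, not from the lecture: the input jumps from 0 to 1 and stays there, and the hypothetical helper `days_to_reach` counts how many days the average needs to cover half of that jump.

```python
def days_to_reach(beta, fraction=0.5, max_days=10_000):
    """Days until the average of a 0 -> 1 step input first reaches `fraction`."""
    v = 0.0  # the average before the step
    for day in range(1, max_days + 1):
        # The input stays at 1.0 on every day after the step.
        v = beta * v + (1.0 - beta) * 1.0
        if v >= fraction:
            return day
    raise ValueError("fraction not reached within max_days")


for beta in (0.9, 0.98):
    print(beta, days_to_reach(beta))
```

With β = 0.9 the average covers half of the jump after 7 days; with β = 0.98 it takes 35 days, which is the slower adaptation seen as the green curve shifting to the right.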

Now, let's try another value. If you set β to the other extreme, let's say 0.5, then by the formula we have on the right, this is something like averaging over just two days' temperature, and if you plot that you get this yellow line. By averaging only over two days' temperature, it's as if you're averaging over a much shorter window. So the result is much more noisy, much more susceptible to outliers, but it adapts much more quickly to temperature changes. So, this formula is how you implement an exponentially weighted average. Again, it's called an exponentially weighted moving average in the statistics literature; we're going to call it an exponentially weighted average for short. And by varying this parameter, or, as we'll see later, this hyperparameter of your learning algorithm, you can get slightly different effects, and there will usually be some value in between that works best. That gives you the red curve, which maybe looks like a better average of the temperature than either the green or the yellow curve. You now know the basics of how to compute exponentially weighted averages. In the next video, let's get a bit more intuition about what it's doing.



我们可以再换一个值试一试,如果 β 是另一个极端值,比如说 0.5 ,根据右边公式,这是平均了两天的温度,作图运行后得到黄线,由于仅平均了两天的温度,平均的数据太少,所以得到的曲线有更多的噪声,更有可能出现异常值,但是这个曲线能够更快适应温度变化,所以指数加权平均数经常被使用,再说一次 它在统计学中被称为,指数加权移动平均值,我们就简称为指数加权平均数,通过调整这个参数,或者说后面的算法学习你会发现这是一个很重要的参数,可以取得稍微不同的效果,往往中间有某个值效果最好, β 为中间值时得到的红色曲线,比起绿线和黄线更好地平均了温度,现在你知道计算指数加权平均数的基本原理,下一个视频中 我们再聊聊它的本质作用。
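To see the trade-off between the three values of β side by side, here is a rough sketch on synthetic data. Everything in it is an assumption made for illustration: the temperatures are generated (a yearly cosine between about 40 and 60 degrees Fahrenheit plus random noise), not the London data from the lecture, and `ewa` and `roughness` are hypothetical helper names.

```python
import math
import random


def ewa(values, beta):
    """Exponentially weighted average with v_0 = 0."""
    v = 0.0
    out = []
    for theta in values:
        v = beta * v + (1.0 - beta) * theta
        out.append(v)
    return out


def roughness(series):
    """Mean absolute day-to-day change; smaller means a smoother curve."""
    steps = [abs(b - a) for a, b in zip(series, series[1:])]
    return sum(steps) / len(steps)


# Synthetic year of noisy temperatures (assumed data, fixed seed).
rng = random.Random(0)
temperatures = [
    50.0 - 10.0 * math.cos(2.0 * math.pi * day / 365.0) + rng.gauss(0.0, 5.0)
    for day in range(365)
]

for beta in (0.5, 0.9, 0.98):
    window = 1.0 / (1.0 - beta)  # rule-of-thumb averaging window in days
    smoothed = ewa(temperatures, beta)
    print(f"beta={beta}: ~{window:.0f}-day window, roughness={roughness(smoothed):.3f}")
```

On data like this the roughness should shrink as β grows, matching the noisy yellow curve (β = 0.5), the red curve (β = 0.9) and the smooth but lagging green curve (β = 0.98).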

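Key points:

Exponentially weighted averages

The key formula for the exponentially weighted average:

$v_t = \beta v_{t-1} + (1-\beta)\theta_t$

The figure below is a scatter plot of temperature against day:



With β = 0.9, the exponentially weighted average is the red line in the figure;

With β = 0.98, the exponentially weighted average is the green line in the figure;

With β = 0.5, the exponentially weighted average is the yellow line in the figure below;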



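References:

[1] 大树先生. 吴恩达Coursera深度学习课程 DeepLearning.ai 提炼笔记(2-2): 优化算法 (distilled notes on Andrew Ng's Coursera deep learning course, DeepLearning.ai, part 2-2: optimization algorithms).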

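PS: You are welcome to follow the WeChat official account 「SelfImprovementLab」, which focuses on deep learning, machine learning, and artificial intelligence, and occasionally organizes check-in groups for early rising, reading, exercise, English, and other habits.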
