Understanding the Sigmoid Function
2017-05-15 14:59
Original article: http://computing.dcu.ie/~humphrys/Notes/Neural/sigmoid.html
---
Continuous Output - The sigmoid function
Given the summed input (the weighted sum of the inputs $I_i$):
$x = \sum_i w_i I_i$
Instead of a threshold and fire/not fire, we could have a continuous output y according to the sigmoid function:
$y = \varsigma(x) = \frac{1}{1+e^{-x}}$
Note e and its properties. As x goes to minus infinity, $e^{-x}$ grows without bound, so y goes to 0 (tends not to fire). As x goes to infinity, $e^{-x}$ goes to 0, so y goes to 1 (tends to fire). At x=0, $e^{0}=1$, so y=1/2.
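Here is a minimal Python sketch of the sigmoid above. It assumes plain floats and uses only the standard `math` module. The function name `sigmoid` is our own choice and does not come from the original notes. The sign branch is one common way to keep `math.exp` from overflowing when x is very negative; both branches compute the same value $1/(1+e^{-x})$.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid y = 1 / (1 + e^(-x))."""
    if x >= 0:
        # e^(-x) <= 1 here, so there is no risk of overflow.
        return 1.0 / (1.0 + math.exp(-x))
    # For x < 0, rewrite as e^x / (1 + e^x): math.exp(-x) would raise
    # OverflowError once -x exceeds about 709, but e^x just underflows to 0.
    z = math.exp(x)
    return z / (1.0 + z)

for x in (-50.0, -5.0, 0.0, 5.0, 50.0):
    print(f"x = {x:6.1f}  y = {sigmoid(x):.6f}")
```

The printed values approach 0 for very negative x and approach 1 for very positive x. The output is exactly 1/2 at x = 0.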
More threshold-like
We can make this more and more threshold-like, or step-like, by increasing the weights on the links, and so increasing the summed input.
More linear
Q. How do we make it less step-like (more linear)?
A. Make the weights small: use ς(wx) with w close to 0.
For any non-zero w, no matter how close to 0, ς(wx) will eventually be asymptotic to the lines y=0 and y=1.
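The sketch below compares ς(wx) with an all-or-nothing step at 0 for increasing w. It also checks that even a very small non-zero w saturates once |x| is large enough. It is written in Python, and the helper `max_gap_to_step` and the sample inputs are hypothetical, chosen for illustration.

```python
import math
from typing import Sequence

def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def max_gap_to_step(w: float, xs: Sequence[float]) -> float:
    """Largest |sigmoid(w*x) - step(x)|, where step is 0 for x <= 0 and 1 for x > 0."""
    return max(abs(sigmoid(w * x) - (1.0 if x > 0 else 0.0)) for x in xs)

XS = [-3.0, -1.0, -0.1, 0.1, 1.0, 3.0]  # sample inputs, avoiding x = 0 itself

# Larger weights push the curve towards the step.
for w in (0.1, 1.0, 10.0, 100.0):
    print(f"w = {w:6.1f}  max gap to step = {max_gap_to_step(w, XS):.4f}")

# A tiny non-zero weight still saturates; it just needs a much larger |x|.
w = 0.01
print(sigmoid(w * -2000.0), sigmoid(w * 2000.0))
```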
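Is a sigmoid with small w linear? Plot ς(wx) for a small w, then change the scale of the x-axis. It is exactly the same function, but once you zoom out, the S-shape reappears.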
So it's not actually linear, but note that within the range -6 to 6 it approximates a linear function with (non-zero) slope.
If x always stays within that range, then for all practical purposes we have a linear output with slope.
Or try this:
Is this linear? Again, changing the scale shows that it is exactly the same S-shaped function.
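The "approximately linear within a range" point can be checked numerically. The sketch below compares ς(wx) with its tangent line at x = 0, which is y = 1/2 + wx/4. The slope w/4 follows from the slope result y(1-y) derived under "Properties of the sigmoid function". The weight w = 0.1 is a hypothetical value, picked so that wx stays small on -6 to 6. The helper name `max_linear_error` is also ours.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def max_linear_error(w: float, x_max: float, samples: int = 1201) -> float:
    """Max |sigmoid(w*x) - (1/2 + w*x/4)| over evenly spaced x in [-x_max, x_max].

    1/2 + w*x/4 is the tangent line at x = 0 (y = 1/2, slope w/4).
    """
    worst = 0.0
    for i in range(samples):
        x = -x_max + 2.0 * x_max * i / (samples - 1)
        worst = max(worst, abs(sigmoid(w * x) - (0.5 + w * x / 4.0)))
    return worst

w = 0.1  # hypothetical small weight
err_small = max_linear_error(w, 6.0)   # the -6..6 range from the text
err_large = max_linear_error(w, 60.0)  # same function, wider scale
print(f"[-6, 6]:   max error = {err_small:.4f}")
print(f"[-60, 60]: max error = {err_large:.4f}")
```

On the narrow range the error stays below 0.005. On the wider range the tangent line leaves [0,1] and the error grows beyond 1. The function is the same in both cases; only the range differs.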
Approximation of Linear with slope
In practice, x will always be within some range. So within that range we can always approximate many different linear functions with slope.
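For example, if x will be from -30 to 30, ς(wx) can approximate any linear function through (0, 1/2), so long as y stays in [0,1] over that range.
These approximations are all centred on zero (y = 1/2 at x = 0). To centre them somewhere other than zero, see below.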
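The sketch below works through the -30 to 30 example. It assumes we want ς(wx) to approximate a target line y = 1/2 + mx. Matching slopes at x = 0 gives w = 4m. The helpers `weight_for_slope` and `max_error` are hypothetical, and the slopes tried are arbitrary. The range check only enforces that the target line stays in [0,1] on these inputs.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

X_MAX = 30.0  # inputs assumed to lie in [-30, 30]

def weight_for_slope(m: float) -> float:
    """Hypothetical helper: w such that sigmoid(w*x) has slope m at x = 0.

    The slope of sigmoid(w*x) at x = 0 is w/4, so w = 4m.
    """
    if abs(m) * X_MAX > 0.5:
        raise ValueError("1/2 + m*x would leave [0, 1] on this input range")
    return 4.0 * m

def max_error(m: float, samples: int = 601) -> float:
    """Worst |sigmoid(w*x) - (1/2 + m*x)| over evenly spaced x in [-30, 30]."""
    w = weight_for_slope(m)
    worst = 0.0
    for i in range(samples):
        x = -X_MAX + 2.0 * X_MAX * i / (samples - 1)
        worst = max(worst, abs(sigmoid(w * x) - (0.5 + m * x)))
    return worst

for m in (0.002, 0.01, -0.01):
    print(f"target slope {m:+.3f}: w = {weight_for_slope(m):+.3f}, max error = {max_error(m):.4f}")
```

A smaller |m| keeps wx closer to 0, where the curve is nearly straight, so the approximation improves. A negative slope simply uses a negative weight.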
Linear y=1/2
The only way we can make ς(wx) exactly linear is to set w=0. Then y = 1/2, a constant, for all x.
Change sign
We can also, by changing the sign of the weights, make a large positive actual input lead to a large negative summed input and hence no fire, and a large negative actual input lead to fire.
Not centred on zero
This is of course a threshold-like function still centred on zero. To centre it on any threshold we use:
y = ς(x-t)
where t is the threshold for this node. This threshold value is something that is learnt, along with the weights.
The "threshold" is now the centre point of the curve, rather than an all-or-nothing value.
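Here is a small illustration of the threshold acting as the centre of the curve. It assumes a hypothetical threshold t = 2.5. In a network, t would be learnt along with the weights, as the text says; here it is just a constant. The name `node_output` is ours.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def node_output(x: float, t: float) -> float:
    """y = sigmoid(x - t): y = 1/2 exactly at x = t, the centre of the curve."""
    return sigmoid(x - t)

t = 2.5  # hypothetical learnt threshold
for x in (-5.0, 0.0, 2.5, 5.0, 10.0):
    print(f"x = {x:5.1f}  y = {node_output(x, t):.3f}")
```

Inputs below t give y < 1/2 (tends not to fire), and inputs above t give y > 1/2 (tends to fire). The transition is gradual rather than all-or-nothing.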
ς(ax+b)
General case: use ς(ax+b).
Can we have linear output?
Can y be linear? Not if it has a non-zero slope, since a line with slope is unbounded and y must stay between 0 and 1. But it can be a linear constant y=c, with c between 0 and 1. We already saw y=1/2. Can we have other values of c?
By setting a=0, y=ς(b) is constant for all x. By varying b, we can have a constant output y=c for any c strictly between 0 and 1 (ς never actually reaches 0 or 1).
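Which b gives a particular constant c? Inverting $c = 1/(1+e^{-b})$ gives $b = \ln\frac{c}{1-c}$, known as the logit. The sketch below uses `bias_for_constant` as a hypothetical helper name.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def bias_for_constant(c: float) -> float:
    """Return b with sigmoid(b) == c (up to rounding), for 0 < c < 1."""
    if not 0.0 < c < 1.0:
        raise ValueError("sigmoid only produces values strictly between 0 and 1")
    return math.log(c / (1.0 - c))

for c in (0.1, 0.5, 0.9):
    b = bias_for_constant(c)
    # With a = 0 the input x is ignored: y = sigmoid(0*x + b) = sigmoid(b).
    ys = {sigmoid(0.0 * x + b) for x in (-100.0, 0.0, 100.0)}
    print(f"c = {c}: b = {b:+.4f}, outputs = {ys}")
```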
Reminder - differentiation rules
Product Rule:
$\frac{d}{dx}(fg) = f\,\frac{dg}{dx} + g\,\frac{df}{dx}$
Quotient Rule:
$\frac{d}{dx}\left(\frac{f}{g}\right) = \frac{g\,\frac{df}{dx} - f\,\frac{dg}{dx}}{g^2}$
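Using the quotient rule above, we can derive the slope formula y(1-y) used in the next section. This worked derivation was added here; it takes f = 1 and g = 1 + e^{-x}:

```latex
y = \varsigma(x) = \frac{f}{g}, \qquad f = 1, \quad g = 1 + e^{-x}, \quad \frac{df}{dx} = 0, \quad \frac{dg}{dx} = -e^{-x}

\frac{dy}{dx} = \frac{g\,\frac{df}{dx} - f\,\frac{dg}{dx}}{g^2}
  = \frac{(1+e^{-x})\cdot 0 - 1\cdot(-e^{-x})}{(1+e^{-x})^2}
  = \frac{1}{1+e^{-x}} \cdot \frac{e^{-x}}{1+e^{-x}}
  = y\,(1-y)

\text{since } 1 - y = 1 - \frac{1}{1+e^{-x}} = \frac{e^{-x}}{1+e^{-x}}
```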
Properties of the sigmoid function
Max/min value of slope
Slope: dy/dx = y (1-y)
Where is the slope greatest? And where is it least?
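To find out, take the next derivative and look for where it equals 0. Since y increases strictly with x, we can treat the slope y (1-y) as a function of y and differentiate with respect to y: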
d/dy ( y (1-y) )
= y (-1) + (1-y) 1
= -y + 1 -y
= 1 - 2y
= 0 for y = 1/2
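This is a maximum, since the next derivative, d/dy (1 - 2y) = -2, is negative. So the maximum slope is 1/2 (1 - 1/2) = 1/4, reached at y = 1/2, i.e. x = 0. There is no minimum: as y approaches 0 or 1 (x goes to minus or plus infinity), the slope approaches 0 but never reaches it.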
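Here is a quick numerical check of these results. It compares y(1-y) against a central finite difference of ς, then searches a grid of x values for the steepest point. The step size h and the grid are arbitrary choices for illustration.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def slope(x: float) -> float:
    """Analytic slope dy/dx = y(1 - y)."""
    y = sigmoid(x)
    return y * (1.0 - y)

def numeric_slope(x: float, h: float = 1e-5) -> float:
    """Central finite difference, used only to check the formula."""
    return (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h)

xs = [i / 10.0 for i in range(-100, 101)]  # x from -10 to 10 in steps of 0.1
worst = max(abs(slope(x) - numeric_slope(x)) for x in xs)
x_best = max(xs, key=slope)
print(f"max |analytic - numeric| = {worst:.2e}")
print(f"steepest at x = {x_best}, slope = {slope(x_best)}")  # x = 0.0, slope = 0.25
print(f"slope at x = 10: {slope(10.0):.2e}")  # small but still positive
```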
Slope of ς(ax+b)
For the general case y = ς(ax+b), where:
- a is positive or negative, a fraction or a multiple
- b is positive or negative
y = ς(z) where z = ax+b
dy/dx = dy/dz dz/dx
= y(1-y) a
If a is positive, all slopes are positive, and the steepest slope (highest positive slope) is at y = 1/2.
If a is negative, all slopes are negative, and the steepest slope (most negative slope) is at y = 1/2.
i.e. the slope value is scaled by a, but the curve is still steepest at y = 1/2 (where ax+b = 0, i.e. x = -b/a), with |dy/dx| = |a|/4 there.
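The sketch below checks the general-case result numerically for two hypothetical parameter pairs (a, b). The steepest point should be where ax+b = 0, i.e. x = -b/a and y = 1/2, and the slope there should be a/4. The function name `slope_general` and the grid of x values are our own choices.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def slope_general(x: float, a: float, b: float) -> float:
    """dy/dx for y = sigmoid(a*x + b): the chain rule gives a * y * (1 - y)."""
    y = sigmoid(a * x + b)
    return a * y * (1.0 - y)

xs = [i / 100.0 for i in range(-1000, 1001)]  # x from -10 to 10 in steps of 0.01
for a, b in ((2.0, -3.0), (-0.5, 1.0)):
    # "Steepest" means largest |slope|; for a < 0 that is the most negative slope.
    x_steep = max(xs, key=lambda x: abs(slope_general(x, a, b)))
    y_steep = sigmoid(a * x_steep + b)
    print(f"a = {a:+}, b = {b:+}: steepest at x = {x_steep:+.2f}, "
          f"y = {y_steep:.3f}, slope = {slope_general(x_steep, a, b):+.3f}")
```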