
3.2 Learning Gradient Descent: Simulating Gradient Descent and Fixing Its Problems

2018-03-13 17:53

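The gradient descent procedure:
Pick a random initial point theta.
Repeat:
    Compute the derivative of the loss function at theta; the descent direction is the negative of the derivative.
    Choose a suitable step size (the learning rate, eta).
    Update theta so that it moves in the direction that decreases the loss.
    If the stopping condition is met (theta has reached the minimum), break out of the loop.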
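In symbols, one pass of the loop can be written roughly as follows. Here \eta is the learning rate (eta in the code), and the right-hand part specializes the rule to the quadratic loss that this article's implementation uses:

```latex
\theta \leftarrow \theta - \eta \, \frac{dJ}{d\theta}
\qquad\text{with}\qquad
J(\theta) = (\theta - 2.5)^2 - 1, \quad \frac{dJ}{d\theta} = 2(\theta - 2.5)
```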
Implementation

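import numpy as np
import matplotlib.pyplot as plt

# x values for drawing the loss curve (assumed range; the original does not show how plot_x is defined)
plot_x = np.linspace(-1, 6, 141)

# Loss function
def J(theta):
    return (theta - 2.5) ** 2 - 1

# Derivative of the loss function
def dJ(theta):
    return 2 * (theta - 2.5)

# Gradient descent simulation; theta_history is a module-level list created before the call
def gradient_descent(initial_theta, eta, n_iters=1e3, epsilon=1e-8):
    theta = initial_theta
    theta_history.append(initial_theta)
    i_iter = 0
    while i_iter < n_iters:
        gradient = dJ(theta)
        last_theta = theta
        theta = theta - eta * gradient
        theta_history.append(theta)
        # Stop once the loss changes by less than epsilon between two steps
        if abs(J(theta) - J(last_theta)) < epsilon:
            break
        i_iter += 1

# Plot the loss curve and the path taken by theta
def plot_theta_history():
    plt.plot(plot_x, J(plot_x))
    plt.plot(np.array(theta_history), J(np.array(theta_history)), color="r", marker="+")
    plt.show()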

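Run:
eta = 0.1
theta_history = []
gradient_descent(0.0, eta)
plot_theta_history()
Plot result: the red markers start at theta = 0 and step down the curve toward the minimum at theta = 2.5.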
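To see the same run as numbers rather than a plot, the sketch below repeats the update rule without matplotlib and prints the first few values of theta. The helper name trace_descent is hypothetical (not from the original); it returns the history instead of appending to a global list.

```python
def J(theta):
    return (theta - 2.5) ** 2 - 1

def dJ(theta):
    return 2 * (theta - 2.5)

def trace_descent(initial_theta, eta, n_iters=1000, epsilon=1e-8):
    """Hypothetical helper: same update rule as the article, but returns the history."""
    history = [initial_theta]
    theta = initial_theta
    for _ in range(n_iters):
        last_theta = theta
        theta = theta - eta * dJ(theta)
        history.append(theta)
        # Same stopping rule: the loss barely changed between two steps
        if abs(J(theta) - J(last_theta)) < epsilon:
            break
    return history

history = trace_descent(0.0, 0.1)
print("first steps:", history[:4])
print("updates performed:", len(history) - 1)
print("final theta:", history[-1])
```

For this loss the first update moves theta from 0 to 0 - 0.1 * (-5) = 0.5, and each later update covers 20% of the remaining distance to 2.5.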

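Issues to watch for
1. Determining the minimum
In theory the minimum is the point where the derivative equals 0. In practice, an ill-suited step size or floating-point error in the derivative can keep theta from ever landing on a point where the derivative is exactly 0, and because floating-point arithmetic is inexact, the desired precision may never be reached. So how do we decide that the minimum has been reached? Since we are descending, each step should in theory give a smaller loss than the previous one. Once the difference between two consecutive loss values is small enough to meet our precision requirement (epsilon), we treat theta as having essentially reached the lowest point.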
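As a rough illustration of this stopping rule, the sketch below runs the update with eta = 0.1 from theta = 0.0 (values chosen for illustration) and then inspects the derivative at the moment the loss-difference test fires. It is only a minimal check under those assumptions, not a general proof.

```python
def J(theta):
    return (theta - 2.5) ** 2 - 1

def dJ(theta):
    return 2 * (theta - 2.5)

epsilon = 1e-8   # tolerance on the change in loss
eta = 0.1        # illustrative learning rate
theta = 0.0      # illustrative starting point
steps = 0
while steps < 1000:
    last_theta = theta
    theta = theta - eta * dJ(theta)
    steps += 1
    if abs(J(theta) - J(last_theta)) < epsilon:
        break

gradient_at_stop = dJ(theta)
print("stopped after", steps, "updates")
print("derivative at stop:", gradient_at_stop)
print("derivative exactly zero?", gradient_at_stop == 0.0)
```

The loop stops with a small but nonzero derivative, which is why testing for a derivative of exactly 0 would not work as an exit condition here.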
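2. Choosing the step size
Step too small: theta creeps toward the minimum, so far more iterations are needed.

Step too large: theta jumps across the minimum, and the loss can grow instead of shrinking.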
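The effect of the step size can also be checked numerically. The sketch below counts how many updates each learning rate needs before the loss-difference test fires, with the number of updates capped at 1000. The helper count_updates and the particular eta values are illustrative assumptions, not part of the original.

```python
def J(theta):
    return (theta - 2.5) ** 2 - 1

def dJ(theta):
    return 2 * (theta - 2.5)

def count_updates(eta, initial_theta=0.0, n_iters=1000, epsilon=1e-8):
    """Hypothetical helper: returns (updates performed, final theta)."""
    theta = initial_theta
    for i in range(1, n_iters + 1):
        last_theta = theta
        theta = theta - eta * dJ(theta)
        if abs(J(theta) - J(last_theta)) < epsilon:
            return i, theta
    # The cap was reached without meeting the stopping rule
    return n_iters, theta

for eta in (0.01, 0.1, 0.8, 1.1):
    updates, theta = count_updates(eta)
    print(f"eta={eta}: {updates} updates, final theta={theta}")
```

For this loss, theta_new - 2.5 = (1 - 2*eta) * (theta - 2.5), so any eta above 1 makes the distance to 2.5 grow on every update, while a very small eta shrinks it only slightly each time.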

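A step that is too large can also raise an error: theta grows without bound, and squaring it eventually overflows, which raises OverflowError. Wrapping the loss function in exception handling keeps the run from being interrupted:
def J(theta):
    # Return infinity instead of failing when the value overflows
    try:
        return (theta - 2.5) ** 2 - 1
    except OverflowError:
        return float('inf')
3. Infinite loops and how to avoid them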
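However, infinity minus infinity is nan, and comparing nan with epsilon is always False, so the stopping condition never fires and the loop runs forever.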
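The nan behaviour is easy to confirm in isolation. This short check uses only Python's built-in float and the standard math module:

```python
import math

inf = float("inf")
diff = inf - inf            # what J(theta) - J(last_theta) becomes once both are inf
epsilon = 1e-8

print(diff)                 # nan
print(math.isnan(diff))     # True
print(abs(diff) < epsilon)  # False, so the break is never reached
```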
The original gradient descent simulation, which can loop forever:
def gradient_descent(initial_theta, eta, epsilon=1e-8):
    theta = initial_theta
    theta_history.append(initial_theta)
    while True:
        gradient = dJ(theta)
        last_theta = theta
        theta = theta - eta * gradient
        theta_history.append(theta)
        # Never true once the difference is nan
        if abs(J(theta) - J(last_theta)) < epsilon:
            break
Improvement: cap the number of iterations so that the loop exits once they are used up:
def gradient_descent(initial_theta, eta, n_iters=1e3, epsilon=1e-8):
    theta = initial_theta
    theta_history.append(initial_theta)
    i_iter = 0
    while i_iter < n_iters:
        gradient = dJ(theta)
        last_theta = theta
        theta = theta - eta * gradient
        theta_history.append(theta)
        if abs(J(theta) - J(last_theta)) < epsilon:
            break
        i_iter += 1
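As a quick check that the cap does its job, the sketch below runs the capped version with a diverging step size. The values eta = 1.1 and n_iters = 3000 are illustrative assumptions, chosen so that the loss overflows before the cap is reached.

```python
def J(theta):
    # Overflow-safe loss
    try:
        return (theta - 2.5) ** 2 - 1
    except OverflowError:
        return float("inf")

def dJ(theta):
    return 2 * (theta - 2.5)

theta_history = []

def gradient_descent(initial_theta, eta, n_iters=1e3, epsilon=1e-8):
    theta = initial_theta
    theta_history.append(initial_theta)
    i_iter = 0
    while i_iter < n_iters:
        gradient = dJ(theta)
        last_theta = theta
        theta = theta - eta * gradient
        theta_history.append(theta)
        if abs(J(theta) - J(last_theta)) < epsilon:
            break
        i_iter += 1

# eta=1.1 diverges; n_iters=3000 is an illustrative cap
gradient_descent(0.0, 1.1, n_iters=3000)
print("entries recorded:", len(theta_history))   # initial point plus one per update
print("loss at the last theta:", J(theta_history[-1]))
```

The call returns once the cap is used up even though the loss has already overflowed to infinity, whereas the while True version would never exit in the same situation.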
