Softmax Regression: Gradient Computation and Its Relationship to Logistic Regression
2018-01-22 21:55
In softmax regression, suppose the training set consists of m labeled samples: \[\{ (x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)}) \}\] and the activation function is the softmax function: \[p(y^{(i)} = j \mid x^{(i)}; \theta) = \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}}\] The loss function is: \[J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k} I(y^{(i)} = j) \log p(y^{(i)} = j \mid x^{(i)}; \theta)\] where \(I(y^{(i)} = j)\) is the indicator function.
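These definitions translate directly into code. Below is a minimal NumPy sketch (the names softmax_probs, cross_entropy_loss and the array shapes are my own conventions, not from the original post); the indicator \(I(y^{(i)} = j)\) simply selects the log-probability of each sample's true class.

```python
import numpy as np

def softmax_probs(Theta, X):
    """Class probabilities p(y = j | x; theta) for every sample.

    Theta: (k, n) parameter matrix, one row theta_j per class.
    X:     (m, n) design matrix, one row x^{(i)} per sample.
    Returns an (m, k) matrix whose rows sum to 1.
    """
    logits = X @ Theta.T                          # (m, k): theta_j^T x^{(i)}
    logits -= logits.max(axis=1, keepdims=True)   # stabilize exp; ratios unchanged
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy_loss(Theta, X, y):
    """J(theta) = -(1/m) * sum_i log p(y^{(i)} | x^{(i)}; theta)."""
    m = X.shape[0]
    P = softmax_probs(Theta, X)
    return -np.log(P[np.arange(m), y]).mean()     # y holds integer class labels
```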
The t-th component of the gradient of the loss with respect to the parameters must be treated in two cases, because the component θ_t being differentiated may or may not coincide with the (j-th) parameter appearing in the numerator of the softmax function:
When t = j: \[\begin{aligned} \nabla_{\theta_t} J(\theta) &= -\frac{1}{m} \sum_{i=1}^{m} \left[ \frac{1}{p(y^{(i)} = j \mid x^{(i)}; \theta)} \cdot p(y^{(i)} = j \mid x^{(i)}; \theta) \cdot \left(1 - p(y^{(i)} = j \mid x^{(i)}; \theta)\right) \right] x^{(i)} \\ &= -\frac{1}{m} \sum_{i=1}^{m} \left[ 1 - p(y^{(i)} = j \mid x^{(i)}; \theta) \right] x^{(i)} \end{aligned}\]
When t ≠ j: \[\begin{aligned} \nabla_{\theta_t} J(\theta) &= -\frac{1}{m} \sum_{i=1}^{m} \left[ \frac{1}{p(y^{(i)} = j \mid x^{(i)}; \theta)} \cdot \frac{0 \cdot \left( \sum_{l=1}^{k} e^{\theta_l^T x^{(i)}} \right) - e^{\theta_j^T x^{(i)}} \cdot e^{\theta_t^T x^{(i)}}}{\left( \sum_{l=1}^{k} e^{\theta_l^T x^{(i)}} \right)^2} \right] x^{(i)} \\ &= -\frac{1}{m} \sum_{i=1}^{m} \left[ -p(y^{(i)} = t \mid x^{(i)}; \theta) \right] x^{(i)} \end{aligned}\] (The 0 in the numerator arises because \(e^{\theta_j^T x^{(i)}}\) does not depend on \(\theta_t\) when t ≠ j.)
Combining the two cases (here rewritten as the gradient with respect to the j-th parameter): \[\nabla_{\theta_j} J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ I(y^{(i)} = j) - p(y^{(i)} = j \mid x^{(i)}; \theta) \right] x^{(i)}\]
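Continuing the NumPy sketch above (reusing softmax_probs and cross_entropy_loss), the combined formula becomes a single vectorized expression, and a finite-difference check confirms the indicator-minus-probability form; the random data is purely illustrative:

```python
def softmax_grad(Theta, X, y):
    """Row j of the result is -(1/m) * sum_i [I(y^{(i)} = j) - p_ij] * x^{(i)}."""
    m, k = X.shape[0], Theta.shape[0]
    P = softmax_probs(Theta, X)        # (m, k) probabilities
    Y = np.eye(k)[y]                   # (m, k) one-hot rows: I(y^{(i)} = j)
    return -(Y - P).T @ X / m          # (k, n), same shape as Theta

# Finite-difference check of a single gradient entry.
rng = np.random.default_rng(0)
m, n, k = 5, 3, 4
X = rng.normal(size=(m, n))
y = rng.integers(0, k, size=m)
Theta = rng.normal(size=(k, n))

eps = 1e-6
Tp, Tm = Theta.copy(), Theta.copy()
Tp[0, 0] += eps
Tm[0, 0] -= eps
numeric = (cross_entropy_loss(Tp, X, y) - cross_entropy_loss(Tm, X, y)) / (2 * eps)
print(numeric, softmax_grad(Theta, X, y)[0, 0])   # the two values should match closely
```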
In the resulting gradient, the first factor is an error term and the second is the input, consistent with the delta rule. It is also easy to see that the softmax activation and the logistic activation ultimately yield gradients of exactly the same form.
Rewrite the logistic and softmax functions as: \[\begin{aligned} f_1(z) &= \frac{1}{1 + e^{-z}} \\ f_2(z) &= \frac{e^{z}}{\sum_{i=1}^{k} e^{z_i}} \end{aligned}\] where, for \(f_2\), z is understood to be one component of the vector \((z_1, \ldots, z_k)\).
Differentiating each of them: \[\begin{aligned} f_1'(z) &= -\frac{e^{-z} \cdot (-1)}{(1 + e^{-z})^2} = \frac{(e^{-z} + 1) - 1}{(1 + e^{-z})^2} \\ &= f_1(z) - f_1^2(z) = f_1(z) \cdot (1 - f_1(z)) \end{aligned}\]
\[\begin{aligned} f_2'(z) &= \frac{e^{z} \cdot \left( \sum_{i=1}^{k} e^{z_i} \right) - e^{z} \cdot e^{z}}{\left( \sum_{i=1}^{k} e^{z_i} \right)^2} \\ &= f_2(z) - f_2^2(z) = f_2(z) \cdot (1 - f_2(z)) \end{aligned}\]
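Both identities are easy to verify numerically; the short check below is my own sketch, not from the original. Note that for softmax this verifies the diagonal entry of the Jacobian, i.e. the derivative of the j-th output with respect to z_j with the other components held fixed:

```python
import numpy as np

def f1(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def f2(z, j):
    """j-th softmax output for the vector z."""
    e = np.exp(z - z.max())            # shift for numerical stability
    return e[j] / e.sum()

eps = 1e-6

# Logistic: central difference vs f1(z) * (1 - f1(z))
z0 = 0.5
num1 = (f1(z0 + eps) - f1(z0 - eps)) / (2 * eps)
print(num1, f1(z0) * (1 - f1(z0)))

# Softmax: perturb z_j and watch the j-th output
z = np.array([0.3, -1.2, 2.0])
j = 0
zp, zm = z.copy(), z.copy()
zp[j] += eps
zm[j] -= eps
num2 = (f2(zp, j) - f2(zm, j)) / (2 * eps)
print(num2, f2(z, j) * (1 - f2(z, j)))
```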
It is easy to see that the logistic function and the softmax function also have exactly the same derivative form.
From the above analysis, the softmax activation is the generalization of the logistic activation to multi-class problems (this part is easy to understand, so it is not elaborated further here), and the two share exactly the same gradient form and derivative form.
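To make the "generalization" concrete, here is the standard two-class reduction (a routine step, spelled out for completeness). With k = 2 classes, dividing the numerator and denominator by \(e^{\theta_1^T x}\) gives \[p(y = 1 \mid x; \theta) = \frac{e^{\theta_1^T x}}{e^{\theta_1^T x} + e^{\theta_2^T x}} = \frac{1}{1 + e^{-(\theta_1 - \theta_2)^T x}} = f_1\!\left((\theta_1 - \theta_2)^T x\right)\] so two-class softmax is exactly a logistic function of the parameter difference \(\theta_1 - \theta_2\).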