The Max Trick when Computing Softmax
2016-06-12 11:16
The softmax function appears in many machine learning algorithms. The idea is, if you have a set of values, to scale them so they sum to 1.0 and therefore can be interpreted as probabilities.
For example, suppose you have three values, (x0, x1, x2) = (3.0, 5.0, 2.0). Expressed mathematically, the softmax for any value xj is:

![](https://jamesmccaffrey.files.wordpress.com/2016/03/softmaxequation.jpg)

s(xj) = exp(xj) / Σi exp(xi)
In words, find the sum of e raised to each x value. The softmax for a particular x is e raised to x divided by the sum. So:
exp(3.0) = 20.0855
exp(5.0) = 148.4132
exp(2.0) = 7.3891
sum = 175.8878
And the softmax values are:
s(3.0) = 20.0855 / 175.8878 = 0.11
s(5.0) = 148.4132 / 175.8878 = 0.84
s(2.0) = 7.3891 / 175.8878 = 0.04
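For concreteness, here is a minimal Python sketch of this direct computation (the function name softmax_naive is illustrative, not from the original post):

```python
import math

def softmax_naive(xs):
    exps = [math.exp(x) for x in xs]  # e raised to each x value
    total = sum(exps)                 # the common denominator
    return [e / total for e in exps]

print(softmax_naive([3.0, 5.0, 2.0]))
# -> [0.1142..., 0.8438..., 0.0420...], matching the values above
```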
Notice the softmax values sum to 1.0 (the tiny shortfall here is just rounding to two decimals). In practice, calculating softmax values can go wrong if an x value is very large: exp() of a large argument overflows the floating point range, which makes the sum infinite, and the division then yields meaningless results such as NaN.
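To see the failure mode concretely, a tiny sketch: in CPython, math.exp() raises OverflowError as soon as its result exceeds the largest double (about 1.8e308, reached around exp(709.8)):

```python
import math

try:
    math.exp(1000.0)  # e^1000 far exceeds the largest double
except OverflowError as err:
    print("overflow:", err)  # CPython reports "math range error"
```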
A trick to avoid this problem is to subtract the largest x value from every x value before exponentiating. Because exp(x - m) = exp(x) * exp(-m), the factor exp(-m) appears in both the numerator and the denominator and cancels, so you get exactly the same result while never exponentiating a large number.
For (3.0, 5.0, 2.0), the largest value is 5.0. Subtracting 5.0 from each gives (-2.0, 0.0, -3.0), and so:
exp(-2.0) = 0.1353
exp(0.0) = 1.0000
exp(-3.0) = 0.0498
sum = 1.1852
And then:
s(3.0) = 0.1353 / 1.1852 = 0.11
s(5.0) = 1.0000 / 1.1852 = 0.84
s(2.0) = 0.0498 / 1.1852 = 0.04
which are the same softmax values as when computed directly.
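Putting it together, the whole max trick fits in a few lines. Here is a minimal Python sketch (softmax_stable is an illustrative name):

```python
import math

def softmax_stable(xs):
    m = max(xs)                           # largest input value
    exps = [math.exp(x - m) for x in xs]  # largest exponent is now 0
    total = sum(exps)
    return [e / total for e in exps]

print(softmax_stable([3.0, 5.0, 2.0]))   # same values as the direct method
print(softmax_stable([1000.0, 1002.0]))  # no overflow: ~[0.1192, 0.8808]
```

The second call would overflow the direct version, since exp(1000.0) is not representable as a double; after the shift, the largest argument to exp() is always 0.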