
Understanding tf.clip_by_norm

2017-08-14 10:53  1541 views


clip_by_norm

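Here clip_by_norm refers to gradient clipping: it caps the norm of a gradient at a maximum value, which guards against exploding gradients. It is a fairly common way to keep gradients in check.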


clip_by_norm in TensorFlow


Example

import tensorflow as tf  # TensorFlow 1.x API

# learning_rate and cost are assumed to be defined earlier in the model code
optimizer = tf.train.AdamOptimizer(learning_rate, beta1=0.5)
grads = optimizer.compute_gradients(cost)
for i, (g, v) in enumerate(grads):
    if g is not None:
        grads[i] = (tf.clip_by_norm(g, 5), v)  # clip gradients
train_op = optimizer.apply_gradients(grads)


The snippet above is a fairly generic way to define the gradient computation and update step. It uses tf.clip_by_norm, whose source is shown below:
def clip_by_norm(t, clip_norm, axes=None, name=None):
  """Clips tensor values to a maximum L2-norm.

  Given a tensor `t`, and a maximum clip value `clip_norm`, this operation
  normalizes `t` so that its L2-norm is less than or equal to `clip_norm`,
  along the dimensions given in `axes`. Specifically, in the default case
  where all dimensions are used for calculation, if the L2-norm of `t` is
  already less than or equal to `clip_norm`, then `t` is not modified. If
  the L2-norm is greater than `clip_norm`, then this operation returns a
  tensor of the same type and shape as `t` with its values set to:

  `t * clip_norm / l2norm(t)`

  In this case, the L2-norm of the output tensor is `clip_norm`.

  As another example, if `t` is a matrix and `axes == [1]`, then each row
  of the output will have L2-norm equal to `clip_norm`. If `axes == [0]`
  instead, each column of the output will be clipped.

  This operation is typically used to clip gradients before applying them with
  an optimizer.

  Args:
    t: A `Tensor`.
    clip_norm: A 0-D (scalar) `Tensor` > 0. A maximum clipping value.
    axes: A 1-D (vector) `Tensor` of type int32 containing the dimensions
      to use for computing the L2-norm. If `None` (the default), uses all
      dimensions.
    name: A name for the operation (optional).

  Returns:
    A clipped `Tensor`.
  """
  with ops.name_scope(name, "clip_by_norm", [t, clip_norm]) as name:
    t = ops.convert_to_tensor(t, name="t")

    # Calculate L2-norm, clip elements by ratio of clip_norm to L2-norm
    l2norm_inv = math_ops.rsqrt(
        math_ops.reduce_sum(t * t, axes, keep_dims=True))
    tclip = array_ops.identity(t * clip_norm * math_ops.minimum(
        l2norm_inv, constant_op.constant(1.0, dtype=t.dtype) / clip_norm),
                               name=name)

  return tclip


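The docstring makes the purpose clear: the function places an upper bound, clip_norm, on the L2 norm of the gradient tensor t that is passed in. If the L2 norm of t exceeds clip_norm, t is rescaled to t * clip_norm / l2norm(t); otherwise t is left unchanged. Either way, the L2 norm of the result is less than or equal to clip_norm.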
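To make the formula concrete, here is a minimal NumPy sketch that mirrors the arithmetic in the source above (inverse norm, then a `minimum` against `1 / clip_norm`). The name `clip_by_norm_np` and the sample arrays are made up for illustration; this is not TensorFlow's implementation, only an approximation of the same computation.

```python
import numpy as np

def clip_by_norm_np(t, clip_norm, axes=None):
    """Hypothetical NumPy re-creation of the clipping arithmetic shown above."""
    t = np.asarray(t, dtype=np.float64)
    axis = None if axes is None else tuple(axes)
    # Sum of squares over the chosen axes (all axes when axes is None).
    sq_sum = np.sum(t * t, axis=axis, keepdims=True)
    # An all-zero slice gives 1/0 = inf here; the minimum below handles it.
    with np.errstate(divide="ignore"):
        l2norm_inv = 1.0 / np.sqrt(sq_sum)
    # clip_norm * min(1/norm, 1/clip_norm) equals 1 when norm <= clip_norm
    # and clip_norm/norm otherwise.
    return t * clip_norm * np.minimum(l2norm_inv, 1.0 / clip_norm)

big = np.array([3.0, 4.0])      # L2 norm 5, above the threshold
small = np.array([0.3, 0.4])    # L2 norm 0.5, below the threshold

clipped_big = clip_by_norm_np(big, 2.5)      # rescaled so its norm is 2.5
clipped_small = clip_by_norm_np(small, 2.5)  # left (numerically) unchanged

# With axes=[1], each row of a matrix is clipped on its own.
rows = clip_by_norm_np(np.stack([big, small]), 2.5, axes=[1])
```

The `minimum` trick avoids an explicit branch: one expression covers both the "already small enough" case and the "needs rescaling" case.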


Example

The following code shows what the function does in a more hands-on way.


Generate random numbers

import numpy as np
t = np.random.randint(low=0, high=5, size=10)
t

array([1, 1, 3, 4, 2, 2, 1, 4, 2, 3])


Compute the L2 norm

l2norm4t = np.linalg.norm(t)
l2norm4t

8.0622577482985491


Rescale the random numbers

clip_norm = 5
transformed_t = t * clip_norm / l2norm4t
transformed_t

array([ 0.62017367,  0.62017367,  1.86052102,  2.48069469,  1.24034735,
        1.24034735,  0.62017367,  2.48069469,  1.24034735,  1.86052102])


Verify

np.linalg.norm(transformed_t)

5.0


As you can see, the L2 norm of the random sequence has been scaled to the value of clip_norm.
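One caveat about the walkthrough above: it rescales unconditionally, which matches clip_by_norm only because the norm of t (about 8.06) is larger than clip_norm. The sketch below contrasts the two behaviours on a vector whose norm is already under the threshold. The helper names `rescale` and `clip` and the array `short_t` are hypothetical, introduced only for this comparison.

```python
import numpy as np

clip_norm = 5
long_t = np.array([1, 1, 3, 4, 2, 2, 1, 4, 2, 3], dtype=np.float64)  # norm > 5
short_t = np.array([1.0, 2.0, 2.0])                                   # norm = 3

def rescale(t):
    # Unconditional rescaling, as in the walkthrough: the norm always becomes clip_norm.
    return t * clip_norm / np.linalg.norm(t)

def clip(t):
    # Clipping: only shrink when the norm exceeds clip_norm, never enlarge.
    return t * min(1.0, clip_norm / np.linalg.norm(t))

same_when_long = np.allclose(rescale(long_t), clip(long_t))  # both shrink long_t
rescaled_short_norm = np.linalg.norm(rescale(short_t))       # blown up to clip_norm
clipped_short_norm = np.linalg.norm(clip(short_t))           # stays at 3
```

For gradients this difference matters: clipping should never enlarge a small gradient, it should only shrink a large one.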