深度之眼 PyTorch Framework Training Camp, Session 4: Loss Functions
Table of Contents
- (1)`nn.L1Loss`
- (2)`nn.MSELoss`
- (3)`nn.SmoothL1Loss`
- (4)`nn.PoissonNLLLoss`
- (5)`nn.KLDivLoss`
- (6)`nn.MarginRankingLoss`
- (7)`nn.MultiLabelMarginLoss`
- (8)`nn.SoftMarginLoss`
- (9)`nn.MultiLabelSoftMarginLoss`
- (10)`nn.MultiMarginLoss`
- (11)`nn.TripletMarginLoss`
- (12)`nn.HingeEmbeddingLoss`
- (13)`nn.CosineEmbeddingLoss`
- (14)`nn.CTCLoss`
Loss Functions
1. The Concept of a Loss Function
(1) Overview
A loss function measures the discrepancy between the model output and the ground-truth label. Be careful to distinguish three related notions: the loss function, the cost function, and the objective function:
- Loss function: $Loss = f(\hat{y}, y)$
- Cost function: $Cost = \frac{1}{N}\sum_{i=1}^{N}f(\hat{y}_i, y_i)$
- Objective function: $Obj = Cost + Regularization$
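A tiny numerical sketch of the three notions; the squared-error loss, the L2 penalty, and the coefficient 0.01 are illustrative choices, not prescribed by the text:

```python
import torch

y_hat = torch.tensor([0.9, 0.2, 0.8])   # model outputs
y = torch.tensor([1.0, 0.0, 1.0])       # ground-truth labels
w = torch.tensor([0.5, -0.3])           # model weights (made up)

loss = (y_hat - y) ** 2                  # per-sample loss f(y_hat, y)
cost = loss.mean()                       # cost: average loss over the data
obj = cost + 0.01 * (w ** 2).sum()       # objective: cost + regularization

print(loss, cost, obj)
# tensor([0.0100, 0.0400, 0.0400]) tensor(0.0300) tensor(0.0334)
```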
(2) Loss in PyTorch
```python
class _Loss(Module):
    def __init__(self, size_average=None, reduce=None, reduction='mean'):
        super(_Loss, self).__init__()
        if size_average is not None or reduce is not None:
            self.reduction = _Reduction.legacy_get_string(size_average, reduce)
        else:
            self.reduction = reduction
```
As the code above shows, PyTorch's `_Loss` class inherits from `Module`, so a loss can be regarded as a network layer. Note that the `size_average` and `reduce` parameters are deprecated and will be removed; their functionality is entirely covered by the `reduction` parameter.
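Since a loss is just a `Module`, a custom loss can be written the same way. A minimal sketch under that observation (the class below is illustrative, not part of PyTorch):

```python
import torch
import torch.nn as nn

class MyL1Loss(nn.Module):
    """Illustrative L1 loss written as an ordinary Module."""
    def __init__(self, reduction='mean'):
        super().__init__()
        self.reduction = reduction

    def forward(self, inputs, target):
        loss = (inputs - target).abs()
        if self.reduction == 'mean':
            return loss.mean()
        if self.reduction == 'sum':
            return loss.sum()
        return loss  # 'none'

print(MyL1Loss()(torch.ones(2, 2), torch.ones(2, 2) * 3))  # tensor(2.)
```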
2. Cross-Entropy Loss
Cross entropy equals information entropy plus relative entropy (the KL divergence): $H(P,Q) = -\sum_{i=1}^{N}P(x_i)\log Q(x_i)$
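Writing the identity out makes the claim explicit:

$$D_{KL}(P\|Q) = \sum_{i=1}^{N}P(x_i)\log\frac{P(x_i)}{Q(x_i)} = \sum_{i=1}^{N}P(x_i)\log P(x_i) - \sum_{i=1}^{N}P(x_i)\log Q(x_i) = -H(P) + H(P,Q)$$

so $H(P,Q) = H(P) + D_{KL}(P\|Q)$. Since $H(P)$ is fixed once the labels are fixed, minimizing the cross entropy is equivalent to minimizing the KL divergence between $P$ and $Q$.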
PyTorch implementation: `nn.CrossEntropyLoss`

```python
nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')
```
- Function: combines `nn.LogSoftmax()` with `nn.NLLLoss()` to compute the cross entropy
- Main parameters:
  - `weight`: per-class weight applied to the loss
  - `ignore_index`: a class index to be ignored
  - `reduction`: computation mode, one of `none`/`sum`/`mean`; `"none"` computes the loss element-wise, `"sum"` sums all elements and returns a scalar, `"mean"` returns the weighted average as a scalar
- Formula:
$$\operatorname{loss}(x, class) = -\log\left(\frac{\exp(x[class])}{\sum_{j}\exp(x[j])}\right) = -x[class] + \log\left(\sum_{j}\exp(x[j])\right)$$

With `weight`:

$$\operatorname{loss}(x, class) = weight[class]\left(-x[class] + \log\left(\sum_{j}\exp(x[j])\right)\right)$$
- Example:
```python
import torch
import torch.nn as nn

# fake data (values inferred from the outputs below)
inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)

# def loss function
loss_f_none = nn.CrossEntropyLoss(weight=None, reduction='none')
loss_f_sum = nn.CrossEntropyLoss(weight=None, reduction='sum')
loss_f_mean = nn.CrossEntropyLoss(weight=None, reduction='mean')

# forward
loss_none = loss_f_none(inputs, target)
loss_sum = loss_f_sum(inputs, target)
loss_mean = loss_f_mean(inputs, target)

# view
print("Cross Entropy Loss:\n ", loss_none, loss_sum, loss_mean)
# Cross Entropy Loss:
#  tensor([1.3133, 0.1269, 0.1269]) tensor(1.5671) tensor(0.5224)
```
Manual verification:
```python
import numpy as np

idx = 0

input_1 = inputs.detach().numpy()[idx]  # [1, 2]
target_1 = target.numpy()[idx]          # 0

# first term
x_class = input_1[target_1]

# second term
sigma_exp_x = np.sum(list(map(np.exp, input_1)))
log_sigma_exp_x = np.log(sigma_exp_x)

# loss
loss_1 = -x_class + log_sigma_exp_x
print("loss of the first sample: ", loss_1)
# loss of the first sample:  1.3132617
```
With the `weight` parameter:
```python
# def loss function
weights = torch.tensor([1, 2], dtype=torch.float)
# weights = torch.tensor([0.7, 0.3], dtype=torch.float)

loss_f_none_w = nn.CrossEntropyLoss(weight=weights, reduction='none')
loss_f_sum = nn.CrossEntropyLoss(weight=weights, reduction='sum')
loss_f_mean = nn.CrossEntropyLoss(weight=weights, reduction='mean')

# forward
loss_none_w = loss_f_none_w(inputs, target)
loss_sum = loss_f_sum(inputs, target)
loss_mean = loss_f_mean(inputs, target)

# view
print("\nweights: ", weights)
print(loss_none_w, loss_sum, loss_mean)
# weights:  tensor([1., 2.])
# tensor([1.3133, 0.2539, 0.2539]) tensor(1.8210) tensor(0.3642)
```
Note that in `mean` mode the sum is divided by the total number of weight shares, i.e. $1 + 2 + 2 = 5$, not by the number of samples (3).
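A quick check of that division, reusing the `'none'` outputs above (which already include each sample's class weight):

```python
loss_none_w = torch.tensor([1.3133, 0.2539, 0.2539])  # weighted per-sample losses
print(loss_none_w.sum() / (1 + 2 + 2))  # tensor(0.3642), matches loss_mean
```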
- Summary: the cross-entropy loss is mainly used for classification tasks; three reduction modes are available.
3. NLL/BCE/BCEWithLogits Loss
(1) `nn.NLLLoss`
```python
nn.NLLLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')
```
- Function: implements only the negation step of the negative log-likelihood (it does nothing more than take the negative; don't be misled by the name)
- Formula:

$$\ell(x, y) = L = \{l_1, \dots, l_N\}^\top, \quad l_n = -w_{y_n} x_{n, y_n}$$

- Main parameters:
  - `weight`: per-class weight applied to the loss
  - `ignore_index`: a class index to be ignored
  - `reduction`: computation mode, one of `none`/`sum`/`mean`; `"none"` computes the loss element-wise, `"sum"` sums all elements and returns a scalar, `"mean"` returns the weighted average as a scalar
- Example:
```python
weights = torch.tensor([1, 1], dtype=torch.float)

loss_f_none_w = nn.NLLLoss(weight=weights, reduction='none')
loss_f_sum = nn.NLLLoss(weight=weights, reduction='sum')
loss_f_mean = nn.NLLLoss(weight=weights, reduction='mean')

# forward
loss_none_w = loss_f_none_w(inputs, target)
loss_sum = loss_f_sum(inputs, target)
loss_mean = loss_f_mean(inputs, target)

# view
print("\nweights: ", weights)
print("NLL Loss", loss_none_w, loss_sum, loss_mean)
# weights:  tensor([1., 1.])
# NLL Loss tensor([-1., -3., -3.]) tensor(-7.) tensor(-2.3333)
```
The result `[-1., -3., -3.]` is obtained as follows: the first sample has class 0, so the first element of `[1, 2]` is taken and negated, giving -1; the second and third samples have class 1, so the second element of `[1, 3]` is taken and negated, giving -3.
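The same selection can be written compactly with `gather`; a sketch reusing the `inputs`/`target` defined above:

```python
# pick each sample's value at its target class, then negate it
picked = inputs.gather(dim=1, index=target.unsqueeze(1)).squeeze(1)
print(-picked)  # tensor([-1., -3., -3.]), matching the 'none' output
```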
(2) `nn.BCELoss`
```python
nn.BCELoss(weight=None, size_average=None, reduce=None, reduction='mean')
```
- Function: binary cross entropy
- Formula (note that the inputs must lie in $[0,1]$):
$$l_n = -w_n\left[y_n \cdot \log x_n + (1 - y_n)\cdot\log(1 - x_n)\right]$$

- Main parameters:
  - `weight`: per-class weight applied to the loss
  - `ignore_index`: a class index to be ignored
  - `reduction`: computation mode, one of `none`/`sum`/`mean`; `"none"` computes the loss element-wise, `"sum"` sums all elements and returns a scalar, `"mean"` returns the weighted average as a scalar
- Example:
```python
inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)
target_bce = target

inputs = torch.sigmoid(inputs)

weights = torch.tensor([1, 1], dtype=torch.float)

loss_f_none_w = nn.BCELoss(weight=weights, reduction='none')
loss_f_sum = nn.BCELoss(weight=weights, reduction='sum')
loss_f_mean = nn.BCELoss(weight=weights, reduction='mean')

# forward
loss_none_w = loss_f_none_w(inputs, target_bce)
loss_sum = loss_f_sum(inputs, target_bce)
loss_mean = loss_f_mean(inputs, target_bce)

# view
print("\nweights: ", weights)
print("BCE Loss", loss_none_w, loss_sum, loss_mean)
# weights:  tensor([1., 1.])
# BCE Loss tensor([[0.3133, 2.1269],
#         [0.1269, 2.1269],
#         [3.0486, 0.0181],
#         [4.0181, 0.0067]]) tensor(11.7856) tensor(1.4732)
```
The code above produces 8 values, one per element, which is exactly what element-wise computation means. Manual verification:
```python
idx = 0

x_i = inputs.detach().numpy()[idx, idx]
y_i = target.numpy()[idx, idx]

# l_i = -[ y_i * log(x_i) + (1 - y_i) * log(1 - x_i) ]
# written as a conditional because np.log(0) would produce nan
l_i = -y_i * np.log(x_i) if y_i else -(1 - y_i) * np.log(1 - x_i)

print("BCE inputs: ", inputs)
print("loss of the first element: ", l_i)
# loss of the first element:  0.31326166
```
(3) `nn.BCEWithLogitsLoss`
Use `nn.BCEWithLogitsLoss` when you do not want a sigmoid layer inside the network itself but still need the sigmoid when computing the loss.
```python
nn.BCEWithLogitsLoss(weight=None, size_average=None, reduce=None, reduction='mean', pos_weight=None)
```
- Function: combines `Sigmoid` with binary cross entropy
- Formula:

$$l_n = -w_n\left[y_n \cdot \log\sigma(x_n) + (1 - y_n)\cdot\log(1 - \sigma(x_n))\right] \qquad \sigma \text{ is the sigmoid function}$$

- Main parameters:
  - `pos_weight`: weight for positive samples, used to balance the classes; e.g. with 100 positive and 300 negative samples, setting it to 3 balances the positives against the negatives
  - `weight`: per-class weight applied to the loss
  - `ignore_index`: a class index to be ignored
  - `reduction`: computation mode, one of `none`/`sum`/`mean`; `"none"` computes the loss element-wise, `"sum"` sums all elements and returns a scalar, `"mean"` returns the weighted average as a scalar
- Note: since the sigmoid is already built in here, the network must not end with another `Sigmoid` layer
- Example:
```python
inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)
target_bce = target

# inputs = torch.sigmoid(inputs)

weights = torch.tensor([1, 1], dtype=torch.float)

loss_f_none_w = nn.BCEWithLogitsLoss(weight=weights, reduction='none')
loss_f_sum = nn.BCEWithLogitsLoss(weight=weights, reduction='sum')
loss_f_mean = nn.BCEWithLogitsLoss(weight=weights, reduction='mean')

# forward
loss_none_w = loss_f_none_w(inputs, target_bce)
loss_sum = loss_f_sum(inputs, target_bce)
loss_mean = loss_f_mean(inputs, target_bce)

# view
print("\nweights: ", weights)
print(loss_none_w, loss_sum, loss_mean)
# weights:  tensor([1., 1.])
# tensor([[0.3133, 2.1269],
#         [0.1269, 2.1269],
#         [3.0486, 0.0181],
#         [4.0181, 0.0067]]) tensor(11.7856) tensor(1.4732)
```
The `pos_weight` parameter:
```python
inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)
target_bce = target

# inputs = torch.sigmoid(inputs)

weights = torch.tensor([1], dtype=torch.float)
pos_w = torch.tensor([3], dtype=torch.float)

loss_f_none_w = nn.BCEWithLogitsLoss(weight=weights, reduction='none', pos_weight=pos_w)
loss_f_sum = nn.BCEWithLogitsLoss(weight=weights, reduction='sum', pos_weight=pos_w)
loss_f_mean = nn.BCEWithLogitsLoss(weight=weights, reduction='mean', pos_weight=pos_w)

# forward
loss_none_w = loss_f_none_w(inputs, target_bce)
loss_sum = loss_f_sum(inputs, target_bce)
loss_mean = loss_f_mean(inputs, target_bce)

# view
print("\npos_weights: ", pos_w)
print(loss_none_w, loss_sum, loss_mean)
# pos_weights:  tensor([3.])
# tensor([[0.9398, 2.1269],
#         [0.3808, 2.1269],
#         [3.0486, 0.0544],
#         [4.0181, 0.0201]]) tensor(12.7158) tensor(1.5895)
```
Comparing the two runs: after setting `pos_weight=3`, every position whose target is 1 has its loss multiplied by 3.
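A hand check of element (0, 0), whose target is 1; the positive term of the formula is scaled by `pos_weight`:

```python
x, y, pos_w = torch.tensor(1.), torch.tensor(1.), 3.
l = -(pos_w * y * torch.log(torch.sigmoid(x))
      + (1 - y) * torch.log(1 - torch.sigmoid(x)))
print(l)  # tensor(0.9398), matching the first element above
```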
4. The Remaining 14 Loss Functions
(1) `nn.L1Loss`
```python
nn.L1Loss(size_average=None, reduce=None, reduction='mean')
```
- Function: computes the absolute difference between input and target; returns either a tensor of the same shape or a scalar, depending on the reduction
- Formula:

$$\ell(x, y) = L = \{l_1, \ldots, l_N\}^\top, \quad l_n = |x_n - y_n|$$

- Parameters:
  - `reduction`: computation mode, one of `none`/`sum`/`mean`; `"none"` computes the loss element-wise, `"sum"` sums all elements and returns a scalar, `"mean"` returns the weighted average as a scalar
- Example:
```python
inputs = torch.ones((2, 2))
target = torch.ones((2, 2)) * 3

loss_f = nn.L1Loss(reduction='none')
loss = loss_f(inputs, target)

print("input:{}\ntarget:{}\nL1 loss:{}".format(inputs, target, loss))
# input:tensor([[1., 1.],
#         [1., 1.]])
# target:tensor([[3., 3.],
#         [3., 3.]])
# L1 loss:tensor([[2., 2.],
#         [2., 2.]])
```
(2) `nn.MSELoss`

```python
nn.MSELoss(size_average=None, reduce=None, reduction='mean')
```
- Function: computes the squared difference between input and target; returns either a tensor of the same shape or a scalar, depending on the reduction
- Formula:

$$\ell(x, y) = L = \{l_1, \ldots, l_N\}^\top, \quad l_n = (x_n - y_n)^2$$

- Parameters:
  - `reduction`: computation mode, one of `none`/`sum`/`mean`; `"none"` computes the loss element-wise, `"sum"` sums all elements and returns a scalar, `"mean"` returns the weighted average as a scalar
- Example:
```python
inputs = torch.ones((2, 2))
target = torch.ones((2, 2)) * 3

loss_f = nn.MSELoss(reduction='none')
loss = loss_f(inputs, target)

print("input:{}\ntarget:{}\nMSE loss:{}".format(inputs, target, loss))
# input:tensor([[1., 1.],
#         [1., 1.]])
# target:tensor([[3., 3.],
#         [3., 3.]])
# MSE loss:tensor([[4., 4.],
#         [4., 4.]])
```
(3) `nn.SmoothL1Loss`

```python
nn.SmoothL1Loss(size_average=None, reduce=None, reduction='mean')
```
- Function: computes the smooth $L1$ loss, a special case of the Huber loss with the parameter $\delta$ fixed at 1
- Formula and notes:
The Huber loss is commonly used in regression; its defining property is insensitivity to outliers and noise, which makes it robust. Its formula is:
$$L_{\delta}(y, f(x)) = \begin{cases} \frac{1}{2}(y - f(x))^2 & \text{for } |y - f(x)| \le \delta \\ \delta|y - f(x)| - \frac{1}{2}\delta^2 & \text{otherwise} \end{cases}$$
Here $\delta$ acts as an error threshold: when the absolute error is smaller than $\delta$, the $L2$ loss is used; when it is larger, the $L1$ loss is used. Fixing $\delta = 1$ yields `SmoothL1Loss`, whose formula becomes:

$$\operatorname{loss}(x, y) = \frac{1}{n}\sum_{i} z_i$$

where $z_i$ is given by:

$$z_i = \begin{cases} 0.5(x_i - y_i)^2 & \text{if } |x_i - y_i| < 1 \\ |x_i - y_i| - 0.5 & \text{otherwise} \end{cases}$$
The difference between `SmoothL1Loss` and `L1Loss` was illustrated by a figure in the original post (omitted here); the sketch below reproduces the comparison numerically.
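A minimal sketch that evaluates both losses over a range of errors (the probe values are illustrative):

```python
x = torch.linspace(-3, 3, steps=500)
target = torch.zeros_like(x)

l1 = nn.L1Loss(reduction='none')(x, target)
smooth_l1 = nn.SmoothL1Loss(reduction='none')(x, target)

# near 0 the smooth version is quadratic (no kink at the origin);
# beyond |x| = 1 it is the L1 line shifted down by 0.5
print(l1[-1], smooth_l1[-1])  # tensor(3.) tensor(2.5000)
```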
- Parameters:
  - `reduction`: computation mode, one of `none`/`sum`/`mean`; `"none"` computes the loss element-wise, `"sum"` sums all elements and returns a scalar, `"mean"` returns the weighted average as a scalar
- Example:
```python
inputs = torch.ones((2, 2))
target = torch.ones((2, 2)) * 3

loss_f = nn.SmoothL1Loss(reduction='none')
loss = loss_f(inputs, target)

print("input:{}\ntarget:{}\nSmooth L1 loss:{}".format(inputs, target, loss))
# input:tensor([[1., 1.],
#         [1., 1.]])
# target:tensor([[3., 3.],
#         [3., 3.]])
# Smooth L1 loss:tensor([[1.5000, 1.5000],
#         [1.5000, 1.5000]])
```
(4) `nn.PoissonNLLLoss`

```python
nn.PoissonNLLLoss(log_input=True, full=False, size_average=None, eps=1e-08, reduce=None, reduction='mean')
```
- Function: negative log-likelihood loss for a Poisson distribution, used for tasks whose target follows a Poisson distribution
- Formula:

$$\operatorname{loss} = \begin{cases} \exp(input) - target \times input & log\_input = True \\ input - target \times \log(input + eps) & log\_input = False \end{cases}$$
- Main parameters:
  - `log_input`: whether the input is already in log form; determines which formula is used
  - `full`: whether to compute the full loss, including the Stirling approximation term; defaults to `False`
  - `eps`: a small correction term that keeps `log(input)` from evaluating to `nan` when the input is 0
- Example:
```python
inputs = torch.randn((2, 2))
target = torch.randn((2, 2))

loss_f = nn.PoissonNLLLoss(log_input=True, full=False, reduction='none')
loss = loss_f(inputs, target)
print("input:{}\ntarget:{}\nPoisson NLL loss:{}".format(inputs, target, loss))

# --------------------------------- compute by hand
idx = 0
loss_1 = torch.exp(inputs[idx, idx]) - target[idx, idx] * inputs[idx, idx]
print("loss of the first element:", loss_1)

# input:tensor([[ 0.7218,  0.0206],
#         [-0.2426,  0.7005]])
# target:tensor([[ 8.7604e-01,  1.3784e+00],
#         [-9.6065e-01,  2.9672e-04]])
# Poisson NLL loss:tensor([[1.4258, 0.9924],
#         [0.5514, 2.0146]])
# loss of the first element: tensor(1.4258)
```
(5) `nn.KLDivLoss`

```python
nn.KLDivLoss(size_average=None, reduce=None, reduction='mean')
```
- Function: computes the $KL$ divergence (Kullback–Leibler divergence) between input and target
- Formula:

$$l(x, y) = L := \{l_1, \ldots, l_N\}, \quad l_n = y_n \cdot (\log y_n - x_n)$$

- Notes:
  - the function expects probability distributions (values between 0 and 1), so the input should be converted to log-probabilities beforehand, which can be done with `nn.LogSoftmax()`
  - as the formula shows, no $\log$ is applied to $x_n$, which is where this differs from cross entropy
- Main parameters:
  - `reduction`: computation mode, one of `none`/`sum`/`mean`/`batchmean`; `"none"` computes the loss element-wise, `"sum"` sums all elements and returns a scalar, `"mean"` averages over all elements and returns a scalar, `"batchmean"` returns a scalar averaged over the batch-size dimension
- Example:
```python
inputs = torch.tensor([[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]])
inputs_log = torch.log(inputs)
target = torch.tensor([[0.9, 0.05, 0.05], [0.1, 0.7, 0.2]], dtype=torch.float)

loss_f_none = nn.KLDivLoss(reduction='none')
loss_f_mean = nn.KLDivLoss(reduction='mean')
loss_f_bs_mean = nn.KLDivLoss(reduction='batchmean')

# NB: this demo feeds the raw probabilities (not inputs_log), so x_n in the
# formula is the probability itself, matching the hand computation below
loss_none = loss_f_none(inputs, target)
loss_mean = loss_f_mean(inputs, target)
loss_bs_mean = loss_f_bs_mean(inputs, target)

print("loss_none:\n{}\nloss_mean:\n{}\nloss_bs_mean:\n{}".format(loss_none, loss_mean, loss_bs_mean))

# --------------------------------- compute by hand
idx = 0
loss_1 = target[idx, idx] * (torch.log(target[idx, idx]) - inputs[idx, idx])
print("loss of the first element:", loss_1)

# loss_none:
# tensor([[-0.5448, -0.1648, -0.1598],
#         [-0.2503, -0.4597, -0.4219]])
# loss_mean:
# -0.3335360586643219
# loss_bs_mean:
# -1.000608205795288
# loss of the first element: tensor(-0.5448)
```
The code above shows the difference between `mean` and `batchmean`: with `mean` the total is divided by 6 (the number of elements), whereas with `batchmean` it is divided by 2 (the batch size).
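Checking both divisions against the `'none'` output:

```python
total = loss_none.sum()
print(total / 6)  # tensor(-0.3335), matches loss_mean (6 elements)
print(total / 2)  # tensor(-1.0006), matches loss_bs_mean (batch size 2)
```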
(6) `nn.MarginRankingLoss`

```python
nn.MarginRankingLoss(margin=0.0, size_average=None, reduce=None, reduction='mean')
```
- Function: measures the ranking of two groups of inputs: when the gap between them exceeds `margin` the loss is positive, and below `margin` the loss is 0. Used for ranking tasks; it compares two groups of inputs and returns an $n \times n$ loss matrix.
- Formula and notes:

$$\operatorname{loss}(x, y) = \max(0, -y \cdot (x_1 - x_2) + margin)$$

Here $y$ takes the value +1 or -1. When $y = 1$ we want $x_1$ to be larger than $x_2$, so no loss is produced when $x_1 > x_2$; when $y = -1$ we want $x_2$ to be larger than $x_1$, so no loss is produced when $x_2 > x_1$.
- Main parameters:
  - `margin`: the boundary value, the required gap between $x_1$ and $x_2$
- Example:
```python
x1 = torch.tensor([[1], [2], [3]], dtype=torch.float)
x2 = torch.tensor([[2], [2], [2]], dtype=torch.float)
target = torch.tensor([1, 1, -1], dtype=torch.float)

loss_f_none = nn.MarginRankingLoss(margin=0, reduction='none')
loss = loss_f_none(x1, x2, target)
print(loss)
# tensor([[1., 1., 0.],
#         [0., 0., 0.],
#         [0., 0., 1.]])
```
(7) `nn.MultiLabelMarginLoss`

```python
nn.MultiLabelMarginLoss(size_average=None, reduce=None, reduction='mean')
```
- Function: for classification tasks where one sample belongs to several classes. For example, in a four-class task, sample $x$ may belong to classes 0 and 1 but not to classes 2 and 3.
- Formula:

$$\operatorname{loss}(x, y) = \sum_{ij}\frac{\max(0, 1 - (x[y[j]] - x[i]))}{x.\operatorname{size}(0)}$$

where $x[y[j]]$ is the output for a class the sample belongs to and $x[i]$ is the output for a class it does not belong to.
- Parameters:
  - `reduction`: computation mode, one of `none`/`sum`/`mean`; `"none"` computes the loss element-wise, `"sum"` sums all elements and returns a scalar, `"mean"` returns the weighted average as a scalar
- Example:
```python
x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
y = torch.tensor([[0, 3, -1, -1]], dtype=torch.long)

loss_f = nn.MultiLabelMarginLoss(reduction='none')
loss = loss_f(x, y)
print(loss)
# tensor([0.8500])

# --------------------------------- compute by hand
x = x[0]
item_1 = (1 - (x[0] - x[1])) + (1 - (x[0] - x[2]))  # class 0
item_2 = (1 - (x[3] - x[1])) + (1 - (x[3] - x[2]))  # class 3
loss_h = (item_1 + item_2) / x.shape[0]
print(loss_h)
# tensor(0.8500)
```
(8) `nn.SoftMarginLoss`

```python
nn.SoftMarginLoss(size_average=None, reduce=None, reduction='mean')
```
- Function: computes the two-class logistic loss
- Formula:

$$\operatorname{loss}(x, y) = \sum_{i}\frac{\log(1 + \exp(-y[i] \cdot x[i]))}{x.\operatorname{nelement}()}$$

- Parameters:
  - `reduction`: computation mode, one of `none`/`sum`/`mean`; `"none"` computes the loss element-wise, `"sum"` sums all elements and returns a scalar, `"mean"` returns the weighted average as a scalar
- Example:
```python
inputs = torch.tensor([[0.3, 0.7], [0.5, 0.5]])
target = torch.tensor([[-1, 1], [1, -1]], dtype=torch.float)

loss_f = nn.SoftMarginLoss(reduction='none')
loss = loss_f(inputs, target)
print("SoftMargin: ", loss)
# SoftMargin:  tensor([[0.8544, 0.4032],
#         [0.4741, 0.9741]])

# --------------------------------- compute by hand
idx = 0
inputs_i = inputs[idx, idx]
target_i = target[idx, idx]

loss_h = np.log(1 + np.exp(-target_i * inputs_i))
print(loss_h)
# tensor(0.8544)
```
(9) `nn.MultiLabelSoftMarginLoss`

```python
nn.MultiLabelSoftMarginLoss(weight=None, size_average=None, reduce=None, reduction='mean')
```
- Function: the multi-label version of `SoftMarginLoss`
- Formula:

$$\operatorname{loss}(x, y) = -\frac{1}{C}\sum_{i}\left[y[i] \cdot \log\left((1 + \exp(-x[i]))^{-1}\right) + (1 - y[i]) \cdot \log\left(\frac{\exp(-x[i])}{1 + \exp(-x[i])}\right)\right]$$

where $C$ is the number of classes.
- Main parameters:
  - `weight`: per-class loss weight. `weight` must be a `float` tensor whose length equals the number of classes $C$, i.e. every class must be given a weight.
- Example:
```python
inputs = torch.tensor([[0.3, 0.7, 0.8]])
target = torch.tensor([[0, 1, 1]], dtype=torch.float)

loss_f = nn.MultiLabelSoftMarginLoss(reduction='none')
loss = loss_f(inputs, target)
print("MultiLabel SoftMargin: ", loss)
# MultiLabel SoftMargin:  tensor([0.5429])

# --------------------------------- compute by hand
i_0 = torch.log(torch.exp(-inputs[0, 0]) / (1 + torch.exp(-inputs[0, 0])))
i_1 = torch.log(1 / (1 + torch.exp(-inputs[0, 1])))
i_2 = torch.log(1 / (1 + torch.exp(-inputs[0, 2])))

loss_h = (i_0 + i_1 + i_2) / -3
print(loss_h)
# tensor(0.5429)
```
(10) `nn.MultiMarginLoss`

```python
nn.MultiMarginLoss(p=1, margin=1.0, weight=None, size_average=None, reduce=None, reduction='mean')
```
- Function: computes the multi-class hinge loss
- Formula:

$$\operatorname{loss}(x, y) = \frac{\sum_{i}\max(0, w[y] \cdot (margin - x[y] + x[i]))^{p}}{x.\operatorname{size}(0)}$$

Note: $x \in \{0, \cdots, x.\operatorname{size}(0) - 1\}$ and $y \in \{0, \cdots, y.\operatorname{size}(0) - 1\}$; moreover $0 \le y[j] \le x.\operatorname{size}(0) - 1$, and $i \ne y[j]$ for all $i$ and $j$.
- Main parameters:
  - `p (int)`: defaults to 1; only 1 or 2 may be chosen
  - `margin (float)`: defaults to 1
  - `weight (Tensor)`: per-class loss weight. `weight` must be a `float` tensor whose length equals the number of classes $C$, i.e. every class must be given a weight.
- Example:
```python
x = torch.tensor([[0.1, 0.2, 0.7], [0.2, 0.5, 0.3]])
y = torch.tensor([1, 2], dtype=torch.long)

loss_f = nn.MultiMarginLoss(reduction='none')
loss = loss_f(x, y)
print("Multi Margin Loss: ", loss)
# Multi Margin Loss:  tensor([0.8000, 0.7000])

# --------------------------------- compute by hand (first sample, class 1)
x = x[0]
margin = 1

i_0 = margin - (x[1] - x[0])
# i_1 = margin - (x[1] - x[1])  # skipped: i == y
i_2 = margin - (x[1] - x[2])

loss_h = (i_0 + i_2) / x.shape[0]  # x.shape[0] = 3
print(loss_h)
# tensor(0.8000)
```
(11) `nn.TripletMarginLoss`

```python
nn.TripletMarginLoss(margin=1.0, p=2.0, eps=1e-06, swap=False, size_average=None, reduce=None, reduction='mean')
```
- Function: computes the triplet loss, commonly used in face verification. Given an Anchor, a Positive, and a Negative sample (illustrated by a figure in the original post, omitted here), the goal is to make the distance between the Anchor and the Positive as small as possible and the distance between the Anchor and the Negative as large as possible.
- Formula:

$$L(a, p, n) = \max\{d(a_i, p_i) - d(a_i, n_i) + margin,\ 0\}$$

where

$$d(x_i, y_i) = \|\mathbf{x}_i - \mathbf{y}_i\|_p$$
- Main parameters:
  - `margin (float)`: the boundary value, defaults to 1
  - `p (int)`: the order of the norm, defaults to 2
- Example:
```python
anchor = torch.tensor([[1.]])
pos = torch.tensor([[2.]])
neg = torch.tensor([[0.5]])

loss_f = nn.TripletMarginLoss(margin=1.0, p=1)
loss = loss_f(anchor, pos, neg)
print("Triplet Margin Loss", loss)
# Triplet Margin Loss tensor(1.5000)

# --------------------------------- compute by hand
margin = 1
a, p, n = anchor[0], pos[0], neg[0]

d_ap = torch.abs(a - p)
d_an = torch.abs(a - n)

loss = d_ap - d_an + margin
print(loss)
# tensor([1.5000])
```
(12) `nn.HingeEmbeddingLoss`

```python
nn.HingeEmbeddingLoss(margin=1.0, size_average=None, reduce=None, reduction='mean')
```
- Function: measures the similarity of two inputs; commonly used for nonlinear embeddings and semi-supervised learning
- Note: the input $x$ should be the absolute difference (distance) between the two inputs
- Formula:

$$l_n = \begin{cases} x_n & \text{if } y_n = 1 \\ \max\{0, \Delta - x_n\} & \text{if } y_n = -1 \end{cases}$$

- Main parameters:
  - `margin`: the boundary value, defaults to 1
- Example:
```python
inputs = torch.tensor([[1., 0.8, 0.5]])
target = torch.tensor([[1, 1, -1]])

loss_f = nn.HingeEmbeddingLoss(margin=1, reduction='none')
loss = loss_f(inputs, target)
print("Hinge Embedding Loss", loss)
# Hinge Embedding Loss tensor([[1.0000, 0.8000, 0.5000]])

# --------------------------------- compute by hand
margin = 1.
loss = max(0, margin - inputs.numpy()[0, 2])
print(loss)
# 0.5
```
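Because the input has to be a precomputed distance, typical usage first derives it from a pair of embeddings; a minimal sketch (the embeddings `e1` and `e2` below are made up for illustration):

```python
e1 = torch.tensor([[1.0, 2.0], [0.5, 0.5]])  # hypothetical embeddings
e2 = torch.tensor([[1.2, 1.9], [2.0, 1.0]])
y = torch.tensor([1, -1])                    # 1: similar pair, -1: dissimilar

dist = torch.pairwise_distance(e1, e2)       # per-pair Euclidean distance
loss = nn.HingeEmbeddingLoss(margin=1., reduction='none')(dist, y)
print(dist, loss)
```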
(13) `nn.CosineEmbeddingLoss`

```python
nn.CosineEmbeddingLoss(margin=0.0, size_average=None, reduce=None, reduction='mean')
```
- Function: uses cosine similarity to measure how similar two inputs are (using the cosine means we care about similarity in direction rather than in magnitude)
- Formula:

$$\operatorname{loss}(x, y) = \begin{cases} 1 - \cos(x_1, x_2) & \text{if } y = 1 \\ \max(0, \cos(x_1, x_2) - margin) & \text{if } y = -1 \end{cases}$$

- Main parameters:
  - `margin`: may take values in $[-1, 1]$; $[0, 0.5]$ is recommended
- Example:
```python
x1 = torch.tensor([[0.3, 0.5, 0.7], [0.3, 0.5, 0.7]])
x2 = torch.tensor([[0.1, 0.3, 0.5], [0.1, 0.3, 0.5]])
target = torch.tensor([[1, -1]], dtype=torch.float)

loss_f = nn.CosineEmbeddingLoss(margin=0., reduction='none')
loss = loss_f(x1, x2, target)
print("Cosine Embedding Loss", loss)
# Cosine Embedding Loss tensor([[0.0167, 0.9833]])

# --------------------------------- compute by hand
margin = 0.

def cosine(a, b):
    numerator = torch.dot(a, b)
    denominator = torch.norm(a, 2) * torch.norm(b, 2)
    return float(numerator / denominator)

l_1 = 1 - (cosine(x1[0], x2[0]))
l_2 = max(0, cosine(x1[0], x2[0]))
print(l_1, l_2)
# 0.016662120819091797 0.9833378791809082
```
(14) `nn.CTCLoss`

```python
nn.CTCLoss(blank=0, reduction='mean', zero_infinity=False)
```
- Function: computes the $CTC$ (Connectionist Temporal Classification) loss, used for classifying sequential data
- Main parameters:
  - `blank`: the index of the blank label
  - `zero_infinity`: whether to set infinite losses and the associated gradients to zero
- Example:
```python
T = 50      # input sequence length
C = 20      # number of classes (including blank)
N = 16      # batch size
S = 30      # target sequence length of the longest target in the batch
S_min = 10  # minimum target length, for demonstration purposes

# initialize a random batch of input vectors, with size = (T, N, C)
inputs = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()

# initialize a random batch of targets (0 = blank, 1:C = classes)
target = torch.randint(low=1, high=C, size=(N, S), dtype=torch.long)

input_lengths = torch.full(size=(N,), fill_value=T, dtype=torch.long)
target_lengths = torch.randint(low=S_min, high=S, size=(N,), dtype=torch.long)

ctc_loss = nn.CTCLoss()
loss = ctc_loss(inputs, target, input_lengths, target_lengths)
print("CTC loss: ", loss)
# CTC loss:  tensor(6.5103, grad_fn=<MeanBackward0>)
```
5. Summary of Loss Functions
PyTorch provides 18 loss functions, namely:
- nn.CrossEntropyLoss
- nn.NLLLoss
- nn.BCELoss
- nn.BCEWithLogitsLoss
- nn.L1Loss
- nn.MSELoss
- nn.SmoothL1Loss
- nn.PoissonNLLLoss
- nn.KLDivLoss
- nn.MarginRankingLoss
- nn.MultiLabelMarginLoss
- nn.SoftMarginLoss
- nn.MultiLabelSoftMarginLoss
- nn.MultiMarginLoss
- nn.TripletMarginLoss
- nn.HingeEmbeddingLoss
- nn.CosineEmbeddingLoss
- nn.CTCLoss