
深度之眼 PyTorch Training Camp, Session 4 — Loss Functions

2020-06-04

Contents

  • 1. The Concept of a Loss Function
  • 2. Cross-Entropy Loss
  • 3. NLL/BCE/BCEWithLogits Loss
  • 4. The Other 14 Loss Functions
  • 5. Summary of Loss Functions

1. The Concept of a Loss Function

(1) Overview

A loss function measures the difference between the model output and the ground-truth label. Note the distinction between the loss function, the cost function, and the objective function:

• Loss function: $Loss = f(\hat{y}, y)$
• Cost function: $Cost = \frac{1}{N}\sum_{i=1}^{N} f(\hat{y}_i, y_i)$
• Objective function: $Obj = Cost + Regularization$
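A minimal sketch (not from the original post; the numbers are made up) of how the three terms relate in code:

    import torch
    import torch.nn.functional as F

    y_hat = torch.tensor([0.9, 0.2, 0.7])   # model outputs
    y     = torch.tensor([1.0, 0.0, 1.0])   # ground-truth labels
    w     = torch.tensor([0.5, -0.3])       # model parameters, used for the regularization term

    loss_per_sample = F.mse_loss(y_hat, y, reduction='none')  # Loss: one value per sample
    cost = loss_per_sample.mean()                             # Cost: average over the samples
    obj  = cost + 0.01 * w.pow(2).sum()                       # Objective: cost + L2 regularization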
(2) Loss in PyTorch

    class _Loss(Module):
        def __init__(self, size_average=None, reduce=None, reduction='mean'):
            super(_Loss, self).__init__()
            if size_average is not None or reduce is not None:
                self.reduction = _Reduction.legacy_get_string(size_average, reduce)
            else:
                self.reduction = reduction

As this code shows, PyTorch's `_Loss` class inherits from `Module`, so a loss can be treated as just another network layer. Note that the `size_average` and `reduce` arguments are deprecated and will be removed; their functionality is fully covered by the `reduction` argument.
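A minimal sketch (an assumed example, not from the original post) of using a loss object like a layer, with `reduction` controlling how per-element losses are combined:

    import torch
    import torch.nn as nn

    pred   = torch.randn(4, 3, requires_grad=True)
    target = torch.randn(4, 3)

    criterion = nn.MSELoss(reduction='mean')   # also accepts 'none' or 'sum'
    loss = criterion(pred, target)             # forward pass through the "loss layer"
    loss.backward()                            # gradients flow back to pred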

2. Cross-Entropy Loss

Cross entropy equals information entropy plus relative entropy (the KL divergence), i.e. $H(P,Q) = H(P) + D_{KL}(P\|Q)$, and is computed as $H(P,Q) = -\sum_{i=1}^{N}P(x_i)\log Q(x_i)$.

PyTorch implementation: `nn.CrossEntropyLoss`

    nn.CrossEntropyLoss(weight=None,
                        size_average=None,
                        ignore_index=-100,
                        reduce=None,
                        reduction='mean')
• Function: combines `nn.LogSoftmax()` and `nn.NLLLoss()` to compute the cross entropy
• Main arguments:
  • `weight`: per-class weighting of the loss
  • `ignore_index`: a class index to ignore
  • `reduction`: reduction mode, one of `none`/`sum`/`mean`; `"none"` computes the loss element-wise, `"sum"` sums all elements and returns a scalar, `"mean"` returns the weighted average as a scalar
• Formula:
  $\operatorname{loss}(x, \text{class}) = -\log\left(\frac{\exp(x[\text{class}])}{\sum_{j}\exp(x[j])}\right) = -x[\text{class}] + \log\left(\sum_{j}\exp(x[j])\right)$

  With per-class weights:
  $\operatorname{loss}(x, \text{class}) = \text{weight}[\text{class}]\left(-x[\text{class}] + \log\left(\sum_{j}\exp(x[j])\right)\right)$

• Example:
    # def loss function
    loss_f_none = nn.CrossEntropyLoss(weight=None, reduction='none')
    loss_f_sum = nn.CrossEntropyLoss(weight=None, reduction='sum')
    loss_f_mean = nn.CrossEntropyLoss(weight=None, reduction='mean')
    
    # forward
    loss_none = loss_f_none(inputs, target)
    loss_sum = loss_f_sum(inputs, target)
    loss_mean = loss_f_mean(inputs, target)
    
    # view
    print("Cross Entropy Loss:\n ", loss_none, loss_sum, loss_mean)
    
    # Cross Entropy Loss:
    #   tensor([1.3133, 0.1269, 0.1269]) tensor(1.5671) tensor(0.5224)
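These snippets assume `inputs` and `target` tensors defined beforehand; a setup consistent with the printed results (a reconstruction based on the manual check below, not shown in the original post) would be:

    import torch
    import torch.nn as nn
    import numpy as np

    # 3 samples, 2 classes (reconstructed, not from the original post)
    inputs = torch.tensor([[1., 2.], [1., 3.], [1., 3.]])
    target = torch.tensor([0, 1, 1], dtype=torch.long)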

Manual verification:

    idx = 0
    
    input_1 = inputs.detach().numpy()[idx]      # [1, 2]
    target_1 = target.numpy()[idx]              # [0]
    
    # first term
    x_class = input_1[target_1]
    
    # second term
    sigma_exp_x = np.sum(list(map(np.exp, input_1)))
    log_sigma_exp_x = np.log(sigma_exp_x)
    
    # loss of the first sample
    loss_1 = -x_class + log_sigma_exp_x
    
    print("Loss of the first sample: ", loss_1)
    
    # Loss of the first sample:  1.3132617

With the `weight` argument:

    # def loss function
    weights = torch.tensor([1, 2], dtype=torch.float)
    # weights = torch.tensor([0.7, 0.3], dtype=torch.float)
    
    loss_f_none_w = nn.CrossEntropyLoss(weight=weights, reduction='none')
    loss_f_sum = nn.CrossEntropyLoss(weight=weights, reduction='sum')
    loss_f_mean = nn.CrossEntropyLoss(weight=weights, reduction='mean')
    
    # forward
    loss_none_w = loss_f_none_w(inputs, target)
    loss_sum = loss_f_sum(inputs, target)
    loss_mean = loss_f_mean(inputs, target)
    
    # view
    print("\nweights: ", weights)
    print(loss_none_w, loss_sum, loss_mean)
    # weights:  tensor([1., 2.])
    # tensor([1.3133, 0.2539, 0.2539]) tensor(1.8210) tensor(0.3642)

Note: in `mean` mode the sum is divided by the total number of weight shares, i.e. 1 + 2 + 2 = 5, not by the number of samples (3).
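A quick check of this (a sketch, reusing the `inputs`, `target` and `weights` defined above):

    w_total = weights[target].sum()    # 1 + 2 + 2 = 5
    print(loss_sum / w_total)          # tensor(0.3642), matches the 'mean' result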

• Summary: the cross-entropy loss is mainly used for classification, and three reduction modes are available.

3. NLL/BCE/BCEWithLogits Loss

(1) nn.NLLLoss

    nn.NLLLoss(weight=None,
               size_average=None,
               ignore_index=-100,
               reduce=None,
               reduction='mean')

• Function: applies the negation step of the negative log-likelihood (it merely negates the input; don't be misled by the name)
• Formula:
  $\ell(x, y) = L = \{l_1, \dots, l_N\}^{\top}, \quad l_n = -w_{y_n} x_{n, y_n}$
• Main arguments:
  • `weight`: per-class weighting of the loss
  • `ignore_index`: a class index to ignore
  • `reduction`: reduction mode, one of `none`/`sum`/`mean`; `"none"` computes the loss element-wise, `"sum"` sums all elements and returns a scalar, `"mean"` returns the weighted average as a scalar
• Example:
    weights = torch.tensor([1, 1], dtype=torch.float)
    
    loss_f_none_w = nn.NLLLoss(weight=weights, reduction='none')
    loss_f_sum = nn.NLLLoss(weight=weights, reduction='sum')
    loss_f_mean = nn.NLLLoss(weight=weights, reduction='mean')
    
    # forward
    loss_none_w = loss_f_none_w(inputs, target)
    loss_sum = loss_f_sum(inputs, target)
    loss_mean = loss_f_mean(inputs, target)
    
    # view
    print("\nweights: ", weights)
    print("NLL Loss", loss_none_w, loss_sum, loss_mean)
    
    # weights:  tensor([1., 1.])
    # NLL Loss tensor([-1., -3., -3.]) tensor(-7.) tensor(-2.3333)

The result [-1, -3, -3] is computed as follows: the first sample's class is 0, so the first element of [1, 2] is taken and negated, giving -1; the second and third samples' class is 1, so the second element of [1, 3] is taken and negated, giving -3.
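The earlier claim that nn.CrossEntropyLoss is equivalent to nn.LogSoftmax followed by nn.NLLLoss can be checked directly (a sketch, reusing the assumed `inputs`/`target` from above):

    log_probs = nn.LogSoftmax(dim=1)(inputs)
    print(nn.NLLLoss(reduction='none')(log_probs, target))
    # tensor([1.3133, 0.1269, 0.1269]), identical to nn.CrossEntropyLoss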

(2) nn.BCELoss

    nn.BCELoss(weight=None,
               size_average=None,
               reduce=None,
               reduction='mean')

• Function: binary cross entropy
• Formula (note that the input values must lie in $[0, 1]$):
  $l_n = -w_n\left[y_n \cdot \log x_n + (1 - y_n) \cdot \log(1 - x_n)\right]$
• Main arguments:
  • `weight`: per-class weighting of the loss
  • `reduction`: reduction mode, one of `none`/`sum`/`mean`; `"none"` computes the loss element-wise, `"sum"` sums all elements and returns a scalar, `"mean"` returns the weighted average as a scalar
• Example:
    inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
    target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)
    
    target_bce = target
    
    # itarget
    inputs = torch.sigmoid(inputs)
    
    weights = torch.tensor([1, 1], dtype=torch.float)
    
    loss_f_none_w = nn.BCELoss(weight=weights, reduction='none')
    loss_f_sum = nn.BCELoss(weight=weights, reduction='sum')
    loss_f_mean = nn.BCELoss(weight=weights, reduction='mean')
    
    # forward
    loss_none_w = loss_f_none_w(inputs, target_bce)
    loss_sum = loss_f_sum(inputs, target_bce)
    loss_mean = loss_f_mean(inputs, target_bce)
    
    # view
    print("\nweights: ", weights)
    print("BCE Loss", loss_none_w, loss_sum, loss_mean)
    
    # weights:  tensor([1., 1.])
    # BCE Loss tensor([[0.3133, 2.1269],
    #         [0.1269, 2.1269],
    #         [3.0486, 0.0181],
    #         [4.0181, 0.0067]]) tensor(11.7856) tensor(1.4732)

The code above produces 8 values in total, which illustrates what element-wise ("none") computation means. Manual verification:

    idx = 0
    
    x_i = inputs.detach().numpy()[idx, idx]
    y_i = target.numpy()[idx, idx]              #
    
    # loss
    # l_i = -[ y_i * np.log(x_i) + (1-y_i) * np.log(1-y_i) ]      # np.log(0) = nan
    l_i = -y_i * np.log(x_i) if y_i else -(1-y_i) * np.log(1-x_i)
    
    # print the loss
    print("BCE inputs: ", inputs)
    print("Loss of the first element: ", l_i)
    
    # Loss of the first element:  0.31326166
(3) nn.BCEWithLogitsLoss

When you do not want an explicit Sigmoid inside the network itself, but the loss computation still requires one, use nn.BCEWithLogitsLoss:

    nn.BCEWithLogitsLoss(weight=None,
                         size_average=None,
                         reduce=None,
                         reduction='mean',
                         pos_weight=None)
• Function: combines `Sigmoid` with the binary cross entropy
• Formula:
  $l_n = -w_n\left[y_n \cdot \log \sigma(x_n) + (1 - y_n) \cdot \log(1 - \sigma(x_n))\right]$, where $\sigma$ is the Sigmoid function
• Main arguments:
  • `pos_weight`: weight of the positive samples, used to balance positives and negatives; e.g. with 100 positive and 300 negative samples, setting it to 3 balances the two classes
  • `weight`: per-class weighting of the loss
  • `reduction`: reduction mode, one of `none`/`sum`/`mean`; `"none"` computes the loss element-wise, `"sum"` sums all elements and returns a scalar, `"mean"` returns the weighted average as a scalar
• Note: since the Sigmoid is already included in the loss, the network must not apply another `Sigmoid` at its output
• Example:
    inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
    target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)
    
    target_bce = target
    
    # inputs = torch.sigmoid(inputs)
    
    weights = torch.tensor([1, 1], dtype=torch.float)
    
    loss_f_none_w = nn.BCEWithLogitsLoss(weight=weights, reduction='none')
    loss_f_sum = nn.BCEWithLogitsLoss(weight=weights, reduction='sum')
    loss_f_mean = nn.BCEWithLogitsLoss(weight=weights, reduction='mean')
    
    # forward
    loss_none_w = loss_f_none_w(inputs, target_bce)
    loss_sum = loss_f_sum(inputs, target_bce)
    loss_mean = loss_f_mean(inputs, target_bce)
    
    # view
    print("\nweights: ", weights)
    print(loss_none_w, loss_sum, loss_mean)
    
    # weights:  tensor([1., 1.])
    # tensor([[0.3133, 2.1269],
    #         [0.1269, 2.1269],
    #         [3.0486, 0.0181],
    #         [4.0181, 0.0067]]) tensor(11.7856) tensor(1.4732)

With the `pos_weight` argument:

    inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
    target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)
    
    target_bce = target
    
    # itarget
    # inputs = torch.sigmoid(inputs)
    
    weights = torch.tensor([1], dtype=torch.float)
    pos_w = torch.tensor([3], dtype=torch.float)        # 3
    
    loss_f_none_w = nn.BCEWithLogitsLoss(weight=weights, reduction='none', pos_weight=pos_w)
    loss_f_sum = nn.BCEWithLogitsLoss(weight=weights, reduction='sum', pos_weight=pos_w)
    loss_f_mean = nn.BCEWithLogitsLoss(weight=weights, reduction='mean', pos_weight=pos_w)
    
    # forward
    loss_none_w = loss_f_none_w(inputs, target_bce)
    loss_sum = loss_f_sum(inputs, target_bce)
    loss_mean = loss_f_mean(inputs, target_bce)
    
    # view
    print("\npos_weights: ", pos_w)
    print(loss_none_w, loss_sum, loss_mean)
    
    # pos_weights:  tensor([3.])
    # tensor([[0.9398, 2.1269],
    #         [0.3808, 2.1269],
    #         [3.0486, 0.0544],
    #         [4.0181, 0.0201]]) tensor(12.7158) tensor(1.5895)

Comparing with the previous snippet, setting pos_weight=3 multiplies the loss by 3 at every position where the target is 1 (e.g. the first element goes from 0.3133 to 0.3133 × 3 ≈ 0.9398).

4. The Other 14 Loss Functions

(1) nn.L1Loss

    nn.L1Loss(size_average=None,
              reduce=None,
              reduction='mean')

• Function: computes the absolute difference between input and target; returns either a tensor of the same shape or a scalar
• Formula:
  $\ell(x, y) = L = \{l_1, \ldots, l_N\}^{\top}, \quad l_n = |x_n - y_n|$
• Arguments:
  • `reduction`: reduction mode, one of `none`/`sum`/`mean`; `"none"` computes the loss element-wise, `"sum"` sums all elements and returns a scalar, `"mean"` returns the mean over all elements as a scalar
• Example:
    inputs = torch.ones((2, 2))
    target = torch.ones((2, 2)) * 3
    
    loss_f = nn.L1Loss(reduction='none')
    loss = loss_f(inputs, target)
    
    print("input:{}\ntarget:{}\nL1 loss:{}".format(inputs, target, loss))
    
    # input:tensor([[1., 1.],
    #         [1., 1.]])
    # target:tensor([[3., 3.],
    #         [3., 3.]])
    # L1 loss:tensor([[2., 2.],
    #         [2., 2.]])
(2) nn.MSELoss

    nn.MSELoss(size_average=None,
               reduce=None,
               reduction='mean')

• Function: computes the squared difference between input and target; returns either a tensor of the same shape or a scalar
• Formula:
  $\ell(x, y) = L = \{l_1, \ldots, l_N\}^{\top}, \quad l_n = (x_n - y_n)^2$
• Arguments:
  • `reduction`: reduction mode, one of `none`/`sum`/`mean`; `"none"` computes the loss element-wise, `"sum"` sums all elements and returns a scalar, `"mean"` returns the mean over all elements as a scalar
• Example:
    inputs = torch.ones((2, 2))
    target = torch.ones((2, 2)) * 3
    
    loss_f = nn.MSELoss(reduction='none')
    loss = loss_f(inputs, target)
    
    print("input:{}\ntarget:{}\nMSE loss:{}".format(inputs, target, loss))
    
    # input:tensor([[1., 1.],
    #         [1., 1.]])
    # target:tensor([[3., 3.],
    #         [3., 3.]])
    # MSE loss:tensor([[4., 4.],
    #         [4., 4.]])
(3) nn.SmoothL1Loss

    nn.SmoothL1Loss(size_average=None,
                    reduce=None,
                    reduction='mean')

• Function: computes the smooth L1 loss, a special case of the Huber loss with the parameter $\delta$ fixed to 1
• Formula and notes:
  The Huber loss is commonly used for regression; its main property is that it is insensitive to outliers and noise, which makes it more robust. It is defined as:
  $L_{\delta}(y, f(x)) = \begin{cases} \frac{1}{2}(y - f(x))^2 & \text{for } |y - f(x)| \le \delta \\ \delta |y - f(x)| - \frac{1}{2}\delta^2 & \text{otherwise} \end{cases}$
  Here $\delta$ acts as an error threshold: when the absolute error is below $\delta$ the L2 loss is used, otherwise the L1 loss. Fixing $\delta = 1$ gives `SmoothL1Loss`, whose formula becomes:
  $\operatorname{loss}(x, y) = \frac{1}{n}\sum_{i} z_i$
  where $z_i$ is given by:
  $z_i = \begin{cases} 0.5 (x_i - y_i)^2 & \text{if } |x_i - y_i| < 1 \\ |x_i - y_i| - 0.5 & \text{otherwise} \end{cases}$
  The difference between `SmoothL1Loss` and `L1Loss` is illustrated in the figure below:

• Arguments:
  • `reduction`: reduction mode, one of `none`/`sum`/`mean`; `"none"` computes the loss element-wise, `"sum"` sums all elements and returns a scalar, `"mean"` returns the mean over all elements as a scalar
• Example:
    inputs = torch.ones((2, 2))
    target = torch.ones((2, 2)) * 3
    
    loss_f = nn.SmoothL1Loss(reduction='none')
    loss = loss_f(inputs, target)
    
    print("input:{}\ntarget:{}\nSmooth L1 loss:{}".format(inputs, target, loss))
    
    # input:tensor([[1., 1.],
    #         [1., 1.]])
    # target:tensor([[3., 3.],
    #         [3., 3.]])
    # Smooth L1 loss:tensor([[1.5000, 1.5000],
    #         [1.5000, 1.5000]])
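A quick hand check: here $|x_i - y_i| = 2 \ge 1$, so every element falls in the linear branch and $z_i = |x_i - y_i| - 0.5 = 2 - 0.5 = 1.5$, matching the output above.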
(4) nn.PoissonNLLLoss

    nn.PoissonNLLLoss(log_input=True,
                      full=False,
                      size_average=None,
                      eps=1e-08,
                      reduce=None,
                      reduction='mean')

• Function: computes the negative log-likelihood loss for a target that follows a Poisson distribution
• Formula:
  $\begin{cases} \operatorname{loss} = \exp(input) - target \times input & \text{if } log\_input = True \\ \operatorname{loss} = input - target \times \log(input + eps) & \text{if } log\_input = False \end{cases}$
• Main arguments:
  • `log_input`: whether the input is already in log space; this selects which of the two formulas is used
  • `full`: whether to include the Stirling approximation term in the loss; default `False`
  • `eps`: a small constant that prevents `log(input)` from evaluating to `nan` when the input is 0
• Example:

    inputs = torch.randn((2, 2))
    target = torch.randn((2, 2))
    
    loss_f = nn.PoissonNLLLoss(log_input=True, full=False, reduction='none')
    loss = loss_f(inputs, target)
    print("input:{}\ntarget:{}\nPoisson NLL loss:{}".format(inputs, target, loss))
    
    # --------------------------------- compute by hand
    
    idx = 0
    loss_1 = torch.exp(inputs[idx, idx]) - target[idx, idx]*inputs[idx, idx]
    print("第一个元素loss:", loss_1)
    
    # input:tensor([[ 0.7218,  0.0206],
    #         [-0.2426,  0.7005]])
    # target:tensor([[ 8.7604e-01,  1.3784e+00],
    #         [-9.6065e-01,  2.9672e-04]])
    # Poisson NLL loss:tensor([[1.4258, 0.9924],
    #         [0.5514, 2.0146]])
    # Loss of the first element: tensor(1.4258)
(5) nn.KLDivLoss

    nn.KLDivLoss(size_average=None,
                 reduce=None,
                 reduction='mean')

• Function: computes the KL divergence (Kullback-Leibler divergence) between input and target
• Formula:
  $l(x, y) = L := \{l_1, \ldots, l_N\}, \quad l_n = y_n \cdot (\log y_n - x_n)$
• Notes:
  • The function expects the input to be log-probabilities of a distribution over $[0, 1]$, so the input should be converted beforehand, e.g. with `nn.LogSoftmax()`
  • As the formula shows, no $\log$ is taken of $x_n$; this is where it differs from the cross entropy
• Main arguments:
  • `reduction`: reduction mode, one of `none`/`sum`/`mean`/`batchmean`; `"none"` computes the loss element-wise, `"sum"` sums all elements and returns a scalar, `"mean"` returns the mean over all elements, `"batchmean"` returns the sum divided by the batch size, as a scalar
• Example:
    inputs = torch.tensor([[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]])
    inputs_log = torch.log(inputs)
    target = torch.tensor([[0.9, 0.05, 0.05], [0.1, 0.7, 0.2]], dtype=torch.float)
    
    loss_f_none = nn.KLDivLoss(reduction='none')
    loss_f_mean = nn.KLDivLoss(reduction='mean')
    loss_f_bs_mean = nn.KLDivLoss(reduction='batchmean')
    
    loss_none = loss_f_none(inputs, target)
    loss_mean = loss_f_mean(inputs, target)
    loss_bs_mean = loss_f_bs_mean(inputs, target)
    
    print("loss_none:\n{}\nloss_mean:\n{}\nloss_bs_mean:\n{}".format(loss_none, loss_mean, loss_bs_mean))
    
    # --------------------------------- compute by hand
    idx = 0
    loss_1 = target[idx, idx] * (torch.log(target[idx, idx]) - inputs[idx, idx])
    print("第一个元素loss:", loss_1)
    
    # loss_none:
    # tensor([[-0.5448, -0.1648, -0.1598],
    #         [-0.2503, -0.4597, -0.4219]])
    # loss_mean:
    # -0.3335360586643219
    # loss_bs_mean:
    # -1.000608205795288
    # Loss of the first element: tensor(-0.5448)

Note: the code above shows the difference between `mean` and `batchmean`: with `mean` the sum is divided by 6 (the number of elements), while with `batchmean` it is divided by 2 (the batch size).
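A quick check (a sketch, reusing the tensors above):

    print(loss_none.sum() / 6)   # tensor(-0.3335), matches reduction='mean'
    print(loss_none.sum() / 2)   # tensor(-1.0006), matches reduction='batchmean'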

(6) nn.MarginRankingLoss

    nn.MarginRankingLoss(margin=0.0,
                         size_average=None,
                         reduce=None,
                         reduction='mean')

• Function: measures how two inputs rank against each other: the loss is positive when the ordering required by $y$ is not satisfied by at least `margin`, and 0 otherwise. It is used for ranking tasks; it compares two groups of inputs and returns an $n \times n$ loss matrix
• Formula and notes:
  $\operatorname{loss}(x, y) = \max(0, -y \cdot (x_1 - x_2) + \operatorname{margin})$
  Here $y$ takes the value +1 or -1. When $y = 1$ we want $x_1$ to be larger than $x_2$, so no loss is incurred if $x_1 > x_2$; when $y = -1$ we want $x_2$ to be larger than $x_1$, so no loss is incurred if $x_2 > x_1$
• Main arguments:
  • `margin`: the boundary value, the required gap between $x_1$ and $x_2$
• Example
    x1 = torch.tensor([[1], [2], [3]], dtype=torch.float)
    x2 = torch.tensor([[2], [2], [2]], dtype=torch.float)
    
    target = torch.tensor([1, 1, -1], dtype=torch.float)
    
    loss_f_none = nn.MarginRankingLoss(margin=0, reduction='none')
    
    loss = loss_f_none(x1, x2, target)
    
    print(loss)
    # tensor([[1., 1., 0.],
    #         [0., 0., 0.],
    #         [0., 0., 1.]])
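Because x1 and x2 have shape (3, 1) while target has shape (3,), the result broadcasts to a 3×3 matrix whose entry (i, j) is max(0, -target[j]·(x1[i] - x2[i]) + margin). For example, entry (0, 0) = max(0, -1·(1 - 2) + 0) = 1 and entry (2, 2) = max(0, -(-1)·(3 - 2) + 0) = 1, matching the output above.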
(7) nn.MultiLabelMarginLoss

    nn.MultiLabelMarginLoss(size_average=None,
                            reduce=None,
                            reduction='mean')

• Function: used for multi-label classification, where one sample can belong to several classes. For example, in a 4-class task a sample $x$ may belong to class 0 and class 1 but not to class 2 or class 3
• Formula:
  $\operatorname{loss}(x, y) = \sum_{ij} \frac{\max(0, 1 - (x[y[j]] - x[i]))}{x.\operatorname{size}(0)}$
  where $x[y[j]]$ is the output for a class the sample belongs to, and $x[i]$ is the output for a class it does not belong to
• Arguments:
  • `reduction`: reduction mode, one of `none`/`sum`/`mean`; `"none"` computes the loss element-wise, `"sum"` sums all elements and returns a scalar, `"mean"` returns the mean over all elements as a scalar
• Example:
    x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
    y = torch.tensor([[0, 3, -1, -1]], dtype=torch.long)
    
    loss_f = nn.MultiLabelMarginLoss(reduction='none')
    
    loss = loss_f(x, y)
    
    print(loss)
    
    # tensor([0.8500])
    
    # --------------------------------- compute by hand
    # flag = 0
    x = x[0]
    item_1 = (1-(x[0] - x[1])) + (1 - (x[0] - x[2]))    # [0]
    item_2 = (1-(x[3] - x[1])) + (1 - (x[3] - x[2]))    # [3]
    
    loss_h = (item_1 + item_2) / x.shape[0]
    print(loss_h)
    # tensor(0.8500)
(8) nn.SoftMarginLoss

    nn.SoftMarginLoss(size_average=None,
                      reduce=None,
                      reduction='mean')

• Function: computes the two-class logistic loss
• Formula:
  $\operatorname{loss}(x, y) = \sum_{i} \frac{\log(1 + \exp(-y[i] \cdot x[i]))}{\text{x.nelement()}}$
• Arguments:
  • `reduction`: reduction mode, one of `none`/`sum`/`mean`; `"none"` computes the loss element-wise, `"sum"` sums all elements and returns a scalar, `"mean"` returns the mean over all elements as a scalar
• Example:
    inputs = torch.tensor([[0.3, 0.7], [0.5, 0.5]])
    target = torch.tensor([[-1, 1], [1, -1]], dtype=torch.float)
    
    loss_f = nn.SoftMarginLoss(reduction='none')
    
    loss = loss_f(inputs, target)
    
    print("SoftMargin: ", loss)
    # SoftMargin:  tensor([[0.8544, 0.4032],
    #         [0.4741, 0.9741]])
    
    # --------------------------------- compute by hand
    idx = 0
    inputs_i = inputs[idx, idx]
    target_i = target[idx, idx]
    loss_h = np.log(1 + np.exp(-target_i * inputs_i))
    print(loss_h)
    # tensor(0.8544)
(9) nn.MultiLabelSoftMarginLoss

    nn.MultiLabelSoftMarginLoss(weight=None,
                                size_average=None,
                                reduce=None,
                                reduction='mean')

• Function: the multi-label version of `SoftMarginLoss`
• Formula:
  $\operatorname{loss}(x, y) = -\frac{1}{C}\sum_{i}\left[ y[i] \cdot \log\left((1 + \exp(-x[i]))^{-1}\right) + (1 - y[i]) \cdot \log\left(\frac{\exp(-x[i])}{1 + \exp(-x[i])}\right)\right]$
  where $C$ is the number of classes
• Main arguments:
  • `weight`: per-class weighting of the loss. `weight` must be a `float` tensor whose length equals the number of classes $C$, i.e. every class must be given a weight
• Example:
    inputs = torch.tensor([[0.3, 0.7, 0.8]])
    target = torch.tensor([[0, 1, 1]], dtype=torch.float)
    
    loss_f = nn.MultiLabelSoftMarginLoss(reduction='none')
    loss = loss_f(inputs, target)
    
    print("MultiLabel SoftMargin: ", loss)
    # MultiLabel SoftMargin:  tensor([0.5429])
    # --------------------------------- compute by hand
    
    i_0 = torch.log(torch.exp(-inputs[0, 0]) / (1 + torch.exp(-inputs[0, 0])))
    i_1 = torch.log(1 / (1 + torch.exp(-inputs[0, 1])))
    i_2 = torch.log(1 / (1 + torch.exp(-inputs[0, 2])))
    
    loss_h = (i_0 + i_1 + i_2) / -3
    
    print(loss_h)
    # tensor(0.5429)
(10) nn.MultiMarginLoss

    nn.MultiMarginLoss(p=1,
                       margin=1.0,
                       weight=None,
                       size_average=None,
                       reduce=None,
                       reduction='mean')

• Function: computes the multi-class hinge (margin) loss
• Formula:
  $\operatorname{loss}(x, y) = \frac{\sum_{i} w[y] \cdot \max(0, \operatorname{margin} - x[y] + x[i])^{p}}{x.\operatorname{size}(0)}$
  Note: $x \in \{0, \cdots, \text{x.size}(0) - 1\}$ and $y \in \{0, \cdots, \text{y.size}(0) - 1\}$; furthermore $0 \leq y[j] \leq \text{x.size}(0) - 1$ and $i \neq y[j]$ for all $i$ and $j$
• Main arguments:
  • `p` (int): default 1; only 1 or 2 is allowed
  • `margin` (float): default 1
  • `weight` (Tensor): per-class weighting of the loss. `weight` must be a `float` tensor whose length equals the number of classes $C$, i.e. every class must be given a weight
• Example:
    x = torch.tensor([[0.1, 0.2, 0.7], [0.2, 0.5, 0.3]])
    y = torch.tensor([1, 2], dtype=torch.long)
    
    loss_f = nn.MultiMarginLoss(reduction='none')
    
    loss = loss_f(x, y)
    
    print("Multi Margin Loss: ", loss)
    # Multi Margin Loss:  tensor([0.8000, 0.7000])
    # --------------------------------- compute by hand
    x = x[0]
    margin = 1
    
    i_0 = margin - (x[1] - x[0])
    # i_1 = margin - (x[1] - x[1])
    i_2 = margin - (x[1] - x[2])
    
    loss_h = (i_0 + i_2) / x.shape[0] # x.shape[0]=3
    
    print(loss_h)
    # tensor(0.8000)
(11) nn.TripletMarginLoss

    nn.TripletMarginLoss(margin=1.0,
                         p=2.0,
                         eps=1e-06,
                         swap=False,
                         size_average=None,
                         reduce=None,
                         reduction='mean')

• Function: computes the triplet loss, commonly used in face verification. Given an Anchor, a Positive and a Negative sample (see the figure below), the goal is to make the distance between the Anchor and the Positive as small as possible and the distance between the Anchor and the Negative as large as possible

• Formula:
  $L(a, p, n) = \max\{d(a_i, p_i) - d(a_i, n_i) + \operatorname{margin}, 0\}$
  where $d(x_i, y_i) = \|\mathbf{x}_i - \mathbf{y}_i\|_p$

• Main arguments:

  • `margin` (float): the boundary value, default 1

  • `p` (int): the order of the norm, default 2

• Example:

    anchor = torch.tensor([[1.]])
    pos = torch.tensor([[2.]])
    neg = torch.tensor([[0.5]])
    
    loss_f = nn.TripletMarginLoss(margin=1.0, p=1)
    loss = loss_f(anchor, pos, neg)
    
    print("Triplet Margin Loss", loss)
    # Triplet Margin Loss tensor(1.5000)
    # --------------------------------- compute by hand
    margin = 1
    a, p, n = anchor[0], pos[0], neg[0]
    
    d_ap = torch.abs(a-p)
    d_an = torch.abs(a-n)
    
    loss = d_ap - d_an + margin
    
    print(loss)
    # tensor([1.5000])
(12) nn.HingeEmbeddingLoss

    nn.HingeEmbeddingLoss(margin=1.0,
                          size_average=None,
                          reduce=None,
                          reduction='mean')

• Function: measures the similarity between two inputs; commonly used for nonlinear embeddings and semi-supervised learning
• Note: the input $x$ should be the absolute value of the difference between the two inputs
• Formula:
  $l_n = \begin{cases} x_n & \text{if } y_n = 1 \\ \max\{0, \Delta - x_n\} & \text{if } y_n = -1 \end{cases}$
• Main arguments:
  • `margin`: the boundary value, default 1
• Example:
    inputs = torch.tensor([[1., 0.8, 0.5]])
    target = torch.tensor([[1, 1, -1]])
    
    loss_f = nn.HingeEmbeddingLoss(margin=1, reduction='none')
    loss = loss_f(inputs, target)
    
    print("Hinge Embedding Loss", loss)
    # Hinge Embedding Loss tensor([[1.0000, 0.8000, 0.5000]])
    # --------------------------------- compute by hand
    margin = 1.
    loss = max(0, margin - inputs.numpy()[0, 2])
    
    print(loss)
    # 0.5
(13) nn.CosineEmbeddingLoss

    nn.CosineEmbeddingLoss(margin=0.0,
                           size_average=None,
                           reduce=None,
                           reduction='mean')

• Function: uses cosine similarity to measure how similar two inputs are (using cosine similarity means we care about similarity in direction rather than in magnitude)
• Formula:
  $\operatorname{loss}(x, y) = \begin{cases} 1 - \cos(x_1, x_2) & \text{if } y = 1 \\ \max(0, \cos(x_1, x_2) - \operatorname{margin}) & \text{if } y = -1 \end{cases}$
• Main arguments:
  • `margin`: a value in $[-1, 1]$; $[0, 0.5]$ is recommended
• Example:
    x1 = torch.tensor([[0.3, 0.5, 0.7], [0.3, 0.5, 0.7]])
    x2 = torch.tensor([[0.1, 0.3, 0.5], [0.1, 0.3, 0.5]])
    
    target = torch.tensor([[1, -1]], dtype=torch.float)
    loss_f = nn.CosineEmbeddingLoss(margin=0., reduction='none')
    
    loss = loss_f(x1, x2, target)
    
    print("Cosine Embedding Loss", loss)
    # Cosine Embedding Loss tensor([[0.0167, 0.9833]])
    # --------------------------------- compute by hand
    margin = 0.
    
    def cosine(a, b):
        numerator = torch.dot(a, b)
        denominator = torch.norm(a, 2) * torch.norm(b, 2)
        return float(numerator / denominator)
    
    l_1 = 1 - (cosine(x1[0], x2[0]))
    
    l_2 = max(0, cosine(x1[0], x2[0]))
    
    print(l_1, l_2)
    # 0.016662120819091797 0.9833378791809082
(14) nn.CTCLoss

    nn.CTCLoss(blank=0,
               reduction='mean',
               zero_infinity=False)

• Function: computes the CTC (Connectionist Temporal Classification) loss, used for classifying sequential (time-series) data
• Main arguments:
  • `blank`: the index of the blank label
  • `zero_infinity`: whether to zero out infinite losses and their gradients
• Example:
    T = 50      # Input sequence length
    C = 20      # Number of classes (including blank)
    N = 16      # Batch size
    S = 30      # Target sequence length of longest target in batch
    S_min = 10  # Minimum target length, for demonstration purposes
    
    # Initialize random batch of input vectors, for *size = (T,N,C)
    inputs = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()
    
    # Initialize random batch of targets (0 = blank, 1:C = classes)
    target = torch.randint(low=1, high=C, size=(N, S), dtype=torch.long)
    
    input_lengths = torch.full(size=(N,), fill_value=T, dtype=torch.long)
    target_lengths = torch.randint(low=S_min, high=S, size=(N,), dtype=torch.long)
    
    ctc_loss = nn.CTCLoss()
    loss = ctc_loss(inputs, target, input_lengths, target_lengths)
    
    print("CTC loss: ", loss)
    
    # CTC loss:  tensor(6.5103, grad_fn=<MeanBackward0>)

5. Summary of Loss Functions

PyTorch provides the 18 loss functions covered above:

    • nn.CrossEntropyLoss
    • nn.NLLLoss
    • nn.BCELoss
    • nn.BCEWithLogitsLoss
    • nn.L1Loss
    • nn.MSELoss
    • nn.SmoothL1Loss
    • nn.PoissonNLLLoss
    • nn.KLDivLoss
    • nn.MarginRankingLoss
    • nn.MultiLabelMarginLoss
    • nn.SoftMarginLoss
    • nn.MultiLabelSoftMarginLoss
    • nn.MultiMarginLoss
    • nn.TripletMarginLoss
    • nn.HingeEmbeddingLoss
    • nn.CosineEmbeddingLoss
    • nn.CTCLoss