
PyTorch | From NumPy to PyTorch: Implementing a Neural Network

2020-03-05 20:02

A Two-Layer Neural Network in NumPy

A fully connected ReLU network with one hidden layer and no bias, trained to predict y from x using a squared loss.

This implementation uses NumPy alone to compute the forward pass, the loss, and the backward pass (backpropagation).

$N$ — number of samples; $D_{in}$ — input dimension; $H$ — hidden layer dimension; $D_{out}$ — output dimension

  • Forward pass:
    • $h = x w_1$, where $x$ is $N \times D_{in}$, $w_1$ is $D_{in} \times H$, and $h$ is $N \times H$
    • $h_{relu} = \max(0, h)$, where $h_{relu}$ is $N \times H$
    • $\hat{y} = h_{relu} w_2$, where $w_2$ is $H \times D_{out}$ and $\hat{y}$ is $N \times D_{out}$

  • Loss:
    • $L(\omega) = (\hat{y} - y)^2$

  • Backward pass:
    • $\frac{\partial L}{\partial \hat{y}} = 2(\hat{y} - y)$
    • $\frac{\partial L}{\partial \omega_2} = \frac{\partial \hat{y}}{\partial \omega_2}\frac{\partial L}{\partial \hat{y}} = h_{relu}^T \frac{\partial L}{\partial \hat{y}}$
    • $\frac{\partial L}{\partial h_{relu}} = \frac{\partial \hat{y}}{\partial h_{relu}}\frac{\partial L}{\partial \hat{y}} = \frac{\partial L}{\partial \hat{y}} \omega_2^T$
    • $\frac{\partial L}{\partial h} = \frac{\partial L}{\partial h_{relu}}$ where $h > 0$, and $\frac{\partial L}{\partial h} = 0$ where $h < 0$
    • $\frac{\partial L}{\partial \omega_1} = \frac{\partial h}{\partial \omega_1}\frac{\partial L}{\partial h} = x^T \frac{\partial L}{\partial h}$

    A NumPy ndarray is a plain n-dimensional array. It knows nothing about deep learning, gradients, or computation graphs; it is simply a data structure for numerical computation.

    import numpy as np
    import matplotlib.pyplot as plt

    # N - number of samples, D_in - input dim, H - hidden dim, D_out - output dim
    N, D_in, H, D_out = 64, 1000, 100, 10

    # Create random training data
    x = np.random.randn(N, D_in)
    y = np.random.randn(N, D_out)

    # Initialize the weights w1 and w2
    w1 = np.random.randn(D_in, H)
    w2 = np.random.randn(H, D_out)

    # Learning rate
    learning_rate = 1e-6

    Loss = []
    for it in range(500):
        # Forward pass
        h = x.dot(w1)              # N * H
        h_relu = np.maximum(h, 0)  # N * H
        y_pred = h_relu.dot(w2)    # N * D_out

        # Compute loss
        loss = np.square(y_pred - y).sum()
        Loss.append(loss)
        if it % 49 == 0:
            print(it, loss)

        # Backward pass: compute the gradients by hand
        grad_y_pred = 2.0 * (y_pred - y)
        grad_w2 = h_relu.T.dot(grad_y_pred)
        grad_h_relu = grad_y_pred.dot(w2.T)
        grad_h = grad_h_relu.copy()
        grad_h[h < 0] = 0
        grad_w1 = x.T.dot(grad_h)

        # Update the weights w1 and w2
        w1 -= learning_rate * grad_w1
        w2 -= learning_rate * grad_w2

    # Plot the training loss
    fig = plt.figure()
    ax = plt.subplot(111)
    ax.plot(Loss, lw=2)
    plt.savefig("result.png")
    plt.show()

    Output

    0 25842131.18766065
    49 9341.26313812292
    98 189.5463364244673
    147 6.111110996230143
    196 0.23069281196938063
    245 0.009375792322499452
    294 0.000398055380964109
    343 1.7414106449815095e-05
    392 7.791508421066646e-07
    441 3.5475745256511925e-08
    490 1.6372834073046531e-09

    PyTorch: Tensors and autograd

    One of PyTorch's key features is autograd: once the forward pass is defined and the loss is computed, PyTorch can automatically compute the gradients of all model parameters.

    A PyTorch Tensor represents a node in a computation graph. If x is a Tensor with x.requires_grad = True, then x.grad is another Tensor holding the gradient of some scalar value (usually the loss) with respect to x.
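    As a minimal sketch of this behaviour (a toy example, not part of the network above):

    import torch

    # A toy scalar computation: z = sum of all elements of 3 * x
    x = torch.ones(2, 2, requires_grad=True)
    z = (3 * x).sum()
    z.backward()     # autograd computes dz/dx and stores it in x.grad
    print(x.grad)    # every entry is 3, since dz/dx_ij = 3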

    import torch
    import matplotlib.pyplot as plt

    # N - number of samples, D_in - input dim, H - hidden dim, D_out - output dim
    N, D_in, H, D_out = 64, 1000, 100, 10

    # Create random training data
    x = torch.randn(N, D_in)
    y = torch.randn(N, D_out)

    # Initialize the weights w1 and w2
    w1 = torch.randn(D_in, H, requires_grad=True)
    w2 = torch.randn(H, D_out, requires_grad=True)

    # Learning rate
    learning_rate = 1e-6

    Loss = []
    for it in range(500):
        # Forward pass
        y_pred = x.mm(w1).clamp(min=0).mm(w2)

        # Compute loss (builds the computation graph)
        loss = (y_pred - y).pow(2).sum()
        Loss.append(loss.item())
        if it % 50 == 49:
            print(it, loss.item())

        # Backward pass
        loss.backward()

        # Update the weights w1 and w2, then reset their gradients
        with torch.no_grad():
            w1 -= learning_rate * w1.grad
            w2 -= learning_rate * w2.grad
            w1.grad.zero_()
            w2.grad.zero_()

    # Plot the training loss
    fig = plt.figure()
    ax = plt.subplot(111)
    ax.plot(Loss, lw=2)
    plt.savefig("result.png")
    plt.show()

    Output

    49 19870.130859375
    99 1072.4837646484375
    149 85.9421157836914
    199 7.828434467315674
    249 0.753280520439148
    299 0.0745907872915268
    349 0.007736038416624069
    399 0.0010647654999047518
    449 0.00025447571533732116
    499 9.513569966657087e-05

    PyTorch: nn

    This time we build the network with PyTorch's nn package. The forward pass still builds a computation graph through autograd, and PyTorch computes the gradients for us automatically; nn additionally provides ready-made layers and loss functions.
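    For instance (a small illustrative sketch, separate from the training script below), a torch.nn.Linear layer owns its own weight and bias parameters and computes x @ W.T + b:

    import torch

    layer = torch.nn.Linear(3, 2)      # maps 3 input features to 2 outputs
    print(layer.weight.shape)          # torch.Size([2, 3])
    print(layer.bias.shape)            # torch.Size([2])
    out = layer(torch.randn(5, 3))     # batch of 5 samples -> shape (5, 2)
    print(out.shape)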

    import torch
    import torch.nn as nn
    import matplotlib.pyplot as plt

    # N - number of samples, D_in - input dim, H - hidden dim, D_out - output dim
    N, D_in, H, D_out = 64, 1000, 100, 10

    # Create random training data
    x = torch.randn(N, D_in)
    y = torch.randn(N, D_out)

    model = torch.nn.Sequential(
        torch.nn.Linear(D_in, H),   # w_1 * x + b_1
        torch.nn.ReLU(),
        torch.nn.Linear(H, D_out)
    )

    loss_fn = nn.MSELoss(reduction='sum')

    # Learning rate
    learning_rate = 1e-3

    Loss = []
    for it in range(500):
        # Forward pass
        y_pred = model(x)

        # Compute loss (builds the computation graph)
        loss = loss_fn(y_pred, y)
        Loss.append(loss.item())
        if it % 50 == 49:
            print(it, loss.item())

        # Clear old gradients, then run the backward pass
        model.zero_grad()
        loss.backward()

        # Update all model parameters with plain gradient descent
        with torch.no_grad():
            for param in model.parameters():
                param -= learning_rate * param.grad

    # Plot the training loss
    fig = plt.figure()
    ax = plt.subplot(111)
    ax.plot(Loss, lw=2)
    plt.savefig("result.png")
    plt.show()

    Output

    49 0.003269762033596635
    99 1.1983887588939979e-06
    149 5.244479295285487e-10
    199 1.957820407530453e-12
    249 1.967756903600848e-12
    299 1.74486575361954e-12
    349 1.9000298296517615e-12
    399 1.9209714114537535e-12
    449 2.1060768146813347e-12
    499 2.0324950490008264e-12

    PyTorch: optim

    This time we no longer update the model weights by hand; instead we use the optim package to update the parameters for us. optim provides implementations of common optimization algorithms, including SGD with momentum, RMSprop, Adam, and more.
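    As a minimal sketch of how these optimizers are constructed (the stand-in parameter list here is only for illustration):

    import torch

    params = [torch.randn(3, 3, requires_grad=True)]   # stand-in parameter list

    # A few interchangeable optimizers from torch.optim
    sgd = torch.optim.SGD(params, lr=1e-2, momentum=0.9)
    rmsprop = torch.optim.RMSprop(params, lr=1e-3)
    adam = torch.optim.Adam(params, lr=1e-3)

    # The usage pattern is the same for all of them:
    # optimizer.zero_grad(); loss.backward(); optimizer.step()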

    import torch
    import torch.nn as nn
    import matplotlib.pyplot as plt

    # N - number of samples, D_in - input dim, H - hidden dim, D_out - output dim
    N, D_in, H, D_out = 64, 1000, 100, 10

    # Create random training data
    x = torch.randn(N, D_in)
    y = torch.randn(N, D_out)

    model = torch.nn.Sequential(
        torch.nn.Linear(D_in, H),   # w_1 * x + b_1
        torch.nn.ReLU(),
        torch.nn.Linear(H, D_out)
    )

    # Learning rate
    learning_rate = 1e-3

    loss_fn = nn.MSELoss(reduction='sum')
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

    Loss = []
    for it in range(500):
        # Forward pass
        y_pred = model(x)

        # Compute loss (builds the computation graph)
        loss = loss_fn(y_pred, y)
        Loss.append(loss.item())
        if it % 50 == 49:
            print(it, loss.item())

        # Clear old gradients, run the backward pass, then update the parameters
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Plot the training loss
    fig = plt.figure()
    ax = plt.subplot(111)
    ax.plot(Loss, lw=2)
    plt.savefig("result.png")
    plt.show()

    Output

    49 0.8909258842468262
    99 0.005603241268545389
    149 3.349817779962905e-05
    199 1.8448776017976343e-07
    249 9.901702791026423e-10
    299 1.6195045304812083e-11
    349 8.347061757063567e-12
    399 7.631120561846227e-12
    449 1.0616650787664828e-11
    499 9.00871651582369e-12

    PyTorch: Custom nn.Module

    We can also define the model as a class that inherits from nn.Module. Whenever you need a model more complex than what a Sequential container can express, define it as an nn.Module subclass.
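    For example, a module whose forward pass uses its input more than once (a hypothetical residual-style block, sketched here only for illustration) cannot be written as a plain Sequential:

    import torch

    class ResidualBlock(torch.nn.Module):
        def __init__(self, dim):
            super(ResidualBlock, self).__init__()
            self.linear = torch.nn.Linear(dim, dim)

        def forward(self, x):
            # skip connection: the output depends on x twice
            return x + self.linear(x).clamp(min=0)

    block = ResidualBlock(16)
    print(block(torch.randn(4, 16)).shape)   # torch.Size([4, 16])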

    import torch
    import torch.nn as nn
    import matplotlib.pyplot as plt

    # N - number of samples, D_in - input dim, H - hidden dim, D_out - output dim
    N, D_in, H, D_out = 64, 1000, 100, 10

    # Create random training data
    x = torch.randn(N, D_in)
    y = torch.randn(N, D_out)

    class TwoLayerNet(torch.nn.Module):
        def __init__(self, D_in, H, D_out):
            super(TwoLayerNet, self).__init__()
            self.linear1 = torch.nn.Linear(D_in, H, bias=False)
            self.linear2 = torch.nn.Linear(H, D_out, bias=False)

        def forward(self, x):
            y_pred = self.linear2(self.linear1(x).clamp(min=0))
            return y_pred

    model = TwoLayerNet(D_in, H, D_out)
    loss_fn = nn.MSELoss(reduction='sum')
    learning_rate = 1e-3
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

    Loss = []
    for it in range(500):
        # Forward pass
        y_pred = model(x)

        # Compute loss (builds the computation graph)
        loss = loss_fn(y_pred, y)
        Loss.append(loss.item())
        if it % 50 == 49:
            print(it, loss.item())

        # Clear old gradients, run the backward pass, then update the parameters
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Plot the training loss
    fig = plt.figure()
    ax = plt.subplot(111)
    ax.plot(Loss, lw=2)
    plt.savefig("result.png")
    plt.show()

    Output

    49 1.059552788734436
    99 0.005720105022192001
    149 3.2995118090184405e-05
    199 1.9062906631006626e-07
    249 1.0079914680716229e-09
    299 1.5386018847873828e-11
    349 8.505867017671864e-12
    399 9.710801607276665e-12
    449 9.59173278997083e-12
    499 1.120120142472647e-11
