Baidu Paddle Deep Learning Fundamentals: How I Completed the Week 2 Practice Assignment

spring_poll 2020-08-22 23:25 239 views https://blog.51cto.com/151787/

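1. Assignment requirements

This code uses ResNet to screen for eye disease. The code is complete and can be run as is.

Task:

  • Consult the API documentation and use a decaying learning rate. Tune the parameters over several runs to find the best number of decay steps, so that the loss falls faster than with the original code
  • Plot the loss curves before and after changing the learning rate yourself

Notes:

  • Only the learning-rate part of the original code needs to change
  • If the drop in loss is not obvious, you may raise epoch_num to 10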

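2. Overview

I made the following changes to the program and marked each one in the code:

1) Changed epoch_num to 10 to train for more epochs

2) Defined a learning-rate variable and used the Paddle API so that the learning rate decays automatically

3) Defined a losses list and an iters list to make the loss easy to plot

4) Defined a function that plots the loss curve, so that it is easy to call

2.1 Added code: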

# 4. Plot how the loss changes
#   [Reference: textbook 2, project 8: visualization and analysis]

# 4.1. Import the plotting library
import matplotlib.pyplot as plt
# Render figures inline in Jupyter
%matplotlib inline

# 4.2. Define the function that plots the loss curve
def plot_change_loss(iters, losses_train):
    '''
    @param
    iters:  x-axis values
    losses_train: training losses
    '''
    # Plot the loss curve over the course of training
    plt.figure()
    plt.title("train loss", fontsize=24)
    plt.xlabel("iter", fontsize=14)
    plt.ylabel("loss", fontsize=14)
    plt.plot(iters, losses_train, color='red', label='train loss')
    plt.grid()

    plt.show()
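The per-batch loss is noisy, so curves from two runs can be hard to compare by eye. One option, not part of the original assignment code, is to smooth the recorded losses before passing them to plot_change_loss. The sketch below is a minimal trailing moving average; the helper name moving_average and the window size of 3 are illustrative assumptions, and the sample values are the first eight losses logged in test 1 further down.

```python
def moving_average(values, window=3):
    """Return the trailing moving average of `values` (same length as the input).

    The first `window - 1` outputs average over the values seen so far.
    """
    smoothed = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Drop the value that just left the window
            running_sum -= values[i - window]
        smoothed.append(running_sum / min(i + 1, window))
    return smoothed


# Losses logged in epochs 0 and 1 of the fixed-learning-rate run (test 1)
raw = [0.61399436, 0.5397954, 0.6404487, 0.6857702,
       0.6386715, 0.4865201, 0.50081044, 0.34162915]
smooth = moving_average(raw, window=3)
```

The smoothed list has the same length as iters, so it can be plotted in place of losses_train.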

2.2 Modified training code:

# Define the training procedure
# Note: fluid, np, data_loader, valid_data_loader, DATADIR, DATADIR2 and CSVFILE
# are defined in earlier cells of the course notebook.
def train(model):
    with fluid.dygraph.guard():
        print('start training ... ')
        model.train()
        epoch_num = 10    # 1. Changed epoch_num here from 1 to 10
        # 2.1. Define the learning rate and load the optimizer parameters into the model
        #      [Reference: lesson 9, project 9. Loading a model and resuming training --> visualization and analysis]
        #  The eye disease data contains fundus images from 1200 subjects;
        #  the training, validation and test sets have 400 images each
        BATCH_SIZE = 10
        # total_steps = (int(400//BATCH_SIZE) + 1) * epoch_num
        total_steps = (int(400//BATCH_SIZE) + 1) * 2.5
        lr = fluid.dygraph.PolynomialDecay(0.005, total_steps, 0.001)
        # 3.1. Add an iteration counter plus iteration and loss lists for plotting
        iter_count = 0
        iters = []
        losses_train = []   # training loss

        # Define the optimizer
        # opt = fluid.optimizer.Momentum(learning_rate=0.001, momentum=0.9, parameter_list=model.parameters())
        # 2.2. Replace the fixed learning rate with the decaying one
        opt = fluid.optimizer.Momentum(learning_rate=lr, momentum=0.9, parameter_list=model.parameters())
        # Define the data readers for training and validation
        train_loader = data_loader(DATADIR, batch_size=BATCH_SIZE, mode='train')
        valid_loader = valid_data_loader(DATADIR2, CSVFILE)
        for epoch in range(epoch_num):
            for batch_id, data in enumerate(train_loader()):
                x_data, y_data = data
                img = fluid.dygraph.to_variable(x_data)
                label = fluid.dygraph.to_variable(y_data)
                # Run the forward pass to get the predictions
                logits = model(img)
                # Compute the loss
                loss = fluid.layers.sigmoid_cross_entropy_with_logits(logits, label)
                avg_loss = fluid.layers.mean(loss)

                if batch_id % 10 == 0:
                    print("epoch: {}, batch_id: {}, loss is: {}".format(epoch, batch_id, avg_loss.numpy()))

                    # 3.2. Record the iteration count and loss (once every 10 batches)
                    iters.append(iter_count)
                    losses_train.append(avg_loss.numpy())
                    iter_count += 10

                # Backpropagate, update the weights, clear the gradients
                avg_loss.backward()
                opt.minimize(avg_loss)
                model.clear_gradients()

            model.eval()
            accuracies = []
            losses = []
            for batch_id, data in enumerate(valid_loader()):
                x_data, y_data = data
                img = fluid.dygraph.to_variable(x_data)
                label = fluid.dygraph.to_variable(y_data)
                # Run the forward pass to get the predictions
                logits = model(img)
                # Binary classification: threshold the sigmoid output at 0.5
                # Compute the predicted probability after sigmoid, then the loss
                pred = fluid.layers.sigmoid(logits)
                loss = fluid.layers.sigmoid_cross_entropy_with_logits(logits, label)
                # Probability of the other class (prediction below 0.5)
                pred2 = pred * (-1.0) + 1.0
                # Concatenate the two class probabilities along axis 1
                pred = fluid.layers.concat([pred2, pred], axis=1)
                acc = fluid.layers.accuracy(pred, fluid.layers.cast(label, dtype='int64'))
                accuracies.append(acc.numpy())
                losses.append(loss.numpy())

            print("[validation] accuracy/loss: {}/{}".format(np.mean(accuracies), np.mean(losses)))
            model.train()

        # save params of model
        fluid.save_dygraph(model.state_dict(), 'palm')
        # save optimizer state
        fluid.save_dygraph(opt.state_dict(), 'palm')

        # 4.3. Call the plotting function to draw the loss curve
        plot_change_loss(iters, losses_train)
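To see what the schedule passed to PolynomialDecay does over time, it can help to evaluate the decay formula by hand. The sketch below is a standalone illustration, not Paddle code: it assumes the non-cycling polynomial decay formula with power 1.0 (a linear ramp), and the function name polynomial_decay and its keyword names are made up for this example. The default values are the ones used in the training code above.

```python
def polynomial_decay(step, base_lr=0.005, decay_steps=102.5, end_lr=0.001, power=1.0):
    """Learning rate after `step` optimizer updates, assuming no cycling.

    Once `step` reaches `decay_steps`, the rate stays at `end_lr`.
    """
    step = min(step, decay_steps)
    fraction_left = 1.0 - step / decay_steps
    return (base_lr - end_lr) * fraction_left ** power + end_lr


# Print the rate at a few points during training
for s in (0, 41, 82, 103, 400):
    print(s, round(polynomial_decay(s), 6))
```

With decay_steps = 102.5 the rate reaches its floor of 0.001 about two and a half epochs in and stays there for the rest of the run.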

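3. Summary

1) Testing showed that every run takes a very long time. Even after a small change I had to wait a long while, which made me feel how much compute matters.

2) Combine looking up the Paddle API with reviewing earlier course material. The teacher has already introduced many of the relevant topics and design ideas in earlier lessons, or demonstrated them in earlier course programs. If the API examples alone do not make it clear how to write the program, refer to the similar code the teacher wrote in earlier lessons to reach your goal.

3) Because the data is read in random batches, different runs with the same parameters also give different results.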

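4. Notes on the test runs in the next section

After modifying the program I ran the tests below. Each run takes about 45 minutes. For some I recorded the detailed output; for others only the final loss plot.

1) Test 1 ran the program at a fixed learning rate of 0.001 (the learning rate in the teacher's program); accuracy reached 93%.

2) Tests 2-6 varied the range of the learning-rate decay; the highest accuracy reached 95-96%.

3) Tests 7 onward varied the number of decay steps. The idea was to try a large step count first, then a small one, compare their effect, and keep bisecting toward the middle to see how the intermediate step counts perform. In these runs the loss fell fastest with the step count from test 8, so I adopted total_steps = (int(400//BATCH_SIZE) + 1) * 2.5 with the learning rate going from 0.005 to 0.001; the highest accuracy reached 95-96%.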
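The step counts tried in these tests can be tabulated with a few lines of arithmetic. The sketch below is only an illustration: the helper name decay_budget is made up, and it reuses the post's own estimate of 41 optimizer steps per epoch to express each total_steps value as a share of the 10-epoch run.

```python
BATCH_SIZE = 10
TRAIN_IMAGES = 400
EPOCH_NUM = 10

# Same per-epoch step estimate as in the post's total_steps formula
STEPS_PER_EPOCH = int(TRAIN_IMAGES // BATCH_SIZE) + 1  # 41


def decay_budget(multiplier):
    """Return (total_steps, share of the whole run spent decaying)."""
    total_steps = STEPS_PER_EPOCH * multiplier
    share = total_steps / (STEPS_PER_EPOCH * EPOCH_NUM)
    return total_steps, share


# Multipliers used across the tests, from smallest to largest
budgets = {m: decay_budget(m) for m in (1, 2, 2.5, 3, 3.5, 5, 10)}
for multiplier, (total_steps, share) in budgets.items():
    print(multiplier, total_steps, "{:.0%}".format(share))
```

For the multiplier of 2.5 this gives 102.5 decay steps, so the learning rate is still falling during roughly the first quarter of training and constant afterwards.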

5. Test runs

5.1 Test 1. Learning rate unchanged, 10 epochs:

Run time: 44 min 44 s 478 ms

End time: 2020-08-22 08:54:13

start training ...

epoch: 0, batch_id: 0, loss is: [0.61399436]

epoch: 0, batch_id: 10, loss is: [0.5397954]

epoch: 0, batch_id: 20, loss is: [0.6404487]

epoch: 0, batch_id: 30, loss is: [0.6857702]

[validation] accuracy/loss: 0.7849999666213989/0.48529601097106934

epoch: 1, batch_id: 0, loss is: [0.6386715]

epoch: 1, batch_id: 10, loss is: [0.4865201]

epoch: 1, batch_id: 20, loss is: [0.50081044]

epoch: 1, batch_id: 30, loss is: [0.34162915]

[validation] accuracy/loss: 0.7024999856948853/0.6229860186576843

epoch: 2, batch_id: 0, loss is: [0.25662675]

epoch: 2, batch_id: 10, loss is: [1.7949547]

epoch: 2, batch_id: 20, loss is: [0.19568667]

epoch: 2, batch_id: 30, loss is: [0.19662617]

[validation] accuracy/loss: 0.9149999618530273/0.24236293137073517

epoch: 3, batch_id: 0, loss is: [0.84582233]

epoch: 3, batch_id: 10, loss is: [0.12374055]

epoch: 3, batch_id: 20, loss is: [0.39764705]

epoch: 3, batch_id: 30, loss is: [0.2201365]

[validation] accuracy/loss: 0.862500011920929/0.3080523610115051

epoch: 4, batch_id: 0, loss is: [0.11742544]

epoch: 4, batch_id: 10, loss is: [0.33280876]

epoch: 4, batch_id: 20, loss is: [0.13732623]

epoch: 4, batch_id: 30, loss is: [1.1103892]

[validation] accuracy/loss: 0.8575000762939453/0.3615248501300812

epoch: 5, batch_id: 0, loss is: [0.13193114]

epoch: 5, batch_id: 10, loss is: [0.5138872]

epoch: 5, batch_id: 20, loss is: [0.3979571]

epoch: 5, batch_id: 30, loss is: [0.42524424]

[validation] accuracy/loss: 0.9350000619888306/0.18516869843006134

epoch: 6, batch_id: 0, loss is: [0.21446273]

epoch: 6, batch_id: 10, loss is: [0.29208523]

epoch: 6, batch_id: 20, loss is: [0.71075696]

epoch: 6, batch_id: 30, loss is: [0.16396093]

[validation] accuracy/loss: 0.9299999475479126/0.2387014776468277

epoch: 7, batch_id: 0, loss is: [0.31918693]

epoch: 7, batch_id: 10, loss is: [0.05028909]

epoch: 7, batch_id: 20, loss is: [0.16989382]

epoch: 7, batch_id: 30, loss is: [0.13365436]

[validation] accuracy/loss: 0.9350000619888306/0.2305712103843689

epoch: 8, batch_id: 0, loss is: [0.13404241]

epoch: 8, batch_id: 10, loss is: [0.14186636]

epoch: 8, batch_id: 20, loss is: [0.03710499]

epoch: 8, batch_id: 30, loss is: [0.12268938]

[validation] accuracy/loss: 0.9325000047683716/0.19645212590694427

epoch: 9, batch_id: 0, loss is: [0.23441052]

epoch: 9, batch_id: 10, loss is: [0.07912876]

epoch: 9, batch_id: 20, loss is: [0.29366916]

epoch: 9, batch_id: 30, loss is: [0.34971437]

[validation] accuracy/loss: 0.9325000643730164/0.19805531203746796

5.2 Test 2. Learning rate decaying 0.01 --> 0.001, 2 epochs:

Run time: 8 min 58 s 267 ms

End time: 2020-08-21 23:48:37

start training ...

epoch: 0, batch_id: 0, loss is: [0.84265745]

epoch: 0, batch_id: 10, loss is: [1.1629268]

epoch: 0, batch_id: 20, loss is: [0.529528]

epoch: 0, batch_id: 30, loss is: [0.69051445]

[validation] accuracy/loss: 0.7024999856948853/2.064648389816284

epoch: 1, batch_id: 0, loss is: [0.16901067]

epoch: 1, batch_id: 10, loss is: [0.32736033]

epoch: 1, batch_id: 20, loss is: [0.3370896]

epoch: 1, batch_id: 30, loss is: [0.01843431]

[validation] accuracy/loss: 0.8575000762939453/0.371820330619812

5.3 Test 3. Learning rate decaying 0.01 --> 0.001, 10 epochs:

Run time: 44 min 30 s 125 ms

End time: 2020-08-22 00:35:34

start training ...

epoch: 0, batch_id: 0, loss is: [0.7350406]

epoch: 0, batch_id: 10, loss is: [1.5526597]

epoch: 0, batch_id: 20, loss is: [1.4809185]

epoch: 0, batch_id: 30, loss is: [1.2971458]

[validation] accuracy/loss: 0.4749999940395355/3.1085333824157715

epoch: 1, batch_id: 0, loss is: [1.6478646]

epoch: 1, batch_id: 10, loss is: [0.87493515]

epoch: 1, batch_id: 20, loss is: [0.433599]

epoch: 1, batch_id: 30, loss is: [0.1780614]

[validation] accuracy/loss: 0.7949999570846558/0.5167998671531677

epoch: 2, batch_id: 0, loss is: [0.08465642]

epoch: 2, batch_id: 10, loss is: [0.7266342]

epoch: 2, batch_id: 20, loss is: [0.05047063]

epoch: 2, batch_id: 30, loss is: [0.09345372]

[validation] accuracy/loss: 0.887499988079071/0.6409258842468262

epoch: 3, batch_id: 0, loss is: [0.27393234]

epoch: 3, batch_id: 10, loss is: [2.053777]

epoch: 3, batch_id: 20, loss is: [0.07622384]

epoch: 3, batch_id: 30, loss is: [0.28289387]

[validation] accuracy/loss: 0.9050000309944153/0.33014366030693054

epoch: 4, batch_id: 0, loss is: [0.8107643]

epoch: 4, batch_id: 10, loss is: [4.6561575]

epoch: 4, batch_id: 20, loss is: [0.4564219]

epoch: 4, batch_id: 30, loss is: [0.35920316]

[validation] accuracy/loss: 0.9149999618530273/0.20150217413902283

epoch: 5, batch_id: 0, loss is: [0.5040611]

epoch: 5, batch_id: 10, loss is: [0.24063113]

epoch: 5, batch_id: 20, loss is: [0.5784434]

epoch: 5, batch_id: 30, loss is: [0.11537111]

[validation] accuracy/loss: 0.8899999856948853/0.3055678606033325

epoch: 6, batch_id: 0, loss is: [0.32148445]

epoch: 6, batch_id: 10, loss is: [0.10687976]

epoch: 6, batch_id: 20, loss is: [0.01122489]

epoch: 6, batch_id: 30, loss is: [0.69238436]

[validation] accuracy/loss: 0.9424999952316284/0.2373284101486206

epoch: 7, batch_id: 0, loss is: [0.22534744]

epoch: 7, batch_id: 10, loss is: [0.21905744]

epoch: 7, batch_id: 20, loss is: [0.24537715]

epoch: 7, batch_id: 30, loss is: [0.02578714]

[validation] accuracy/loss: 0.9624999761581421/0.18944025039672852

epoch: 8, batch_id: 0, loss is: [0.06122933]

epoch: 8, batch_id: 10, loss is: [1.0541042]

epoch: 8, batch_id: 20, loss is: [0.66766155]

epoch: 8, batch_id: 30, loss is: [0.03150726]

[validation] accuracy/loss: 0.9475000500679016/0.16870255768299103

epoch: 9, batch_id: 0, loss is: [0.03649266]

epoch: 9, batch_id: 10, loss is: [0.10476176]

epoch: 9, batch_id: 20, loss is: [0.10077281]

epoch: 9, batch_id: 30, loss is: [0.16713764]

[validation] accuracy/loss: 0.9524999856948853/0.1519453078508377
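Reading the best epoch off a long log by eye is error-prone, so a small parser can help when comparing runs. The sketch below assumes the log format printed by the training code above; the function name best_validation is made up for this example, and the sample text is an excerpt from the test 3 log.

```python
import re

# Matches lines such as "[validation] accuracy/loss: 0.94/0.23"
VALIDATION_LINE = re.compile(r"\[validation\] accuracy/loss: ([0-9.]+)/([0-9.]+)")


def best_validation(log_text):
    """Return (index, accuracy, loss) of the validation line with the highest accuracy.

    The index counts validation lines within `log_text`, starting at 0.
    """
    results = [(float(acc), float(loss)) for acc, loss in VALIDATION_LINE.findall(log_text)]
    if not results:
        raise ValueError("no validation lines found")
    best = max(range(len(results)), key=lambda i: results[i][0])
    return best, results[best][0], results[best][1]


# Excerpt from the test 3 log (epochs 6 to 8)
sample_log = """
epoch: 6, batch_id: 30, loss is: [0.69238436]
[validation] accuracy/loss: 0.9424999952316284/0.2373284101486206
epoch: 7, batch_id: 30, loss is: [0.02578714]
[validation] accuracy/loss: 0.9624999761581421/0.18944025039672852
epoch: 8, batch_id: 30, loss is: [0.03150726]
[validation] accuracy/loss: 0.9475000500679016/0.16870255768299103
"""
best_index, best_acc, best_loss = best_validation(sample_log)
```

When the full log of a run is passed in, the returned index is the epoch number.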

5.3.1 Test 3.1. Learning rate decaying 0.05 --> 0.001, 10 epochs:

5.4 Test 4. Learning rate decaying 0.1 --> 0.001, 10 epochs:

5.5 Test 5. Learning rate decaying 0.005 --> 0.001, 10 epochs:

5.6 Test 6. Learning rate decaying 0.005 --> 0.0005, 10 epochs:

However, this run was very unstable, with the loss swinging up and down.

The tests above used the following number of decay steps:

total_steps = (int(400//BATCH_SIZE) + 1) * epoch_num

with BATCH_SIZE = 10 and epoch_num = 10. The tests below change the number of decay steps:

5.7 Test 7. Learning rate decaying 0.005 --> 0.001, total_steps = (int(400//BATCH_SIZE) + 1), 10 epochs:

5.8 Test 8. Learning rate decaying 0.005 --> 0.001, total_steps = (int(400//BATCH_SIZE) + 1) * 2.5, 10 epochs:

This setting works fairly well.

Results of a second run:

5.9 Test 9. Learning rate decaying 0.005 --> 0.001, total_steps = (int(400//BATCH_SIZE) + 1) * 5, 10 epochs:

5.10 Test 10. Learning rate decaying 0.005 --> 0.001, total_steps = (int(400//BATCH_SIZE) + 1) * 3.5, 10 epochs:

Validation accuracy reached 97%.

5.11 Test 11. Learning rate decaying 0.005 --> 0.001, total_steps = (int(400//BATCH_SIZE) + 1) * 3, 10 epochs:

5.12 Test 12. Learning rate decaying 0.005 --> 0.001, total_steps = (int(400//BATCH_SIZE) + 1) * 2, 10 epochs:
