Completing the Week 2 Practice Assignment of Baidu's Paddle-Based Deep Learning Fundamentals Course
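1. Assignment requirements
This code uses ResNet for eye-disease screening. The code is already complete and can be run as is.
Requirements:
- Consult the API and use a decaying learning rate. Through repeated tuning, find the best number of decay steps so that the loss drops faster than in the original code.
- Plot the loss curves before and after modifying the learning rate yourself.
Notes:
- Only the learning-rate part of the original code needs to change.
- If the loss does not drop noticeably, you may increase epoch_num to 10.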
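2. Overview
The program was modified as follows, and each change is marked in the code:
1) Changed epoch_num to 10 to train for more epochs
2) Defined a learning-rate variable that uses the Paddle API so that the learning rate decays automatically
3) Defined the losses and iters lists to make plotting the loss curve easier
4) Defined a function that plots the loss curve, so it is easy to call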
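To make change 2) concrete, here is a minimal plain-Python sketch of a non-cycling polynomial decay with power 1, which is how I understand the documented behaviour of fluid.dygraph.PolynomialDecay with its default arguments. The function name polynomial_decay is illustrative and not part of Paddle. Its default values (0.005 --> 0.001 over 102.5 steps) are the ones this post eventually adopts.

```python
def polynomial_decay(step, start_lr=0.005, end_lr=0.001,
                     decay_steps=102.5, power=1.0):
    """Learning rate after `step` updates (illustrative re-implementation).

    Assumption: no cycling, so the rate stays at `end_lr` once
    `step` reaches `decay_steps`.
    """
    clipped = min(step, decay_steps)
    fraction_left = 1.0 - clipped / decay_steps
    return (start_lr - end_lr) * fraction_left ** power + end_lr


# Print the rate at a few update counts to see the shape of the schedule.
for step in (0, 41, 82, 103, 400):
    print(step, round(polynomial_decay(step), 6))
```

With power 1 the schedule is a straight line from the start rate down to the end rate, followed by a constant floor.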
2.1 Added code:
# 4. Plot how the loss changes
# [Reference: textbook 2, project 8: visual analysis]
# 4.1. Import the plotting library
import matplotlib.pyplot as plt
# Allow figures to be drawn inline in Jupyter
%matplotlib inline

# 4.2. Define the function that plots the loss curve
def plot_change_loss(iters, losses_train):
    '''
    @param iters: x-axis values (iteration counts)
           losses_train: training losses
    '''
    # Plot the loss curve recorded during training
    plt.figure()
    plt.title("train loss", fontsize=24)
    plt.xlabel("iter", fontsize=14)
    plt.ylabel("loss", fontsize=14)
    plt.plot(iters, losses_train, color='red', label='train loss')
    plt.grid()
    plt.show()
2.2 Modified training code:
# np and fluid are used below; the course notebook imports them earlier,
# they are repeated here so the snippet reads on its own.
# data_loader, valid_data_loader, DATADIR, DATADIR2 and CSVFILE also come from the notebook.
import numpy as np
import paddle.fluid as fluid

# Define the training procedure
def train(model):
    with fluid.dygraph.guard():
        print('start training ... ')
        model.train()
        epoch_num = 10  # 1. Changed epoch_num from 1 to 10

        # 2.1. Define the learning rate [see lesson 9, project 9: model loading and resuming training --> visual analysis]
        # The eye-disease data contains fundus images of 1200 subjects;
        # the training, validation and test sets hold 400 images each
        BATCH_SIZE = 10
        # total_steps = (int(400//BATCH_SIZE) + 1) * epoch_num
        total_steps = (int(400//BATCH_SIZE) + 1) * 2.5
        lr = fluid.dygraph.PolynomialDecay(0.005, total_steps, 0.001)

        # 3.1. Add an iteration counter plus iteration and loss lists for plotting
        iter_count = 0
        iters = []
        losses_train = []  # training losses

        # Define the optimizer
        # opt = fluid.optimizer.Momentum(learning_rate=0.001, momentum=0.9, parameter_list=model.parameters())
        # 2.2. Replace the fixed learning rate with the decaying one
        opt = fluid.optimizer.Momentum(learning_rate=lr, momentum=0.9, parameter_list=model.parameters())

        # Define the data readers for training and validation
        train_loader = data_loader(DATADIR, batch_size=10, mode='train')
        valid_loader = valid_data_loader(DATADIR2, CSVFILE)
        for epoch in range(epoch_num):
            for batch_id, data in enumerate(train_loader()):
                x_data, y_data = data
                img = fluid.dygraph.to_variable(x_data)
                label = fluid.dygraph.to_variable(y_data)
                # Run the forward pass to get the predictions
                logits = model(img)
                # Compute the loss
                loss = fluid.layers.sigmoid_cross_entropy_with_logits(logits, label)
                avg_loss = fluid.layers.mean(loss)

                if batch_id % 10 == 0:
                    print("epoch: {}, batch_id: {}, loss is: {}".format(epoch, batch_id, avg_loss.numpy()))
                    # 3.2. Record the iteration count and loss (one point every 10 batches)
                    iters.append(iter_count)
                    losses_train.append(avg_loss.numpy())
                    iter_count += 10
                # Backpropagate, update the weights, clear the gradients
                avg_loss.backward()
                opt.minimize(avg_loss)
                model.clear_gradients()

            model.eval()
            accuracies = []
            losses = []
            for batch_id, data in enumerate(valid_loader()):
                x_data, y_data = data
                img = fluid.dygraph.to_variable(x_data)
                label = fluid.dygraph.to_variable(y_data)
                # Run the forward pass to get the predictions
                logits = model(img)
                # Binary classification: threshold the sigmoid output at 0.5
                # Compute the predicted probability after sigmoid, then the loss
                pred = fluid.layers.sigmoid(logits)
                loss = fluid.layers.sigmoid_cross_entropy_with_logits(logits, label)
                # Probability of the other class (1 - pred)
                pred2 = pred * (-1.0) + 1.0
                # Concatenate the two class probabilities along axis 1
                pred = fluid.layers.concat([pred2, pred], axis=1)
                acc = fluid.layers.accuracy(pred, fluid.layers.cast(label, dtype='int64'))
                accuracies.append(acc.numpy())
                losses.append(loss.numpy())
            print("[validation] accuracy/loss: {}/{}".format(np.mean(accuracies), np.mean(losses)))
            model.train()

        # save params of model
        fluid.save_dygraph(model.state_dict(), 'palm')
        # save optimizer state
        fluid.save_dygraph(opt.state_dict(), 'palm')

        # 4.3. Call the loss-curve plotting function
        plot_change_loss(iters, losses_train)
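3. Summary
1) Testing showed that every run takes a very long time. Even a small change means a long wait, which shows how much compute matters.
2) Combine looking up the Paddle API with reviewing earlier course material. The instructor already introduced many of the relevant ideas and designs in earlier lessons, or demonstrated them in earlier course programs. If the API examples alone do not make it clear how to write the program, the similar code the instructor wrote in earlier lessons is a good reference.
3) Because batches are read in random order, runs with identical parameters can still produce different results.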
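Since per-batch losses vary from run to run (point 3), it can be easier to compare two runs on a smoothed curve. The helper below is a hypothetical addition, not part of the course code: a trailing moving average in plain Python whose name moving_average and default window of 5 are arbitrary choices.

```python
def moving_average(values, window=5):
    """Trailing moving average of a sequence of numbers.

    The first `window - 1` points are averaged over the samples seen so far,
    so the output has the same length as the input.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    smoothed = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += float(value)
        if i >= window:
            # Drop the sample that just left the window.
            running_sum -= float(values[i - window])
        smoothed.append(running_sum / min(i + 1, window))
    return smoothed


# First logged losses of test 1 in this post, smoothed with a window of 2.
print(moving_average([0.61399436, 0.5397954, 0.6404487, 0.6857702], window=2))
```

The entries collected in losses_train are one-element arrays, so converting them first, for example with [float(x[0]) for x in losses_train], keeps the helper independent of NumPy.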
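4. Notes on the test runs in the next section
After modifying the program, I ran the tests below. Each run takes about 45 minutes. For some runs I recorded the detailed output, for others only the final loss plot.
1) Test 1 ran the program with a fixed learning rate of 0.001 (the learning rate in the instructor's program). Accuracy reached 93%.
2) Tests 2-6 varied the range over which the learning rate decays. Accuracy peaked at 95-96%.
3) From test 7 onward I varied the number of decay steps. The idea was to try a large step count first, then a small one, compare the results, and keep bisecting toward the middle to see how the intermediate step counts behave. The loss dropped fastest with the step count of test 8, so the step count adopted is total_steps = (int(400//BATCH_SIZE) + 1) * 2.5, with the learning rate decaying from 0.005 to 0.001. Accuracy peaked at 95-96%.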
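The step counts compared in tests 2-12 can be tabulated with a few lines of arithmetic. This is a sketch under two labeled assumptions: the decay schedule advances once per optimizer update, and each epoch performs 400 // 10 = 40 updates. The helper names decay_steps and epochs_until_floor are illustrative.

```python
TRAIN_IMAGES = 400
BATCH_SIZE = 10


def decay_steps(multiplier):
    """total_steps as written in the post: (400 // BATCH_SIZE + 1) * multiplier."""
    return (TRAIN_IMAGES // BATCH_SIZE + 1) * multiplier


def epochs_until_floor(multiplier):
    """Epochs before the learning rate reaches its end value.

    Assumption: one decay step per optimizer update, 40 updates per epoch.
    """
    updates_per_epoch = TRAIN_IMAGES // BATCH_SIZE
    return decay_steps(multiplier) / updates_per_epoch


# Multipliers in the order they were tried: epoch_num (10), then 1, 2.5, 5, 3.5, 3, 2.
for multiplier in (10, 1, 2.5, 5, 3.5, 3, 2):
    print(f"x{multiplier}: decay over {decay_steps(multiplier)} steps, "
          f"about {epochs_until_floor(multiplier):.2f} epochs")
```

Under these assumptions, the multiplier 10 spreads the decay over slightly more than the 10 training epochs, whereas 2.5 reaches the end value after roughly 2.6 epochs and trains at 0.001 from then on.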
5. Test process
5.1 Test 1. Learning rate unchanged, results of a 10-epoch run:
Run time: 44 min 44 s 478 ms
Finished: 2020-08-22 08:54:13
start training ...
epoch: 0, batch_id: 0, loss is: [0.61399436]
epoch: 0, batch_id: 10, loss is: [0.5397954]
epoch: 0, batch_id: 20, loss is: [0.6404487]
epoch: 0, batch_id: 30, loss is: [0.6857702]
[validation] accuracy/loss: 0.7849999666213989/0.48529601097106934
epoch: 1, batch_id: 0, loss is: [0.6386715]
epoch: 1, batch_id: 10, loss is: [0.4865201]
epoch: 1, batch_id: 20, loss is: [0.50081044]
epoch: 1, batch_id: 30, loss is: [0.34162915]
[validation] accuracy/loss: 0.7024999856948853/0.6229860186576843
epoch: 2, batch_id: 0, loss is: [0.25662675]
epoch: 2, batch_id: 10, loss is: [1.7949547]
epoch: 2, batch_id: 20, loss is: [0.19568667]
epoch: 2, batch_id: 30, loss is: [0.19662617]
[validation] accuracy/loss: 0.9149999618530273/0.24236293137073517
epoch: 3, batch_id: 0, loss is: [0.84582233]
epoch: 3, batch_id: 10, loss is: [0.12374055]
epoch: 3, batch_id: 20, loss is: [0.39764705]
epoch: 3, batch_id: 30, loss is: [0.2201365]
[validation] accuracy/loss: 0.862500011920929/0.3080523610115051
epoch: 4, batch_id: 0, loss is: [0.11742544]
epoch: 4, batch_id: 10, loss is: [0.33280876]
epoch: 4, batch_id: 20, loss is: [0.13732623]
epoch: 4, batch_id: 30, loss is: [1.1103892]
[validation] accuracy/loss: 0.8575000762939453/0.3615248501300812
epoch: 5, batch_id: 0, loss is: [0.13193114]
epoch: 5, batch_id: 10, loss is: [0.5138872]
epoch: 5, batch_id: 20, loss is: [0.3979571]
epoch: 5, batch_id: 30, loss is: [0.42524424]
[validation] accuracy/loss: 0.9350000619888306/0.18516869843006134
epoch: 6, batch_id: 0, loss is: [0.21446273]
epoch: 6, batch_id: 10, loss is: [0.29208523]
epoch: 6, batch_id: 20, loss is: [0.71075696]
epoch: 6, batch_id: 30, loss is: [0.16396093]
[validation] accuracy/loss: 0.9299999475479126/0.2387014776468277
epoch: 7, batch_id: 0, loss is: [0.31918693]
epoch: 7, batch_id: 10, loss is: [0.05028909]
epoch: 7, batch_id: 20, loss is: [0.16989382]
epoch: 7, batch_id: 30, loss is: [0.13365436]
[validation] accuracy/loss: 0.9350000619888306/0.2305712103843689
epoch: 8, batch_id: 0, loss is: [0.13404241]
epoch: 8, batch_id: 10, loss is: [0.14186636]
epoch: 8, batch_id: 20, loss is: [0.03710499]
epoch: 8, batch_id: 30, loss is: [0.12268938]
[validation] accuracy/loss: 0.9325000047683716/0.19645212590694427
epoch: 9, batch_id: 0, loss is: [0.23441052]
epoch: 9, batch_id: 10, loss is: [0.07912876]
epoch: 9, batch_id: 20, loss is: [0.29366916]
epoch: 9, batch_id: 30, loss is: [0.34971437]
[validation] accuracy/loss: 0.9325000643730164/0.19805531203746796
5.2 Test 2. Decaying learning rate 0.01 --> 0.001, results of a 2-epoch run:
Results of the 2-epoch run:
Run time: 8 min 58 s 267 ms
Finished: 2020-08-21 23:48:37
start training ...
epoch: 0, batch_id: 0, loss is: [0.84265745]
epoch: 0, batch_id: 10, loss is: [1.1629268]
epoch: 0, batch_id: 20, loss is: [0.529528]
epoch: 0, batch_id: 30, loss is: [0.69051445]
[validation] accuracy/loss: 0.7024999856948853/2.064648389816284
epoch: 1, batch_id: 0, loss is: [0.16901067]
epoch: 1, batch_id: 10, loss is: [0.32736033]
epoch: 1, batch_id: 20, loss is: [0.3370896]
epoch: 1, batch_id: 30, loss is: [0.01843431]
[validation] accuracy/loss: 0.8575000762939453/0.371820330619812
5.3 Test 3. Decaying learning rate 0.01 --> 0.001, results of a 10-epoch run:
Run time: 44 min 30 s 125 ms
Finished: 2020-08-22 00:35:34
start training ...
epoch: 0, batch_id: 0, loss is: [0.7350406]
epoch: 0, batch_id: 10, loss is: [1.5526597]
epoch: 0, batch_id: 20, loss is: [1.4809185]
epoch: 0, batch_id: 30, loss is: [1.2971458]
[validation] accuracy/loss: 0.4749999940395355/3.1085333824157715
epoch: 1, batch_id: 0, loss is: [1.6478646]
epoch: 1, batch_id: 10, loss is: [0.87493515]
epoch: 1, batch_id: 20, loss is: [0.433599]
epoch: 1, batch_id: 30, loss is: [0.1780614]
[validation] accuracy/loss: 0.7949999570846558/0.5167998671531677
epoch: 2, batch_id: 0, loss is: [0.08465642]
epoch: 2, batch_id: 10, loss is: [0.7266342]
epoch: 2, batch_id: 20, loss is: [0.05047063]
epoch: 2, batch_id: 30, loss is: [0.09345372]
[validation] accuracy/loss: 0.887499988079071/0.6409258842468262
epoch: 3, batch_id: 0, loss is: [0.27393234]
epoch: 3, batch_id: 10, loss is: [2.053777]
epoch: 3, batch_id: 20, loss is: [0.07622384]
epoch: 3, batch_id: 30, loss is: [0.28289387]
[validation] accuracy/loss: 0.9050000309944153/0.33014366030693054
epoch: 4, batch_id: 0, loss is: [0.8107643]
epoch: 4, batch_id: 10, loss is: [4.6561575]
epoch: 4, batch_id: 20, loss is: [0.4564219]
epoch: 4, batch_id: 30, loss is: [0.35920316]
[validation] accuracy/loss: 0.9149999618530273/0.20150217413902283
epoch: 5, batch_id: 0, loss is: [0.5040611]
epoch: 5, batch_id: 10, loss is: [0.24063113]
epoch: 5, batch_id: 20, loss is: [0.5784434]
epoch: 5, batch_id: 30, loss is: [0.11537111]
[validation] accuracy/loss: 0.8899999856948853/0.3055678606033325
epoch: 6, batch_id: 0, loss is: [0.32148445]
epoch: 6, batch_id: 10, loss is: [0.10687976]
epoch: 6, batch_id: 20, loss is: [0.01122489]
epoch: 6, batch_id: 30, loss is: [0.69238436]
[validation] accuracy/loss: 0.9424999952316284/0.2373284101486206
epoch: 7, batch_id: 0, loss is: [0.22534744]
epoch: 7, batch_id: 10, loss is: [0.21905744]
epoch: 7, batch_id: 20, loss is: [0.24537715]
epoch: 7, batch_id: 30, loss is: [0.02578714]
[validation] accuracy/loss: 0.9624999761581421/0.18944025039672852
epoch: 8, batch_id: 0, loss is: [0.06122933]
epoch: 8, batch_id: 10, loss is: [1.0541042]
epoch: 8, batch_id: 20, loss is: [0.66766155]
epoch: 8, batch_id: 30, loss is: [0.03150726]
[validation] accuracy/loss: 0.9475000500679016/0.16870255768299103
epoch: 9, batch_id: 0, loss is: [0.03649266]
epoch: 9, batch_id: 10, loss is: [0.10476176]
epoch: 9, batch_id: 20, loss is: [0.10077281]
epoch: 9, batch_id: 30, loss is: [0.16713764]
[validation] accuracy/loss: 0.9524999856948853/0.1519453078508377
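Comparing runs by eye across logs like these is tedious, so a small parser can pull out the per-epoch validation numbers. The sketch below is not part of the original notebook; the names VALIDATION_RE and parse_validation are made up for illustration, and the regular expression assumes the exact "[validation] accuracy/loss: a/b" format printed by the training loop. The sample text is the last three validation lines of test 3.

```python
import re

# Matches "[validation] accuracy/loss: <acc>/<loss>", tolerating line breaks
# between the tag and the numbers.
VALIDATION_RE = re.compile(
    r"\[validation\]\s+accuracy/loss:\s*([0-9.]+)/([0-9.]+)"
)


def parse_validation(log_text):
    """Return a list of (accuracy, loss) tuples, one per validation line."""
    return [(float(acc), float(loss))
            for acc, loss in VALIDATION_RE.findall(log_text)]


sample_log = """
[validation] accuracy/loss: 0.9624999761581421/0.18944025039672852
[validation] accuracy/loss: 0.9475000500679016/0.16870255768299103
[validation] accuracy/loss: 0.9524999856948853/0.1519453078508377
"""

results = parse_validation(sample_log)
best_accuracy, loss_at_best = max(results, key=lambda pair: pair[0])
print(len(results), best_accuracy, loss_at_best)
```

Running the same parser over the full logs of two tests gives two accuracy lists that can be compared epoch by epoch.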
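5.3.1 Test 3.1. Decaying learning rate 0.05 --> 0.001, results of a 10-epoch run:
5.4 Test 4. Decaying learning rate 0.1 --> 0.001, results of a 10-epoch run:
5.5 Test 5. Decaying learning rate 0.005 --> 0.001, results of a 10-epoch run:
5.6 Test 6. Decaying learning rate 0.005 --> 0.0005, results of a 10-epoch run: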
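However, this run was very unstable, with results swinging high and low.
The tests above used the following number of learning-rate decay steps:
total_steps = (int(400//BATCH_SIZE) + 1) * epoch_num
with BATCH_SIZE = 10 and epoch_num = 10. The tests below change the number of decay steps: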
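5.7 Test 7. Decaying learning rate 0.005 --> 0.001 with total_steps = (int(400//BATCH_SIZE) + 1), results of a 10-epoch run:
5.8 Test 8. Decaying learning rate 0.005 --> 0.001 with total_steps = (int(400//BATCH_SIZE) + 1) * 2.5, results of a 10-epoch run:
This configuration worked relatively well.
Results of a second run:
5.9 Test 9. Decaying learning rate 0.005 --> 0.001 with total_steps = (int(400//BATCH_SIZE) + 1) * 5, results of a 10-epoch run:
5.10 Test 10. Decaying learning rate 0.005 --> 0.001 with total_steps = (int(400//BATCH_SIZE) + 1) * 3.5, results of a 10-epoch run:
Validation accuracy reached 97%.
5.11 Test 11. Decaying learning rate 0.005 --> 0.001 with total_steps = (int(400//BATCH_SIZE) + 1) * 3, results of a 10-epoch run:
5.12 Test 12. Decaying learning rate 0.005 --> 0.001 with total_steps = (int(400//BATCH_SIZE) + 1) * 2, results of a 10-epoch run: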