
Faster RCNN Principles and PyTorch Code Walkthrough: RoI Pooling

2020-07-16 23:09

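In the previous posts we spent a lot of effort on how the RPN works, and ended up with 256 RoIs, together with the ground-truth class and offsets for each RoI.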

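The VGGNet backbone has already produced a feature map for the whole image, so the natural idea is to reuse it: extract the features that correspond to each RoI and feed them into a fully-connected network that predicts the class and the offsets of that RoI.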

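However, the RoIs are generated from anchors of different sizes and aspect ratios through offset correction, filtering and similar steps, so they vary in size and have floating-point coordinates, while the fully-connected network that follows requires an input of fixed size. We therefore need a module that turns RoIs of different sizes into features of the same size, as the fully-connected layers require. That module is RoI Pooling, the subject of this post.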
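Taking VGG16 as the feature extractor, we already know that the feature map is 512×37×50, while the VGGNet fully-connected layers used afterwards expect a 512×7×7 feature. The feature map has 512 channels, so pooling amounts to resizing the feature of each RoI (the RoIs differ in size; only the channel count, 512, is common to all of them) into a 7×7 region.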
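A quick check of these numbers, under the assumption of a 600×800 input image (the text only gives the resulting 37×50 map): VGG16's 3×3 convolutions with padding 1 keep the spatial size, and each of the four 2×2, stride-2 max-pooling layers before conv5_3 halves it, rounding down. The helper name is hypothetical.

```python
def vgg16_conv5_size(height, width, num_pools=4):
    """Spatial size of the VGG16 feature map up to conv5_3.

    The 3x3 convolutions (padding 1) keep the size; each of the
    num_pools 2x2/stride-2 max-pool layers halves it, rounding down.
    """
    for _ in range(num_pools):
        height, width = height // 2, width // 2
    return height, width


channels = 512
h, w = vgg16_conv5_size(600, 800)   # assumed input size
pooled_size = 7                     # RoI Pooling output is 7x7
fc_input_dim = channels * pooled_size * pooled_size

print(f"feature map: {channels}x{h}x{w}")                # feature map: 512x37x50
print(f"flattened RoI feature length: {fc_input_dim}")   # 25088
```

The 25088-dimensional flattened vector is what a fully-connected layer sized for 512×7×7 inputs consumes, regardless of the RoI's original size.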
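There are several algorithms for pooling RoIs, but only the one used by Faster RCNN is covered here. For other methods, such as the RoI Align algorithm introduced in Mask RCNN, interested readers can look them up.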

RoI Pooling

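The RoI Pooling procedure is shown in the figure below (taken from the book 深度学习之PyTorch物体检测实战). Suppose the current RoI is the box in the left-hand image of the figure, with a size of 332×332. To obtain the feature of this RoI, we first map the region onto the feature map of the whole image. Since the downsampling factor is 16, the region's coordinates on the feature map are simply its image coordinates divided by 16 and truncated to integers, and its size becomes 332/16 = 20.75. RoI Pooling quantizes this floating-point value directly to an integer, 20×20, which gives the RoI's feature region, the box in step 3 of the figure.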
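The projection described above can be sketched as follows. The box coordinates are hypothetical (a 332×332 box whose top-left corner sits at (64, 32) in the image), and, following the description in the text, the start coordinates are divided by 16 and rounded down while the size is truncated separately; the exact rounding used by a real kernel may differ.

```python
import math

STRIDE = 16  # downsampling factor of VGG16 up to conv5_3


def project_roi(x1, y1, x2, y2, stride=STRIDE):
    """Map an image-space box (x1, y1, x2, y2) onto the feature map.

    Returns the exact floating-point size of the box on the feature map
    and the quantized region as (x_start, y_start, width, height),
    where every value is divided by the stride and rounded down.
    """
    exact_w = (x2 - x1) / stride
    exact_h = (y2 - y1) / stride
    quantized = (math.floor(x1 / stride), math.floor(y1 / stride),
                 math.floor(exact_w), math.floor(exact_h))
    return (exact_w, exact_h), quantized


exact, region = project_roi(64, 32, 64 + 332, 32 + 332)  # hypothetical box
print(exact)    # (20.75, 20.75)
print(region)   # (4, 2, 20, 20)
```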

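Next, this 20×20 region has to be turned into a 7×7 feature. But 20/7 ≈ 2.857 is again a floating-point number, and RoI Pooling quantizes it once more, truncating 2.857 to 2. It then lays out a 7×7 grid with a step of 2 starting from the top-left corner, so that each cell covers a 2×2 area of the feature map, as shown in step 4 of the figure.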
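The second quantization, where the step 20/7 ≈ 2.857 is truncated to 2, can be sketched along one axis like this; for a square region the same boundaries apply to rows and columns. The helper name is hypothetical.

```python
def fixed_step_cells(region_size, pooled_size=7):
    """Cell boundaries [start, end) along one axis for the scheme above.

    The step region_size / pooled_size is truncated to an integer and
    pooled_size cells of that step are laid out from the top-left
    corner, so trailing rows/columns of the region may be skipped.
    """
    step = region_size // pooled_size   # 20 // 7 == 2
    return [(i * step, (i + 1) * step) for i in range(pooled_size)]


cells = fixed_step_cells(20)
print(cells)   # [(0, 2), (2, 4), (4, 6), (6, 8), (8, 10), (10, 12), (12, 14)]
print("covered:", cells[-1][1], "of", 20)   # covered: 14 of 20
```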
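Finally, the maximum feature value inside each cell is taken as that cell's output, which produces the 7×7 output and completes the pooling, as shown in step 5 of the figure.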
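Putting steps 3 to 5 together, here is a minimal pure-Python sketch of the simplified RoI Pooling described above for a single channel. It assumes a square, already quantized region given by its top-left corner and side length; a real implementation repeats this for each of the 512 channels and each RoI inside a C/CUDA kernel. The function name is hypothetical, and the toy feature map stores y * W + x at each position so the result is easy to check.

```python
def roi_pool_simplified(feature, x0, y0, size, pooled=7):
    """Simplified RoI Pooling of one square region of a 2D feature map.

    feature: list of rows (one channel, H x W)
    (x0, y0): integer top-left corner of the quantized region
    size: integer side length of the quantized region (e.g. 20)
    Each of the pooled x pooled cells spans step = size // pooled rows
    and columns, and the cell's output is its maximum value.
    """
    step = size // pooled
    if step == 0:
        raise ValueError("region is smaller than the pooled grid")
    out = []
    for i in range(pooled):
        row = []
        for j in range(pooled):
            ys = range(y0 + i * step, y0 + (i + 1) * step)
            xs = range(x0 + j * step, x0 + (j + 1) * step)
            row.append(max(feature[y][x] for y in ys for x in xs))
        out.append(row)
    return out


# Toy 37x50 single-channel map whose value encodes its position.
H, W = 37, 50
feature = [[y * W + x for x in range(W)] for y in range(H)]

pooled_map = roi_pool_simplified(feature, x0=4, y0=2, size=20)
print(len(pooled_map), len(pooled_map[0]))   # 7 7
print(pooled_map[0][0])                      # 155 (max of rows 2-3, cols 4-5)
```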
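The procedure shows that the RoI originally corresponds to a 20.75×20.75 region of the feature map, yet only a 14×14 region is used in the end. So although RoI Pooling is simple, the bias introduced by quantization inevitably hurts the network, especially the accuracy of the regressed object positions.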
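The size of the loss can be quantified: 14×14 out of 20.75×20.75 is less than half of the RoI's area. As a caveat, as far as I know the Caffe-derived RoI Pooling kernels that PyTorch ports like this one build on do not use a fixed integer step inside the region; they take the floor of each fractional bin start and the ceil of each bin end, so neighbouring bins overlap slightly and no rows or columns are dropped, while the RoI coordinates themselves are still rounded to integers. The second function below is a sketch of that bin layout under this assumption, not a transcription of the kernel; both function names are hypothetical.

```python
def pooled_area_fraction(roi_size_px=332, stride=16, pooled=7):
    """Fraction of the RoI's exact feature-map area pooled by the fixed-step scheme."""
    exact = roi_size_px / stride                 # 20.75
    used = (int(exact) // pooled) * pooled       # 2 * 7 = 14
    return (used * used) / (exact * exact)


def floor_ceil_bins(region_size, pooled_size=7):
    """Bin boundaries [start, end) with floor for the start and ceil for the end.

    Uses exact integer arithmetic for floor(i*S/P) and ceil((i+1)*S/P),
    so together the bins span the whole region [0, region_size).
    """
    bins = []
    for i in range(pooled_size):
        start = (i * region_size) // pooled_size
        end = -((-(i + 1) * region_size) // pooled_size)   # ceiling division
        bins.append((start, end))
    return bins


print(f"{pooled_area_fraction():.1%}")   # 45.5%
print(floor_ceil_bins(20))
# [(0, 3), (2, 6), (5, 9), (8, 12), (11, 15), (14, 18), (17, 20)]
```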

Code

Source: lib/model/roi_pooling/modules/roi_pool.py

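from torch.nn.modules.module import Module
from ..functions.roi_pool import RoIPoolFunction


class _RoIPooling(Module):
    def __init__(self, pooled_height, pooled_width, spatial_scale):
        super(_RoIPooling, self).__init__()

        self.pooled_width = int(pooled_width)      # width of the pooled feature map, 7
        self.pooled_height = int(pooled_height)    # height of the pooled feature map, 7
        self.spatial_scale = float(spatial_scale)  # feature-map/image scale (1 / stride), 1/16

    def forward(self, features, rois):
        """
        Apply RoI Pooling.

        features: feature map from the backbone, shape (batch, 512, 37, 50)
        rois: final proposals, shape (N, 5); the caller flattens the
              (batch, 256, 5) proposals, so N = batch * 256. Each row is
              (batch_index, x1, y1, x2, y2) in input-image coordinates.

        return: features pooled to the same size, shape (N, 512, 7, 7)
        """
        return RoIPoolFunction(self.pooled_height, self.pooled_width, self.spatial_scale)(features, rois)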

Source: lib/model/roi_pooling/functions/roi_pool.py

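import torch
from torch.autograd import Function
from .._ext import roi_pooling   # C/CUDA extension compiled by the repository


class RoIPoolFunction(Function):
    # Legacy-style autograd Function: it is instantiated with its parameters
    # and then called, with `ctx` playing the role of `self`. Newer PyTorch
    # versions reject this style and require static forward/backward methods.
    def __init__(ctx, pooled_height, pooled_width, spatial_scale):
        ctx.pooled_width = pooled_width      # width of the pooled feature map, 7
        ctx.pooled_height = pooled_height    # height of the pooled feature map, 7
        ctx.spatial_scale = spatial_scale    # feature-map/image scale (1 / stride), 1/16
        ctx.feature_size = None

    def forward(ctx, features, rois):
        ctx.feature_size = features.size()   # feature map size, (batch, 512, 37, 50)
        batch_size, num_channels, data_height, data_width = ctx.feature_size
        num_rois = rois.size(0)              # number of RoIs, N = batch * 256 (256 when batch = 1)
        # Pooled output, shape (N, 512, 7, 7)
        output = features.new(num_rois, num_channels, ctx.pooled_height, ctx.pooled_width).zero_()
        # Position of the maximum of every bin, shape (N, 512, 7, 7); needed by backward
        ctx.argmax = features.new(num_rois, num_channels, ctx.pooled_height, ctx.pooled_width).zero_().int()
        ctx.rois = rois
        if not features.is_cuda:
            # The CPU kernel takes channels-last (batch, H, W, C) data and does not
            # fill argmax. .contiguous() makes sure the permuted view is laid out in
            # memory in that order before it is handed to the C code.
            _features = features.permute(0, 2, 3, 1).contiguous()
            roi_pooling.roi_pooling_forward(ctx.pooled_height, ctx.pooled_width, ctx.spatial_scale,
                                            _features, rois, output)
        else:
            roi_pooling.roi_pooling_forward_cuda(ctx.pooled_height, ctx.pooled_width, ctx.spatial_scale,
                                                 features, rois, output, ctx.argmax)

        return output

    def backward(ctx, grad_output):
        # Only the CUDA forward records argmax, so backward requires CUDA tensors.
        assert ctx.feature_size is not None and grad_output.is_cuda
        batch_size, num_channels, data_height, data_width = ctx.feature_size
        grad_input = grad_output.new(batch_size, num_channels, data_height, data_width).zero_()

        # Each output gradient is added back at the input position stored in argmax.
        roi_pooling.roi_pooling_backward_cuda(ctx.pooled_height, ctx.pooled_width, ctx.spatial_scale,
                                              grad_output, ctx.rois, grad_input, ctx.argmax)

        # Gradient with respect to features; the rois receive no gradient.
        return grad_input, None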
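The forward pass above stores ctx.argmax and the backward pass hands it to the CUDA kernel because max pooling passes gradient only to the input element that won each bin. The pure-Python sketch below is a hypothetical stand-in for roi_pooling_forward_cuda and roi_pooling_backward_cuda, not their actual code: it works on one channel stored as a flat row-major list, with each bin given as a list of flat indices. The forward pass records the index of each bin's maximum, and the backward pass adds each output gradient back at that index, accumulating when several bins or RoIs pick the same element.

```python
def max_pool_bins_forward(feature, bins):
    """Max over each bin; also returns the flat index of the winning element.

    feature: flat list holding one H x W channel in row-major order
    bins: list of lists of flat indices, one list per output bin
    """
    output, argmax = [], []
    for idx in bins:
        best = max(idx, key=lambda k: feature[k])
        output.append(feature[best])
        argmax.append(best)
    return output, argmax


def max_pool_bins_backward(grad_output, argmax, feature_len):
    """Route each output gradient back to the element that produced the output."""
    grad_input = [0.0] * feature_len
    for g, k in zip(grad_output, argmax):
        grad_input[k] += g   # accumulate: overlapping bins/RoIs may share k
    return grad_input


# Toy 4x4 channel with two 2x2 bins along the top.
W = 4
feature = [3, 1, 0, 2,
           4, 9, 5, 1,
           7, 7, 7, 7,
           0, 0, 0, 0]
bins = [[0, 1, W + 0, W + 1],   # top-left 2x2 cell
        [2, 3, W + 2, W + 3]]   # top-right 2x2 cell
out, argmax = max_pool_bins_forward(feature, bins)
grad_in = max_pool_bins_backward([1.0, 0.5], argmax, len(feature))
print(out, argmax)   # [9, 5] [5, 6]
```

Only positions 5 and 6 of grad_in are non-zero, which is why the backward kernel needs argmax rather than recomputing anything from the features.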
