
Deep Learning Paper Notes: Faster R-CNN

2017-02-18 21:15

Abstract

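For object detection networks that rely on region proposal algorithms to hypothesize object locations (e.g., SPPnet and Fast R-CNN), computing the region proposals is the running-time bottleneck.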

Faster R-CNN introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, which makes region proposals nearly cost-free.

The RPN simultaneously predicts object bounds and objectness scores at each position.

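RPN and Fast R-CNN are merged into a single network by sharing their convolutional features. In "attention" terms, the RPN tells the unified network where to look.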

Only 300 proposals per image are used.

Introduction

Region proposals (RP) are the test-time bottleneck in many state-of-the-art detection systems.

Region proposal methods:

Selective Search: one of the most popular methods

EdgeBoxes: trade off between proposal quality and speed.

Even so, the region proposal step still takes as much running time as the detection network.

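The convolutional feature maps used by Fast R-CNN can also be used to generate region proposals. On top of these convolutional features, the RPN is built by adding a few extra convolutional layers that simultaneously regress region bounds and objectness scores at each location on a regular grid. The RPN is therefore a kind of fully convolutional network (FCN) and can be trained end-to-end to produce detection proposals.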

Anchor boxes: references at multiple scales and aspect ratios. The scheme can be viewed as a pyramid of regression references, which avoids enumerating images or filters of multiple scales or aspect ratios.

Related Work

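R-CNN mainly acts as a classifier. It does not predict object bounds (except for refining them by bounding-box regression), and its accuracy depends on the performance of the region proposal module.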

Faster R-CNN

Faster R-CNN consists of two modules:

Module 1: a deep fully convolutional network that proposes regions.

Module 2: the Fast R-CNN detector, which uses the proposed regions.

Attention mechanism: the RPN module tells the Fast R-CNN module where to look.

Region Proposal Networks

Input: an image of any size

Output: a set of rectangular object proposals, each with an objectness score

A fully convolutional network

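How region proposals are generated:

A small network slides over the convolutional feature map output by the last shared convolutional layer. Each sliding window is mapped to a lower-dimensional feature (256-d for ZF, 512-d for VGG, followed by a ReLU).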

Anchors

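At each sliding-window location, multiple region proposals are predicted simultaneously. Let k be the maximum number of possible proposals per location: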

reg layer: 4k outputs encoding the coordinates of the k boxes

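cls layer: 2k outputs, the object / not-object scores for each proposal (the paper implements cls as a two-class softmax; a logistic-regression variant would produce k scores)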
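
The output sizes of the two sibling layers can be checked with a small sketch. It treats each layer as a per-position linear map (what a 1×1 convolution computes at one location) and follows the paper's two-class softmax choice, so cls has 2k outputs. The helper names `linear` and `rpn_head`, the random untrained weights, and the omission of bias terms are assumptions made for illustration only.

```python
import random

def linear(weights, x):
    """Apply one linear layer: one output value per weight row (bias omitted)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def rpn_head(feature, w_reg, w_cls):
    """Map one sliding-window feature to (reg outputs, cls outputs)."""
    return linear(w_reg, feature), linear(w_cls, feature)

k, d = 9, 256                      # k anchors per position, 256-d feature (ZF)
rng = random.Random(0)
feature = [max(0.0, rng.gauss(0, 1)) for _ in range(d)]            # post-ReLU feature
w_reg = [[rng.gauss(0, 0.01) for _ in range(d)] for _ in range(4 * k)]
w_cls = [[rng.gauss(0, 0.01) for _ in range(d)] for _ in range(2 * k)]

reg_out, cls_out = rpn_head(feature, w_reg, w_cls)
print(len(reg_out), len(cls_out))  # 36 18
```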

The k proposals are parameterized relative to k reference boxes, which are called anchors.

An anchor is centered at the sliding window in question and is associated with a scale and an aspect ratio.
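
A minimal sketch of how the anchors at one sliding-window position might be enumerated. The 3 scales (box areas of 128², 256², 512² pixels) and 3 aspect ratios (1:1, 1:2, 2:1) are the paper's defaults, giving k = 9. The function name `make_anchors`, the (x1, y1, x2, y2) layout, and the convention that a ratio means height / width are assumptions made for illustration.

```python
import math

def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return len(scales) * len(ratios) anchors centered at (cx, cy).

    Each anchor is (x1, y1, x2, y2). A ratio is height / width, and every
    anchor built from scale s keeps an area of s * s pixels.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

anchors = make_anchors(300.0, 200.0)
print(len(anchors))  # 9
```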

Translation-Invariant Anchors

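If an object in an image is translated, its proposal should translate with it, and the same function should be able to predict the proposal at any location. MultiBox does not have this property.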

Translation invariance also reduces the model size.
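
As a rough check of the model-size claim, the output-layer parameter counts can be worked out from figures quoted in the paper: MultiBox has a (4 + 1) × 800-dimensional fully connected output layer on 1536-d features, while the RPN has a (4 + 2) × 9-dimensional convolutional output layer on 512-d features (VGG-16, k = 9). The variable names are illustrative.

```python
# Output-layer parameter counts (weights only), using the paper's figures.
multibox_params = 1536 * (4 + 1) * 800   # MultiBox fully connected output layer
rpn_params = 512 * (4 + 2) * 9           # RPN reg + cls output layers, VGG-16

print(multibox_params, rpn_params)       # 6144000 27648
```

The RPN output layer is about two orders of magnitude smaller.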

Multi-Scale Anchors as Regression References

Two popular ways for multi-scale predictions:

Approach 1: based on image/feature pyramids, e.g., in DPM and CNN-based methods. The image is resized to multiple scales, and feature maps (HOG or deep convolutional features) are computed for each scale. This approach is time-consuming.

Approach 2: use sliding windows of multiple scales (and/or aspect ratios) on the feature maps, i.e., a pyramid of filters. The second approach is usually adopted jointly with the first.

This paper's approach: a pyramid of anchors, which is more cost-efficient and relies only on images and feature maps of a single scale.

The design of multi-scale anchors is a key component for sharing features without extra cost for addressing scales.

Loss Function

Each anchor is assigned a binary class label (object or not).

Two kinds of anchors receive a positive label:

the anchor(s) with the highest IoU overlap with a ground-truth box

an anchor that has IoU overlap higher than 0.7 with any ground-truth box.

A non-positive anchor is assigned a negative label if its IoU is lower than 0.3 for all ground-truth boxes.

Anchors that are neither positive nor negative do not contribute to the training objective.
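
The labeling rules above can be sketched as follows. The thresholds 0.7 and 0.3 come from the text; the function names `iou` and `label_anchors`, the (x1, y1, x2, y2) box layout, and the use of -1 for ignored anchors are assumptions made for illustration. The sketch assumes at least one ground-truth box.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return one label per anchor: 1 = positive, 0 = negative, -1 = ignored."""
    overlaps = [[iou(a, g) for g in gt_boxes] for a in anchors]
    labels = [-1] * len(anchors)
    for i, row in enumerate(overlaps):
        if max(row) < neg_thresh:      # below 0.3 for all ground-truth boxes
            labels[i] = 0
        if max(row) > pos_thresh:      # above 0.7 with any ground-truth box
            labels[i] = 1
    # The anchor(s) with the highest IoU for each ground-truth box are positive.
    for j in range(len(gt_boxes)):
        best = max(overlaps[i][j] for i in range(len(anchors)))
        if best > 0:
            for i in range(len(anchors)):
                if overlaps[i][j] == best:
                    labels[i] = 1
    return labels

anchors = [(0, 0, 100, 100), (0, 0, 100, 50), (200, 200, 300, 300), (400, 400, 500, 500)]
gt_boxes = [(0, 0, 100, 100), (400, 400, 460, 460)]
labels = label_anchors(anchors, gt_boxes)
print(labels)  # [1, -1, 0, 1]
```

The last anchor has an IoU of only 0.36 with the second ground-truth box, but it is still positive because it is the best match for that box.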

Loss function:
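
For reference, the multi-task loss as defined in the Faster R-CNN paper, where i indexes the anchors in a mini-batch, p_i is the predicted probability that anchor i is an object, p_i^* is its ground-truth label (1 for positive, 0 for negative), and t_i and t_i^* are the predicted and ground-truth box parameterizations:

```latex
L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*)
                    + \lambda \, \frac{1}{N_{reg}} \sum_i p_i^* \, L_{reg}(t_i, t_i^*)
```

L_cls is the log loss over two classes, and L_reg is the smooth L1 loss. The factor p_i^* means the regression term is active only for positive anchors. In the paper, N_cls is the mini-batch size (256), N_reg is the number of anchor locations (about 2,400), and λ = 10 by default.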

Bounding box regression

This can be thought of as bounding-box regression from an anchor box to a nearby ground-truth box.
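
A short sketch of the box parameterization used in the paper: t_x = (x - x_a) / w_a, t_y = (y - y_a) / h_a, t_w = log(w / w_a), t_h = log(h / h_a), where (x, y) is the box center and the subscript a marks the anchor. The function names `encode` and `decode` and the (cx, cy, w, h) tuple layout are assumptions made for illustration.

```python
import math

def encode(box, anchor):
    """Regression targets (tx, ty, tw, th) of `box` relative to `anchor`.

    Both boxes are (center_x, center_y, width, height).
    """
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))

def decode(t, anchor):
    """Invert `encode`: recover a (cx, cy, w, h) box from targets and anchor."""
    tx, ty, tw, th = t
    xa, ya, wa, ha = anchor
    return (xa + tx * wa, ya + ty * ha, wa * math.exp(tw), ha * math.exp(th))

anchor = (100.0, 100.0, 128.0, 128.0)
gt = (116.0, 92.0, 160.0, 96.0)
targets = encode(gt, anchor)       # what the reg layer is trained to output
recovered = decode(targets, anchor)
print(targets[0], targets[1])      # 0.125 -0.0625
```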

Training RPNs

image-centric sampling strategy

mini-batch: arises from a single image that contains many positive and negative example anchors.

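256 anchors are randomly sampled from one image to compute the loss of a mini-batch, with positive and negative anchors at a ratio of up to 1:1. If an image has fewer than 128 positive samples, the mini-batch is padded with negatives.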
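
A minimal sketch of this sampling step, assuming anchors have already been labeled 1 (positive), 0 (negative), or -1 (ignored). The function name `sample_minibatch` and the label encoding are hypothetical. When fewer than 128 positives exist, the remainder of the batch is filled with negatives, as described in the paper.

```python
import random

def sample_minibatch(labels, batch_size=256, rng=random):
    """Pick positive and negative anchor indices for one mini-batch."""
    pos = [i for i, label in enumerate(labels) if label == 1]
    neg = [i for i, label in enumerate(labels) if label == 0]
    n_pos = min(len(pos), batch_size // 2)       # at most half the batch
    n_neg = min(len(neg), batch_size - n_pos)    # pad the rest with negatives
    return rng.sample(pos, n_pos), rng.sample(neg, n_neg)

labels = [1] * 40 + [0] * 1000 + [-1] * 200
pos_idx, neg_idx = sample_minibatch(labels, rng=random.Random(0))
print(len(pos_idx), len(neg_idx))  # 40 216
```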

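All new layers are initialized with weights drawn from a zero-mean Gaussian distribution with standard deviation 0.01 (μ=0, σ=0.01). All other layers (i.e., the shared convolutional layers) are initialized from a model pre-trained for ImageNet classification. For the ZF net, all layers are fine-tuned.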

Learning rate: 0.001 for the first 60k mini-batches, then 0.0001 for the next 20k

Momentum: 0.9

weight decay: 0.0005
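
The hyperparameters above can be read as a step learning-rate schedule plus SGD with momentum and L2 weight decay. The sketch below shows one common formulation of that update on a single scalar weight. The function names and the exact update rule are assumptions, since the notes do not spell out the optimizer's equations.

```python
def learning_rate(iteration):
    """Step schedule: 0.001 for the first 60k mini-batches, then 0.0001."""
    return 0.001 if iteration < 60_000 else 0.0001

def sgd_momentum_step(w, grad, velocity, lr, momentum=0.9, weight_decay=0.0005):
    """One SGD update with momentum and L2 weight decay on a scalar weight."""
    velocity = momentum * velocity - lr * (grad + weight_decay * w)
    return w + velocity, velocity

w, v = sgd_momentum_step(1.0, 0.5, 0.0, learning_rate(0))
print(learning_rate(0), learning_rate(60_000))  # 0.001 0.0001
```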

Sharing Features for RPN and Fast R-CNN

sharing convolutional layers between the two networks, rather than learning two separate networks

Three ways to train networks with shared features:

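Alternating training: an iterative scheme. First train the RPN, then use its proposals to train Fast R-CNN. The network tuned by Fast R-CNN is then used to initialize the RPN, and the process is iterated. All experiments in the paper use this method.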

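Approximate joint training: the RPN and Fast R-CNN are merged into one network during training. The derivatives with respect to the proposal boxes' coordinates are ignored, so the solution is approximate.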

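Non-approximate joint training: the gradients with respect to the box coordinates are also taken into account, which requires an RoI pooling layer that is differentiable with respect to the box coordinates.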

4-step Alternating Training:

Step 1: train the RPN, initialized with an ImageNet-pre-trained model and fine-tuned end-to-end for the region proposal task.

Step 2: train a separate detection network by Fast R-CNN using the proposals generated by the step-1 RPN. This detection network is also initialized with the ImageNet-pre-trained model. At this point the two networks do not share convolutional layers.

Step 3: use the detector network to initialize RPN training, but fix the shared convolutional layers and fine-tune only the layers unique to the RPN. Now the two networks share convolutional layers.

Step 4: keeping the shared convolutional layers fixed, we fine-tune the unique layers of Fast R-CNN.
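
The four steps can be summarized by which layer groups receive updates in each step. The sketch below is only a bookkeeping aid, not a training loop. The group names `conv`, `rpn_layers`, and `detector_layers` and both helper functions are hypothetical.

```python
# Layer groups that are updated in each step (names are illustrative).
TRAINABLE = {
    1: {"conv", "rpn_layers"},        # RPN fine-tuned end-to-end from ImageNet weights
    2: {"conv", "detector_layers"},   # separate Fast R-CNN, also from ImageNet weights
    3: {"rpn_layers"},                # conv layers come from step 2 and are fixed
    4: {"detector_layers"},           # shared conv layers stay fixed
}

def conv_is_frozen(step):
    """True when the convolutional layers are not updated in this step."""
    return "conv" not in TRAINABLE[step]

def conv_is_shared(step):
    """The two networks use the same conv layers from step 3 onward."""
    return step >= 3

for step in sorted(TRAINABLE):
    print(step, sorted(TRAINABLE[step]), conv_is_frozen(step), conv_is_shared(step))
```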

Implementation Details

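Multi-scale feature extraction (using an image pyramid) may improve accuracy but does not offer a good speed-accuracy trade-off, so the networks are trained and tested on single-scale images.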

To reduce redundancy, we adopt non-maximum suppression (NMS) on the proposal regions based on their cls scores.
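
A minimal sketch of greedy NMS over proposals ranked by their cls scores. The IoU threshold of 0.7 is the value the paper uses for this step; the function names, the (x1, y1, x2, y2) box layout, and the optional `top_n` cut-off parameter are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.7, top_n=None):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
            if top_n is not None and len(keep) == top_n:
                break
    return keep

boxes = [(0, 0, 100, 100), (5, 5, 105, 105), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.82, so it is suppressed. After NMS, the top-N ranked proposals are passed on to detection.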


