We don't need no bounding-boxes: Training object class detectors using only human verification

2016-05-24 11:23

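(My impression is that the detection pipeline in this paper resembles R-CNN, except that human verification of the detections replaces the ground-truth annotation of the initial training set. At the start, EdgeBoxes generates proposals for each image, a CNN extracts features, an SVM detector is trained to produce detection scores, and the highest-scoring proposal in each image is taken as its detection. The paper adds an interactive step after detection: annotators verify the detections, and their answers drive the next round. Positively verified bounding boxes become positive samples, while negatively verified bounding boxes are used to shrink the search space. These steps are iterated, and the process ends when two consecutive iterations give the same detections. The main novelty is the human verification step: the training images need only image-level labels, not ground-truth bounding-box annotation. The framework therefore alternates among three stages: 1. re-training the object detector; 2. re-localizing the objects in the training images; 3. human verification.)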

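Abstract: Training object class detectors typically requires a large set of training images in which the objects are annotated with bounding boxes. Drawing bounding boxes by hand, however, is very time consuming. We propose a new scheme for training object detectors that only requires annotators to verify bounding boxes produced automatically by the learning algorithm. Our scheme iterates among three steps: re-training the detector, re-localizing objects in the training images, and human verification. We use the verification signal both to improve re-training and to reduce the search space for re-localisation.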

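1. Introduction

(1) Training an object detector usually requires a large number of images, with the objects in each training image annotated by hand-drawn bounding boxes.

(2) Bounding-box annotation is tedious, time-consuming and expensive.

(3) There are two main strategies for reducing the cost of bounding-box annotation. The first is learning in the weakly supervised setting, where the only supervision is labels indicating which object classes appear in each image. This is much cheaper, but it produces lower-quality detectors. The second is active learning, where annotators draw bounding boxes on a selected subset of the images. This can produce high-quality detectors, but still requires drawing many bounding boxes.

(4) The paper proposes a new scheme for training object detectors that only requires annotators to verify bounding boxes produced automatically by the learning algorithm: the annotator decides only whether a bounding box is correct or not. Crucially, answering such a verification question takes much less time than drawing the bounding box.

(5) Given a set of training images with image-level labels, the scheme alternates among three steps: updating the object detector, re-localizing objects in the training images, and querying humans for verification. In each iteration the verification signal is used in two ways. First, positively verified bounding boxes are used to update the object detector. Second, bounding boxes judged incorrect are not useless: they still provide valuable information about where the object is not. Based on this observation, negatively verified bounding boxes are used to reduce the search space of possible object locations in subsequent iterations.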

2. Related work

(1) Weakly-supervised object localization (WSOL): WSOL tries to learn object class detectors under weak supervision. The training images are known to contain instances of a given object class, but the locations of those instances are not given. The task is to localize the objects in the training images and to learn an object detector for localizing instances in new images. WSOL is usually formulated as multiple instance learning (MIL). Images are treated as bags of windows (instances). A negative image contains only negative instances. A positive image contains at least one positive instance, mixed in with a majority of negative ones. The goal is to find the true positive instances and learn a classifier for the object class from them. Methods therefore typically alternate between two phases: (A) re-training the object detector given the current selection of positive instances; (B) re-localising instances in the positive images using the current object detector.

(2) Humans in the loop: human-machine collaboration has been used successfully for tasks that are hard to solve with computer vision alone, such as fine-grained visual recognition, semi-supervised clustering and attribute-based image classification.

(3) Active learning: active learning trains a model iteratively while a domain expert annotates the subset of the data that the learner selects as most informative.

(4) Other ways to reduce annotation effort

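3. Method

(1) In the paper, the training set has only image-level labels and no location annotations for the object instances. The goal is to obtain object instances annotated with bounding boxes, and to train a good object detector, while minimizing the annotation cost. The paper proposes a framework in which annotators only need to verify bounding boxes produced automatically by the learning algorithm.

(2) The proposed framework alternates among the following steps:

(A) re-training the object detector;

(B) re-localizing the objects in the training images;

(C) querying annotators for verification.

The loop is illustrated in Figure 1.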

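(3) Formal description: let $\mathcal{I}^n$ be the set of images that do not yet have a positively verified bounding box at iteration $n$, and let $\mathcal{S}^n$ be the corresponding set of possible object locations. Initially, $\mathcal{I}^0$ is the complete training set and $\mathcal{S}^0$ is the complete set of object proposals extracted from the training images (the paper uses EdgeBoxes [14] to generate the object proposals).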
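The alternation described in this section can be sketched as a short loop. This is only an illustrative skeleton under stated assumptions: the four callbacks (`train`, `relocalize`, `verify`, `reduce_space`), the dictionary layout and the `max_iters` stopping rule are hypothetical names and choices introduced here, not the paper's implementation.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # assumed convention: (x1, y1, x2, y2)


def verification_loop(
    search_space: Dict[str, List[Box]],
    train: Callable,         # hypothetical: train(positives) -> detector
    relocalize: Callable,    # hypothetical: relocalize(detector, proposals) -> Box
    verify: Callable,        # hypothetical: verify(image_id, box) -> bool
    reduce_space: Callable,  # hypothetical: reduce_space(proposals, box) -> proposals
    max_iters: int = 10,     # assumed stopping rule, not taken from the source
):
    """Alternate re-training, re-localization and verification."""
    positives: Dict[str, Box] = {}   # images with a positively verified box
    remaining = dict(search_space)   # images still lacking a verified box
    detector = None
    for _ in range(max_iters):
        if not remaining:
            break
        # (A) re-train the detector on the verified boxes collected so far
        detector = train(positives)
        # (B) re-localize: one detection per image that still has proposals
        detections = {
            image_id: relocalize(detector, proposals)
            for image_id, proposals in remaining.items()
            if proposals
        }
        if not detections:
            break
        # (C) ask the annotator about each detection
        for image_id, box in detections.items():
            if verify(image_id, box):
                positives[image_id] = box
                del remaining[image_id]  # image is done, drop it from later rounds
            else:
                remaining[image_id] = reduce_space(remaining[image_id], box)
    return positives, detector
```

Passing the steps in as callbacks keeps the loop independent of any particular detector, proposal generator or annotation interface.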

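3.1 Verification by annotators

(1) In the verification phase, annotators are asked to verify the automatically generated detections $\mathcal{D}^n$. Two verification strategies are considered: a. simple yes/no verification; b. elaborate verification, which categorizes the type of detection error.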

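(2) Yes/No verification: in this task the annotator is shown a detection $l_i$ and a class label, and answers Yes if the detection correctly localizes an object of that class, and No otherwise. This splits the set of detections $\mathcal{D}^n$ into $\mathcal{D}^n_+$ and $\mathcal{D}^n_-$. Correct localization is defined in terms of IoU. Let $l_i$ be the detected object bounding box and $l_{gt}$ the actual object bounding box, with

$$\mathrm{IoU}(l_i, l_{gt}) = \frac{|l_i \cap l_{gt}|}{|l_i \cup l_{gt}|}.$$

If $\mathrm{IoU}(l_i, l_{gt}) \geq 0.5$, the detection is correct and should be verified as Yes. (One question: the annotator does not know the ground truth, so how is the IoU obtained? Is the verification done purely by visual judgment?)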
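The IoU criterion above can be written out as a small sketch. It assumes boxes are given as `(x1, y1, x2, y2)` tuples and that a ground-truth box is available, which is only the case when an annotator is being simulated; the function names are illustrative.

```python
def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def simulated_yes_no(detection, ground_truth, threshold=0.5):
    """Answer Yes (True) when the detection overlaps the true box enough."""
    return iou(detection, ground_truth) >= threshold
```

For example, two 2x2 boxes offset by one unit in each direction share an area of 1 out of a union of 7, so the answer would be No.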

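(3) Yes/Part/Container/Mixed/Missed verification: in this task the annotator answers Yes, Part, Container, Mixed or Missed. Yes means a correct detection, as before. For an incorrect detection, the annotator also judges the type of error: a. Part: the detection contains only part of the target object and no background; b. Container: the detection contains the whole object plus some background; c. Mixed: the detection contains part of the target object and some background; d. Missed: the detection contains none of the target object. This verification step splits the set of detections $\mathcal{D}^n$ into $\mathcal{D}^n_+$ and $\mathcal{D}^n_{ypcmm}$.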

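3.2 Re-training the object detector

In this step the object detector is re-trained. After the verification step, $\mathcal{D}^n_+$ contains correctly localized object instances, whereas $\mathcal{D}^n_-$ or $\mathcal{D}^n_{ypcmm}$ does not. Training therefore uses as positive samples only the bounding boxes that were positively verified in the iterations so far, $\bigcup_{k \leq n} \mathcal{D}^k_+$. To obtain background training samples, proposals are sampled whose IoU with the positively verified bounding boxes lies in [0, 0.5).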
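The background sampling rule can be illustrated as follows. This is a sketch under assumptions: boxes are `(x1, y1, x2, y2)` tuples, and the function name, the `num_samples` argument and the fixed random seed are choices made here for the example.

```python
import random


def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def sample_background(proposals, verified_boxes, num_samples, seed=0):
    """Sample proposals whose IoU with every verified box is in [0, 0.5)."""
    candidates = [
        p for p in proposals
        if all(iou(p, v) < 0.5 for v in verified_boxes)
    ]
    rng = random.Random(seed)  # seeded only to make the example repeatable
    return rng.sample(candidates, min(num_samples, len(candidates)))
```

Proposals that overlap a verified box by 0.5 or more are left out of the background set, so near-duplicates of a correct detection are never used as negatives.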

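3.3 Re-localizing objects by search space reduction

(1) In this step the objects in the training images are re-localized. For each image, the current object detector is applied to the image's object proposals to score them, and the proposal with the highest score is selected as the detection for that image. Importantly, not all proposals are scored: the verification signal is used to reduce the search space $\mathcal{S}^n$ by removing proposals.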
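Selecting the highest-scoring proposal per image amounts to an argmax over the remaining search space. The sketch below assumes a hypothetical `score(image_id, box)` callable standing in for the detector, and a dictionary mapping image ids to their remaining proposals.

```python
def relocalize(search_space, score):
    """Pick the highest-scoring remaining proposal for each image.

    search_space: dict mapping image id -> list of candidate boxes
    score: hypothetical callable (image_id, box) -> float
    """
    detections = {}
    for image_id, proposals in search_space.items():
        if not proposals:
            continue  # nothing left to propose for this image
        detections[image_id] = max(
            proposals, key=lambda box: score(image_id, box)
        )
    return detections
```

Because only proposals still in the search space are scored, every proposal removed after a negative verification also saves detector evaluations in later rounds.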

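(2) Positively verified detections $\mathcal{D}^n_+$ are correctly localized by definition, so their images need not be considered in later iterations, in either the re-localization step or the verification step. For negatively verified detections $\mathcal{D}^n_-$, the search space is reduced according to the verification strategy.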

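(3) Yes/No verification: under this strategy, if the annotator judges a detection incorrect (that is, it falls in $\mathcal{D}^n_-$), its proposal can simply be removed from the search space. This gives the updated search space $\mathcal{S}^{n+1}$, in which one proposal has been removed from each image with an incorrect detection. The negative verification signal can be used more effectively, though. Since an incorrect detection has IoU < 0.5 with the true object bounding box, all proposals that have IoU ≥ 0.5 with the incorrect detection can be removed as well.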
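Both variants of the Yes/No reduction fit in a few lines. This sketch assumes `(x1, y1, x2, y2)` boxes; the `aggressive` flag and the function name are introduced here for illustration, and the 0.5 threshold follows the correctness criterion used in these notes.

```python
def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def reduce_yes_no(proposals, rejected, aggressive=True, threshold=0.5):
    """Shrink one image's search space after a No answer.

    aggressive=False removes only the rejected proposal itself;
    aggressive=True also removes every proposal that overlaps it strongly.
    """
    if not aggressive:
        return [p for p in proposals if p != rejected]
    return [p for p in proposals if iou(p, rejected) < threshold]
```

The aggressive variant always removes the rejected box too, since its IoU with itself is 1.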

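(4) Yes/Part/Container/Mixed/Missed verification: in this case the annotator categorizes each incorrect detection as Part, Container, Mixed or Missed, and the error type can be used to reduce the search space further. Different proposals are removed depending on the error type:

* Container: eliminate all proposals that are not inside this detection;

* Part: eliminate all proposals that do not contain this detection;

* Mixed: eliminate all proposals that are inside this detection, that contain this detection, or whose IoU with this detection is zero or very high;

* Missed: eliminate all proposals that have non-zero IoU with this detection.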
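The four elimination rules can be sketched as one filter. Several choices here are assumptions rather than facts from the source: boxes are `(x1, y1, x2, y2)` tuples, the rejected detection itself is always dropped, Mixed is read as removing proposals that are inside the detection, contain it, or have zero or very high IoU with it, and `high_iou=0.8` is a placeholder for the unspecified "very high" threshold.

```python
def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def contains(outer, inner):
    """True when `inner` lies entirely within `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def reduce_ypcmm(proposals, detection, error, high_iou=0.8):
    """Keep only proposals compatible with the reported error type.

    high_iou is an assumed value for the "very high" IoU cut-off.
    """
    kept = []
    for p in proposals:
        if p == detection:
            continue  # the detection was judged incorrect, never keep it
        overlap = iou(p, detection)
        if error == "container":
            keep = contains(detection, p)
        elif error == "part":
            keep = contains(p, detection)
        elif error == "mixed":
            keep = (not contains(detection, p)
                    and not contains(p, detection)
                    and 0.0 < overlap < high_iou)
        elif error == "missed":
            keep = overlap == 0.0
        else:
            raise ValueError(f"unknown error type: {error}")
        if keep:
            kept.append(p)
    return kept
```

With a detection at `(10, 10, 20, 20)`, a proposal nested inside it survives only a Container answer, an enclosing proposal only a Part answer, a partially overlapping one only Mixed, and a disjoint one only Missed.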