
YOLO9000: Better, Faster, Stronger

Recommended reading:

http://blog.csdn.net/qq_14845119/article/details/53589282

https://zhuanlan.zhihu.com/p/25052190

https://zhuanlan.zhihu.com/p/25167153?utm_source=wechat_session&utm_medium=social

Abstract

YOLO9000 is a real-time detection system that can detect more than 9000 object categories. With a novel multi-scale training method, the same model can run at varying sizes, offering a tradeoff between speed and accuracy. The authors also propose a method for joint training on object detection and classification.

Using this method, YOLO9000 is trained simultaneously on the COCO detection dataset and the ImageNet classification dataset. This joint training allows the model to predict detections for object classes that have no labelled detection data.

1. Introduction

Goal: we would like detection to scale to the level of object classification.

However, because labelling images for detection is expensive, we are unlikely to see detection datasets on the same scale as classification datasets in the near future. The method therefore uses a hierarchical view of object classification that allows distinct datasets to be combined.

Contribution: a joint training algorithm that trains object detectors on both detection and classification data. Labelled detection images are used to learn precise localization, while classification images improve robustness.

Core idea: the dataset combination method and the joint training algorithm are used together to train the model.

2. Better

The original YOLO lags behind Fast R-CNN in localization, making a significant number of localization errors. In addition, it has relatively low recall compared with region proposal based methods.

The main focus is therefore on improving recall and localization while maintaining classification accuracy.

Methods used to improve the original YOLO:

Batch Normalization:

With batch normalization, other forms of regularization are no longer needed; BN helps regularize the model.

Where it is added: batch normalization is added on all of the convolutional layers.

With BN in place, dropout can be removed from the model without overfitting.
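
A minimal sketch of that conv + BN pattern, assuming PyTorch; the channel sizes (3 → 32) are illustrative, not from the paper. BatchNorm2d follows the convolution, the convolution bias is dropped because BN's shift replaces it, and no dropout layer is used.

```python
import torch
import torch.nn as nn

def conv_bn_leaky(in_ch, out_ch, kernel_size=3):
    """Convolution -> BatchNorm -> LeakyReLU; no dropout, no conv bias since BN shifts."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2, bias=False),
        nn.BatchNorm2d(out_ch),           # added after every convolutional layer
        nn.LeakyReLU(0.1, inplace=True),
    )

block = conv_bn_leaky(3, 32)
print(block(torch.randn(1, 3, 416, 416)).shape)  # torch.Size([1, 32, 416, 416])
```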

High Resolution Classifier:

In YOLOv2, to make the network work better on higher-resolution input, the classification network is first fine-tuned at the full 448 × 448 resolution for 10 epochs on ImageNet; the resulting network is then fine-tuned on detection.

This high-resolution classification network increases mAP.
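
A minimal sketch of that two-stage schedule, assuming PyTorch; `classifier` and `imagenet_448_loader` are hypothetical placeholders for a pretrained classification network and an ImageNet loader that resizes images to 448 × 448.

```python
import torch

def finetune_high_res(classifier, imagenet_448_loader, epochs=10, lr=1e-3):
    opt = torch.optim.SGD(classifier.parameters(), lr=lr, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):                          # 10 epochs at full resolution
        for images, labels in imagenet_448_loader:   # images already 448x448
            opt.zero_grad()
            loss = loss_fn(classifier(images), labels)
            loss.backward()
            opt.step()
    return classifier   # its backbone is then fine-tuned on the detection task
```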

Convolutional With Anchor Boxes:

In YOLO, bounding box coordinates are predicted directly, using fully connected layers on top of the convolutional feature extractor.

In Faster R-CNN, bounding boxes are instead predicted using hand-picked priors. Using only convolutional layers, the region proposal network (RPN) predicts offsets and confidences for anchor boxes. Because the prediction layer is convolutional, the RPN predicts these offsets at every location in a feature map. Predicting offsets instead of coordinates simplifies the problem and makes it easier for the network to learn.

In YOLOv2, the fully connected layers are removed from YOLO and anchor boxes are used to predict bounding boxes. Concretely:

First we eliminate one pooling layer to make the output of the network’s convolutional layers higher resolution.

In addition, the network is shrunk to operate on 416 × 416 input images instead of 448 × 448, so that the feature map has an odd number of locations and therefore a single center cell (with a downsampling factor of 32, the feature map is 13 × 13).

Following YOLO, the objectness prediction still predicts the IOU of the ground truth and the proposed box and the class predictions predict the conditional probability of that class given that there is an object.

With anchor boxes, mAP drops slightly, but recall increases by more than mAP decreases (see the sketch below).
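
A minimal sketch of such a fully convolutional prediction head, assuming PyTorch; the 1024 backbone channels, 5 anchors, and 20 classes are illustrative assumptions, not taken from this write-up.

```python
import torch
import torch.nn as nn

num_anchors, num_classes = 5, 20
head = nn.Conv2d(1024, num_anchors * (5 + num_classes), kernel_size=1)

features = torch.randn(1, 1024, 13, 13)   # backbone output for a 416x416 input (416 / 32 = 13)
pred = head(features)                     # (1, num_anchors * 25, 13, 13)
pred = pred.view(1, num_anchors, 5 + num_classes, 13, 13)
# Per anchor and grid cell: 4 box offsets, 1 objectness (IOU) score, 20 class scores.
print(pred.shape)                         # torch.Size([1, 5, 25, 13, 13])
```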

Dimension Clusters:

Two issues arise when using anchor boxes with YOLO. The first is that the box dimensions are hand picked: for the network to learn to adjust boxes well and produce good detections, it is important to pick better priors.

In YOLOv2, instead of choosing priors by hand, k-means clustering is run on the training set bounding boxes to automatically find good priors; the cluster centroids replace hand-picked anchors.

If standard k-means with Euclidean distance is used, larger boxes generate more error than smaller boxes.

What we actually want are priors that lead to good IOU scores, independent of the size of the box.

Purpose of k-means: k-means clustering is run on the dimensions of bounding boxes to get good priors for the model. Using k-means to generate the bounding box priors starts the model off with a better representation and makes the task easier to learn; clustering gives much better results than hand-picked priors (see the sketch below).
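
A minimal sketch in NumPy; the distance d(box, centroid) = 1 − IOU(box, centroid) is assumed from the stated goal above (an IOU-based, size-independent measure). Boxes are (width, height) pairs compared as if they shared the same center.

```python
import numpy as np

def iou_wh(boxes, centroids):
    """IoU between (N, 2) boxes and (K, 2) centroids given as (width, height)."""
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0]) *
             np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
            (centroids[:, 0] * centroids[:, 1])[None, :] - inter
    return inter / union

def kmeans_priors(boxes, k=5, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        assign = iou_wh(boxes, centroids).argmax(axis=1)   # min (1 - IoU) == max IoU
        centroids = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                              else centroids[i] for i in range(k)])
    return centroids

# boxes: (N, 2) array of ground-truth (width, height), e.g. normalized to the grid
# priors = kmeans_priors(boxes, k=5)
```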

3. Faster

4. Stronger

5. Conclusion

YOLOv2 can run at a variety of image sizes to provide a smooth tradeoff between speed and accuracy.
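
A minimal sketch of the multi-scale training behind this tradeoff, assuming PyTorch; `loader` is a hypothetical detection data loader. The {320, 352, ..., 608} size range and the 10-batch interval follow the paper's multi-scale training, and box targets are assumed to be in normalized coordinates so they need no rescaling.

```python
import random
import torch.nn.functional as F

sizes = list(range(320, 609, 32))            # 320, 352, ..., 608 (multiples of 32)

def multiscale_batches(loader, every=10):
    size = 416
    for step, (images, targets) in enumerate(loader):
        if step % every == 0:
            size = random.choice(sizes)      # draw a new input dimension every 10 batches
        images = F.interpolate(images, size=(size, size),
                               mode="bilinear", align_corners=False)
        yield images, targets
```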

YOLO9000 is a real-time framework for detecting more than 9000 object categories by jointly optimizing detection and classification.

WordTree is used to combine data from various sources, and the joint optimization technique trains simultaneously on ImageNet and COCO.

This is a powerful step toward closing the gap between detection and classification.