
【Deep Learning】Review:Rich feature hierarchies for accurate object detection and semantic segmentati

2016-01-25 09:14
Rich feature hierarchies for accurate object detection and semantic segmentation

Source: http://arxiv.org/abs/1311.2524
GitHub: https://github.com/rbgirshick/rcnn

1. Summary of the Paper
Quoted from the paper:

“Object detection system overview. Our system (1) takes an input image, (2) extracts around 2000 bottom-up region proposals, (3) computes features for each proposal using a large convolutional neural network (CNN), and then (4) classifies each region using class-specific linear SVMs. R-CNN achieves a mean average precision (mAP) of 53.7% on PASCAL VOC 2010. For comparison, [39] reports 35.1% mAP using the same region proposals, but with a spatial pyramid and bag-of-visual-words approach. The popular deformable part models perform at 33.4%. On the 200-class ILSVRC2013 detection dataset, R-CNN’s mAP is 31.4%, a large improvement over OverFeat [34], which had the previous best result at 24.3%.”


To sum up, the framework of the method proposed in the paper is actually very simple but effective:

- Replace the sliding-window sampling method with selective search, extracting about 2,000 region proposal candidates.

- Train k linear SVM classifiers (one per class) to score each region, based on the features output by AlexNet.

- Obtain the detection results by discarding overlapping regions with non-maximum suppression (NMS).
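
The last step above can be sketched in a few lines of Python. This is only a minimal illustration of greedy NMS, not the authors' implementation: the function names `iou` and `nms`, the `(x1, y1, x2, y2)` box format, and the default threshold of 0.3 are all assumptions made for the example.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.3):
    """Greedy NMS: return indices of the boxes that survive.

    iou_threshold is an assumed value chosen for illustration.
    """
    # Visit boxes from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any
        # higher-scoring box that has already been kept.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

In this toy input the second box overlaps the first heavily and has a lower score, so it is suppressed, while the third box does not overlap anything and is kept. In a detector like the one reviewed here, this would be run separately for each class.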

2. Main Contributions

1) One can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects.

2) When labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost. Combining region proposals with CNNs gives outstanding performance.

3. Positive and negative points

Positive Points:

(i) Replace traditional sliding-window methods with a region proposal method.

(ii) Use AlexNet to extract features. Each region is resized (warped) to 227*227, which keeps some of the surrounding background as prior information.

(iii) Use bounding-box regression to further improve accuracy.

(iv) Replace softmax with SVMs: in softmax the background class is shared across all categories, so per-class SVMs provide more independent information.
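
To make point (iii) concrete, here is a small sketch of the box parametrization used for bounding-box regression in the paper: offsets of the center are scaled by the proposal's width and height, and size changes are expressed as log ratios. The helper names (`to_center`, `regression_targets`, `apply_deltas`) and the corner-based `(x1, y1, x2, y2)` input format are assumptions for this example. Only the target computation is shown; the learned regressor itself is left out.

```python
import math


def to_center(box):
    """Convert (x1, y1, x2, y2) to (center_x, center_y, width, height)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return x1 + 0.5 * w, y1 + 0.5 * h, w, h


def regression_targets(proposal, ground_truth):
    """Targets (t_x, t_y, t_w, t_h) that map a proposal onto its ground truth."""
    px, py, pw, ph = to_center(proposal)
    gx, gy, gw, gh = to_center(ground_truth)
    return (
        (gx - px) / pw,      # scale-invariant shift of the center in x
        (gy - py) / ph,      # scale-invariant shift of the center in y
        math.log(gw / pw),   # log-space change of width
        math.log(gh / ph),   # log-space change of height
    )


def apply_deltas(proposal, deltas):
    """Invert the parametrization: turn predicted deltas back into a box."""
    px, py, pw, ph = to_center(proposal)
    dx, dy, dw, dh = deltas
    cx, cy = pw * dx + px, ph * dy + py
    w, h = pw * math.exp(dw), ph * math.exp(dh)
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


proposal = (0.0, 0.0, 10.0, 10.0)
ground_truth = (2.0, 2.0, 12.0, 14.0)
targets = regression_targets(proposal, ground_truth)
recovered = apply_deltas(proposal, targets)
```

Applying the exact targets to the proposal recovers the ground-truth box, which is a quick sanity check that the two functions are inverses of each other.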

Negative Points:

(i) .

4. How strong is the evaluation

- Performance layer by layer without fine-tuning. That is to say, when the features from pool5, fc6, or fc7 are used to train the SVMs, the results are very similar. The conclusion is that most of the CNN's representational power comes from its convolutional layers.

- Comparison with recent feature learning methods indicates that the CNN features generally perform better than the other methods.

- Compared with the paper by Dean et al. at Google (CVPR best paper): 16% mAP in 5 minutes. Here, 48% in about 1 minute!
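
Since all of these comparisons are stated in mAP, a short sketch of how average precision can be computed for one class may help. It assumes the area-under-the-precision-envelope style of AP and that each detection has already been marked as a true or false positive (for example by an IoU test against the ground truth). The function names and the toy numbers are made up for illustration and are not results from the paper.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """AP for one class as the area under the precision-recall envelope."""
    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Make precision non-increasing as recall grows (the "envelope").
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum the rectangle areas between consecutive recall values.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap):
    """mAP is the plain mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)


# Toy example: three detections, two ground-truth objects.
ap = average_precision([0.9, 0.8, 0.7], [True, False, True], 2)
```

The mAP figures quoted in the review are then the average of such per-class values over all classes of the dataset.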

5. Possible direction for future work

I’m wondering whether it is possible to develop methods that improve the speed and accuracy of object detection without using region proposals. Although region proposals greatly improve efficiency, they inevitably neglect some important information. I hope this could be a possible direction.