READING NOTE: Face Detection with End-to-End Integration of a ConvNet and a 3D Model
2016-09-29 21:57
TITLE: Face Detection with End-to-End Integration of a ConvNet and a 3D Model
AUTHOR: Yunzhu Li, Benyuan Sun, Tianfu Wu, Yizhou Wang
ASSOCIATION: Peking University, North Carolina State University
FROM: arXiv:1606.00850
CONTRIBUTIONS
It presents a simple yet effective method for integrating a ConvNet and a 3D model in end-to-end learning with a multi-task loss for face detection in the wild. It addresses two limitations of adapting the state-of-the-art Faster R-CNN to face detection: it eliminates the heuristic design of anchor boxes by leveraging a 3D face model, and it replaces the generic, predefined RoI pooling with a configuration pooling that exploits the underlying structural configuration of the object.
It obtains very competitive state-of-the-art performance on the FDDB and AFW benchmarks.
METHOD
The main inference scheme is shown in the following figure. The input image is fed into a ConvNet (e.g., VGG) with an upsampling layer. The network then generates face proposals scored by the sum of the log-probabilities of the keypoints, whose locations are predicted from the predefined 3D face model.
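The proposal scoring described above can be sketched as follows. This is a minimal illustration, not the paper's implementation; `keypoint_probs` is a hypothetical list of the per-keypoint probabilities gathered at the locations projected by the 3D model.

```python
import numpy as np

def proposal_score(keypoint_probs):
    """Score a face proposal as the sum of the log-probabilities of its
    keypoints (the keypoint locations being predicted via the 3D face model).
    A small epsilon guards against log(0)."""
    probs = np.asarray(keypoint_probs, dtype=float)
    return float(np.sum(np.log(probs + 1e-12)))
```

A proposal whose projected keypoints all land on high-probability locations scores close to 0, while any low-probability keypoint drags the score strongly negative.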
Some details:
The loss of keypoint labels is defined as
$$L_{cls}(\omega) = -\frac{1}{2m}\sum_{i=1}^{2m}\log\left(p_{x_i}^{l_i}\right)$$
where $\omega$ stands for the learnable weights of the ConvNet, $m$ is the number of keypoints, and $p_{x_i}^{l_i}$ is the predicted probability that the point at location $x_i$, which can be obtained from the annotations, belongs to label $l_i$.
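As a sketch (assuming probabilities and ground-truth labels have already been gathered at the $2m$ sampled locations), the classification loss can be computed as:

```python
import numpy as np

def keypoint_cls_loss(probs, labels):
    """Keypoint classification loss: mean negative log-probability of the
    ground-truth label over the 2m sampled locations.

    probs  : (2m, K) array of per-location label probabilities
    labels : (2m,) array of ground-truth label indices l_i
    """
    n = probs.shape[0]  # 2m
    eps = 1e-12  # guard against log(0)
    # Fancy indexing picks p_{x_i}^{l_i} for each location i.
    return float(-np.mean(np.log(probs[np.arange(n), labels] + eps)))
```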
The loss of keypoint locations is defined as
$$L_{pt\_loc}(\omega) = \frac{1}{m^2}\sum_{i=1}^{m}\sum_{j=1}^{m}\sum_{t\in\{x,y\}}\mathrm{smooth}\left(t_i-\hat{t}_{i,j}\right)$$
where smooth(⋅) is the smooth L1 loss. For each ground-truth keypoint, a set of predicted keypoints can be generated from the 3D face model and the 3D transformation parameters. If each face has $m$ keypoints, then $m$ sets of predicted keypoints are generated, so each keypoint receives $m$ predicted locations $\hat{t}_{i,j}$.
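The location loss above can be sketched as follows, assuming the $m \times m$ predicted locations have already been produced (each row $i$ is a keypoint, each column $j$ the prediction derived from the $j$-th keypoint's transformation):

```python
import numpy as np

def smooth_l1(d):
    """Smooth L1 loss: 0.5 * d^2 for |d| < 1, |d| - 0.5 otherwise."""
    d = np.abs(d)
    return np.where(d < 1.0, 0.5 * d * d, d - 0.5)

def keypoint_loc_loss(gt, pred):
    """Keypoint location loss, averaged over all m^2 (i, j) pairs and
    summed over the x and y coordinates.

    gt   : (m, 2) ground-truth keypoint locations t_i
    pred : (m, m, 2) predicted locations, pred[i, j] = \\hat{t}_{i,j}
    """
    m = gt.shape[0]
    diff = gt[:, None, :] - pred  # broadcast to (m, m, 2)
    return float(smooth_l1(diff).sum() / (m ** 2))
```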
The configuration pooling layer is similar to the RoI pooling layer in Faster R-CNN, except that features are extracted based on the locations and relations of the keypoints rather than over a predefined receptive field.
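A hypothetical sketch of the idea (not the paper's exact layer): gather the feature vector at each predicted keypoint location and concatenate them, so the pooled feature follows the face's structural configuration instead of a fixed RoI grid.

```python
import numpy as np

def configuration_pooling(feature_map, keypoints):
    """Pool features at keypoint locations rather than over a fixed grid.

    feature_map : (H, W, C) conv feature map
    keypoints   : (m, 2) array of (x, y) locations in feature-map coordinates
    returns     : (m * C,) concatenation of the per-keypoint feature vectors
    """
    H, W, C = feature_map.shape
    feats = []
    for x, y in keypoints:
        # Round to the nearest cell and clamp to the map boundary.
        xi = int(np.clip(np.rint(x), 0, W - 1))
        yi = int(np.clip(np.rint(y), 0, H - 1))
        feats.append(feature_map[yi, xi])
    return np.concatenate(feats)
```

A real implementation would use differentiable (e.g., bilinear) sampling so gradients flow to the keypoint predictions; the nearest-cell lookup here only illustrates the pooling pattern.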