READING NOTE: Learning Spatial Regularization with Image-level Supervisions for Multi-label ...
2017-07-06 18:38
TITLE: Learning Spatial Regularization with Image-level Supervisions for Multi-label Image Classification
AUTHOR: Feng Zhu, Hongsheng Li, Wanli Ouyang, Nenghai Yu, Xiaogang Wang
ASSOCIATION: University of Science and Technology of China, University of Sydney, The Chinese University of Hong Kong
FROM: arXiv:1702.05891
CONTRIBUTIONS
An end-to-end deep neural network for multi-label image classification is proposed. It exploits both semantic and spatial relations between labels by training learnable convolutions on the labels' attention maps, and these relations are learned with image-level supervision only. Investigation and visualization of the learned models show that the model effectively captures semantic and spatial relations between labels. The proposed algorithm has strong generalization capability and works well on data with different types of labels.
METHOD
The proposed Spatial Regularization Net (SRN) takes visual features from the main net as input and learns to regularize spatial relations between labels. These relations are exploited through the learned attention maps of the multiple labels. Label confidences from the main net and from SRN are aggregated to produce the final classification confidences. The whole network is a unified framework trained end-to-end. The scheme of SRN is illustrated in the following figure.
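The aggregation of the two branches can be sketched in plain Python. This is a minimal sketch under stated assumptions: both branches are assumed to output one logit per label, the fusion rule is assumed to be a weighted average with a hypothetical weight `alpha` (0.5 here), and each label gets an independent sigmoid because an image may carry several labels. The note only says the confidences are "aggregated", so the exact rule is an assumption. A per-label binary cross-entropy is included as one common reading of "cross-entropy loss" in the multi-label setting; whether the paper uses exactly this form is also an assumption.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def fuse_confidences(y_cls: List[float], y_sr: List[float], alpha: float = 0.5) -> List[float]:
    """Fuse per-label logits from the main net (y_cls) and SRN (y_sr).

    ASSUMPTION: weighted average with a hypothetical weight `alpha`,
    followed by an independent sigmoid per label (multi-label setting).
    """
    if len(y_cls) != len(y_sr):
        raise ValueError("both branches must output one score per label")
    return [sigmoid(alpha * a + (1.0 - alpha) * b) for a, b in zip(y_cls, y_sr)]


def multilabel_bce(probs: List[float], targets: List[int], eps: float = 1e-12) -> float:
    """Mean binary cross-entropy over labels; targets are 0/1 per label.

    ASSUMPTION: one possible form of the "cross-entropy loss" in the note.
    """
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)


# Usage: three labels; the image contains labels 0 and 2.
probs = fuse_confidences([2.0, -1.0, 0.5], [1.0, -2.0, 1.5])
print([round(p, 3) for p in probs])
print(round(multilabel_bce(probs, [1, 0, 1]), 3))
```

Using a sigmoid per label rather than a softmax across labels keeps the label decisions independent, which is what allows several labels to be predicted for one image.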
The network is trained in four steps:
1. Finetune only the main net on the target dataset. Both fcnn and fcls are learned with a cross-entropy classification loss.
2. Fix fcnn and fcls, and train fatt and conv1 with a cross-entropy classification loss.
3. Fix all other sub-networks and train fsr with a cross-entropy classification loss.
4. Jointly finetune the whole network with the joint loss.
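The four-step schedule can be written down as data, which makes it easy to check which sub-networks are updated at each step. This is a minimal sketch: the sub-network names come from the note, while the `SCHEDULE` table and the helper functions are hypothetical. In a framework such as PyTorch, each step would roughly correspond to toggling `requires_grad` on the parameters of the listed sub-modules.

```python
from typing import Dict, List, Set

# Sub-network names as used in the note.
SUBNETS = ("fcnn", "fcls", "fatt", "conv1", "fsr")

# Hypothetical table: sub-networks whose parameters are updated in each step.
SCHEDULE: Dict[int, Set[str]] = {
    1: {"fcnn", "fcls"},   # finetune the main net only
    2: {"fatt", "conv1"},  # fcnn and fcls are fixed
    3: {"fsr"},            # all other sub-networks are fixed
    4: set(SUBNETS),       # joint finetuning of the whole network
}


def trainable(step: int) -> List[str]:
    """Sub-networks updated in `step`, in a fixed order."""
    return [name for name in SUBNETS if name in SCHEDULE[step]]


def not_updated(step: int) -> List[str]:
    """Sub-networks left untouched in `step` (fixed, or not yet in use)."""
    return [name for name in SUBNETS if name not in SCHEDULE[step]]


for step in sorted(SCHEDULE):
    print(step, "train:", trainable(step), "keep fixed:", not_updated(step))
```

Training the attention branch and fsr only after the main net has converged means each new component starts from stable input features, before the final joint finetuning adjusts everything together.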
The main net follows the ResNet-101 architecture and is finetuned on the target dataset. The attention map and the confidence map each have C channels, where C is the number of categories. In step 2, the two maps are merged by element-wise multiplication and average-pooled into a C-dimensional feature vector. In step 3, the average pooling is replaced by fsr, which is implemented as three convolution layers with ReLU nonlinearities followed by one fully-connected layer, as shown in the following figure.
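The step-2 merge can be sketched in plain Python, following the note's description: for each of the C labels, multiply the attention map and the confidence map element-wise, then average over all spatial positions. This is a minimal sketch; the function name `attention_pool`, the nested-list map representation and the toy 2 x 2 maps are hypothetical, and both maps are assumed to share the same H x W size.

```python
from typing import List

Map = List[List[float]]  # one H x W spatial map


def attention_pool(attention: List[Map], confidence: List[Map]) -> List[float]:
    """Merge C attention maps with C confidence maps into a C-dim vector.

    For each label: element-wise product of the two H x W maps,
    then the average over all spatial positions.
    """
    if len(attention) != len(confidence):
        raise ValueError("attention and confidence maps need the same C")
    scores = []
    for a_map, s_map in zip(attention, confidence):
        total, count = 0.0, 0
        for a_row, s_row in zip(a_map, s_map):
            for a, s in zip(a_row, s_row):
                total += a * s
                count += 1
        scores.append(total / count)
    return scores


# Usage: C = 2 labels on a 2 x 2 grid.
attention = [
    [[0.7, 0.1], [0.1, 0.1]],      # label 0 attends mostly to the top-left cell
    [[0.25, 0.25], [0.25, 0.25]],  # label 1 attends uniformly
]
confidence = [
    [[4.0, 0.0], [0.0, 0.0]],
    [[1.0, 2.0], [3.0, 4.0]],
]
print(attention_pool(attention, confidence))
```

The attention map acts as a spatial weighting of the confidence map, so a label scores highly only where it is both attended to and confidently detected.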
conv4 is composed of single-channel filters, so each filter convolves only one input channel. In Caffe, this can be implemented with the convolution layer's “group” parameter. The motivation is that a label may be semantically related to only a small number of other labels, so measuring spatial relations against the attention maps of unrelated labels is unnecessary.
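A convolution with single-channel filters is a grouped convolution whose number of groups equals the number of input channels, which is the setting the “group” parameter expresses in Caffe. The plain-Python sketch below shows the resulting connectivity and the weight saving. All helper names are hypothetical, and the sizes used in the parameter count (512 input channels, 2048 filters, 14 x 14 kernels) are illustrative assumptions, not values stated in this note.

```python
from typing import List

Map = List[List[float]]  # one H x W single-channel map


def conv2d_valid(x: Map, k: Map) -> Map:
    """Single-channel 2-D cross-correlation (stride 1, no padding), as in CNN layers."""
    kh, kw = len(k), len(k[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [
        [sum(x[i + u][j + v] * k[u][v] for u in range(kh) for v in range(kw))
         for j in range(ow)]
        for i in range(oh)
    ]


def grouped_single_channel_conv(inputs: List[Map], filters: List[Map], groups: int) -> List[Map]:
    """Grouped convolution with groups == number of input channels.

    Output channel o reads only input channel o // (len(filters) // groups),
    i.e. consecutive filters share one input channel, as with Caffe's group.
    """
    if len(inputs) != groups or len(filters) % groups != 0:
        raise ValueError("needs groups == input channels and filters divisible by groups")
    per_group = len(filters) // groups
    return [conv2d_valid(inputs[o // per_group], f) for o, f in enumerate(filters)]


def conv_params(in_ch: int, out_ch: int, k: int, groups: int = 1) -> int:
    """Weight count (bias excluded) of a k x k convolution split into `groups` groups."""
    return out_ch * (in_ch // groups) * k * k


# Usage: 2 input channels, 4 filters -> filters 0,1 read channel 0; filters 2,3 read channel 1.
inputs = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],  # channel 0
    [[2.0, 2.0, 2.0], [2.0, 2.0, 2.0], [2.0, 2.0, 2.0]],  # channel 1
]
ones = [[1.0] * 3 for _ in range(3)]
diag = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
out = grouped_single_channel_conv(inputs, [ones, diag, ones, diag], groups=2)
print([o[0][0] for o in out])

# Illustrative (assumed) sizes: single-channel filters vs. a full convolution.
print(conv_params(512, 2048, 14, groups=512), conv_params(512, 2048, 14))
```

Because each output channel reads a single input map, changing one map leaves the outputs computed from other maps untouched, and the weight count shrinks by a factor equal to the number of groups.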