[Deep Learning Paper Notes][Image Reconstruction] Understanding Deep Image Representations by Inverting Them
2016-10-31 10:07
Mahendran, Aravindh, and Andrea Vedaldi. “Understanding deep image representations by inverting them.” 2015 IEEE conference on computer vision and pattern recognition
(CVPR). IEEE, 2015. (Citations: 142).
1 Motivation
Given an intermediate layer’s representation of an image, to what extent is it possible to reconstruct the image itself?
Note that the representations collapse irrelevant differences in images (e.g. illumination or viewpoint), so that the representation should not be uniquely invertible.
2 Idea
Starting from random noise, find an image whose representation best matches the given one:

X* = argmin_X ||Â − A||²_F + λ R(X), where Â = Φ(X)

Here A is the given representation, Â is the representation of the candidate image X computed by the network Φ, and R(X) is a regulariser capturing a natural-image prior.

Note that after optimization, the training-set mean image should be added back to X.
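The objective above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: `phi` is a hypothetical feature extractor standing in for the CNN, and the data term is already divided by ||A||²_F as the balancing section below suggests.

```python
import numpy as np

def inversion_loss(x, a, phi, lam=1e-2, reg=None):
    """Inversion objective (sketch).

    x   : candidate image (np.ndarray)
    a   : target representation A
    phi : hypothetical feature extractor, phi(x) -> representation
    reg : optional natural-image prior R(x) -> float
    """
    a_hat = phi(x)  # representation of the candidate image
    # Euclidean distance, normalised by ||A||_F^2 so the data term is ~O(1)
    data = np.sum((a_hat - a) ** 2) / np.sum(a ** 2)
    prior = lam * reg(x) if reg is not None else 0.0
    return data + prior
```

In practice this loss would be minimised by gradient descent with momentum, backpropagating through `phi`; the snippet only shows how the terms combine.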
3 Regularisers
Discriminatively-trained representations may discard a significant amount of low-level image statistics as these are usually not interesting for high-level tasks.
3.1 p-norm
By choosing a relatively large exponent (p = 6 is used in the experiments), the range of the image is encouraged to stay within a target interval instead of diverging.
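A minimal sketch of such a p-norm penalty, assuming the form R_p(x) = Σ |x_i|^p (the exact scaling constants in the paper are omitted):

```python
import numpy as np

def p_norm_reg(x, p=6):
    """R_p(x) = sum_i |x_i|^p; a large p strongly penalises
    pixels that stray outside a bounded range."""
    return np.sum(np.abs(x) ** p)
```

With p = 6, a single out-of-range pixel dominates the penalty, which is exactly why a large exponent keeps the image bounded.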
3.2 Total Variation (TV)
It encourages images to consist of piece-wise constant patches.
Choice of β
• When the CNN has max-pooling layers, β = 1 leads to spikes in the reconstruction, since TV only counts the overall amount of intensity change (one large jump costs the same as many small ones).
• β < 1 will exacerbate this issue.
• β > 1 removes the spikes, but the image becomes washed out, as sharp edges are penalized more heavily.
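A numpy sketch of the TV^β regulariser, assuming the standard finite-difference form R_TV(x) = Σ ((Δx_h)² + (Δx_v)²)^(β/2) over interior pixels:

```python
import numpy as np

def tv_beta(x, beta=2.0):
    """TV^beta regulariser for a 2-D image x (finite differences)."""
    dx = x[:, 1:] - x[:, :-1]   # horizontal differences
    dy = x[1:, :] - x[:-1, :]   # vertical differences
    # crop so both gradient maps align on the same interior grid
    dx = dx[:-1, :]
    dy = dy[:, :-1]
    return np.sum((dx ** 2 + dy ** 2) ** (beta / 2.0))
```

A constant image scores 0, and with β = 2 a sharp edge costs quadratically more than the same total intensity change spread over many pixels, matching the β > 1 behaviour described above.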
4 Balancing the Different Terms
Dividing the loss by ||A||²_F normalises it to lie roughly in the range [0, 1).
5 Result
See Fig. All the convolutional layers maintain a photographically faithful representation of the image, although with increasing fuzziness. The FC layers are inverted back to a composition of parts similar, but not identical, to the ones found in the original image. The FC layers lose information about the location of the objects but retain information about some of their discriminative features.