[Deep Learning Paper Notes][Visualizing] Visualizing and Understanding Convolutional Networks
2016-10-24 16:54
Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." European Conference on Computer Vision. Springer International Publishing, 2014. (Citations: 1207).
Occlusion Experiments
Idea Occlude portions of the input image, revealing which parts of the scene are important for classification.
Method Occlude different portions of the input image with a grey square, and monitor the classifier's output probability for the correct class, plotted as a function of the position of the grey square in the original image.
Result See Fig. 4.1. It can be seen that the model is localizing the objects within the scene, as the probability of the correct class drops significantly when the object is occluded. In the third image, if we occlude the person's head, the probability of the correct class goes up.
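The sliding-square procedure above can be sketched as follows; `model_prob` is a hypothetical classifier that returns a probability vector for an image (not part of the paper's code):

```python
import numpy as np

def occlusion_map(image, model_prob, true_class, patch=32, stride=16, grey=0.5):
    """Slide a grey square over the image and record the correct-class
    probability at each position. Low values in the returned heat map
    mark regions that are important for classification."""
    H, W, _ = image.shape
    rows = (H - patch) // stride + 1
    cols = (W - patch) // stride + 1
    heat = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            occluded = image.copy()
            y, x = i * stride, j * stride
            # paint a grey square at this position
            occluded[y:y + patch, x:x + patch, :] = grey
            heat[i, j] = model_prob(occluded)[true_class]
    return heat
```

Plotting `heat` over the image reproduces the sensitivity maps of Fig. 4.1.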
Deconv Approach
DeconvNet
For the relu layer, the backward pass in the deconvnet simply rectifies the reconstruction signal itself, R = max(R, 0), instead of gating it by the forward-pass activation pattern as standard backpropagation does.
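The difference between the two relu backward rules can be sketched in a few lines (a minimal illustration, not the paper's implementation):

```python
import numpy as np

def relu_backprop(grad_out, forward_input):
    # standard backprop: pass the gradient only where the forward input
    # was positive (gated by the forward-pass activation pattern)
    return grad_out * (forward_input > 0)

def relu_deconv(recon):
    # deconvnet: rectify the reconstruction signal itself,
    # independent of what happened in the forward pass
    return np.maximum(recon, 0)
```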
Method For each layer, randomly select a subset of feature maps. For each feature map, find the top 9 neurons that have the highest activations. Projecting each separately down to pixel space with the deconvnet reveals the different structures that excite a given feature map.
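Finding those top-9 activation sites over a set of images can be sketched as below; the `(N, H, W)` layout of one feature map's activations is an assumption for illustration:

```python
import numpy as np

def top9_activations(feature_map_batch):
    """feature_map_batch: (N, H, W) activations of one feature map over
    N images. Returns the (image, row, col) indices of the 9 strongest
    activations, i.e. the seeds for the deconvnet projections."""
    flat = feature_map_batch.ravel()
    top = np.argsort(flat)[-9:][::-1]  # indices of the 9 largest values
    return np.column_stack(np.unravel_index(top, feature_map_batch.shape))
```

Each returned triple is then projected down to pixel space separately via the deconvnet.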
Result Results can be seen in Fig. 4.2, 4.3, and 4.4. Alongside these visualizations we show the corresponding image patches.
• Strong grouping within each feature map.
• Hierarchical nature of the features in the network (layer 2: corners and other edge/color conjunctions; layer 3: textures, mesh patterns (r1, c1), and text (r2, c4); layer 4: more class-specific, like dog faces (r1, c1) and bird’s legs (r4, c2); layer 5:
entire objects, like keyboards (r1, c11) and dogs (r4)).
• Greater invariance at higher layers.
• Exaggeration of discriminative parts of the image, e.g. eyes and noses of dogs (layer 4, r1, c1).
Feature Evolution During Training The lower layers of the model can be seen to converge within a few epochs. However, the upper layers only develop after a considerable number of epochs (40-50), demonstrating the need to let the models train until fully converged.
Feature Invariance Small transformations have a dramatic effect in the first layer of the model, but a lesser impact at the top feature layer, where the response is quasi-linear for translation and scaling. However, the output is not invariant to rotation.
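This kind of invariance measurement can be sketched as a relative feature distance between an image and a transformed copy; `features` is a hypothetical per-layer feature extractor (not from the paper):

```python
import numpy as np

def translation_sensitivity(features, image, shift):
    """Relative change of a layer's feature vector under horizontal
    translation. Values near 0 mean the layer is invariant to the shift."""
    shifted = np.roll(image, shift, axis=1)  # horizontal translation
    f0, f1 = features(image), features(shifted)
    return np.linalg.norm(f0 - f1) / np.linalg.norm(f0)
```

Repeating this per layer reproduces the trend above: first-layer distances are large, top-layer distances stay small.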