
[Deep Learning Paper Notes][Visualizing] Visualizing and Understanding Convolutional Networks

2016-10-24 16:54
Zeiler, Matthew D., and Rob Fergus. “Visualizing and understanding convolutional networks.” European Conference on Computer Vision. Springer International Publishing, 2014. (Citations: 1207).

Occlusion Experiments
Idea Occlude portions of the input image to reveal which parts of the scene are important for classification.

Method Occlude different portions of the input image with a grey square, monitor the classifier’s probability output for the correct class, and plot it as a function of the position of the grey square in the original image.

Result See Fig. 4.1. The model is localizing the objects within the scene: the probability of the correct class drops significantly when the object is occluded. In the third image, if we occlude the person’s head, the probability of the correct class goes up.
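As a sketch of the procedure (not the authors' exact code), the sliding-occluder loop might look like this, where `classify` stands in for any trained network's forward pass:

```python
import numpy as np

def occlusion_map(image, classify, true_class, patch=32, stride=16, fill=0.5):
    """Slide a grey square over the image and record the classifier's
    probability for the true class at each occluder position.
    `classify` is any function mapping an HxWxC image to class probabilities."""
    h, w = image.shape[:2]
    rows = (h - patch) // stride + 1
    cols = (w - patch) // stride + 1
    heat = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            occluded = image.copy()
            y, x = i * stride, j * stride
            occluded[y:y + patch, x:x + patch] = fill  # grey square
            heat[i, j] = classify(occluded)[true_class]
    return heat  # low values mark regions the classifier depends on
```

Plotting `heat` as a heatmap over the image gives the localization maps of Fig. 4.1.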

Deconv Approach

DeconvNet
For the relu layer, the forward pass is

r = max(0, h).

The backward pass of ordinary backpropagation gates the incoming signal by the sign of the forward input:

dh = dr · 1[h > 0].

The deconvnet instead applies the relu to the backward signal itself, keeping only its positive part:

R = max(0, R).
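A minimal sketch of the two relu rules in NumPy, with `h` the forward input and `grad_out` the backward signal (the function names are mine, not the paper's):

```python
import numpy as np

def relu_backprop(grad_out, h):
    """Standard backprop: pass the signal only where the
    forward input h was positive."""
    return grad_out * (h > 0)

def relu_deconvnet(grad_out):
    """Deconvnet rule: apply relu to the backward signal itself,
    ignoring the forward activations."""
    return np.maximum(0, grad_out)
```

The difference is what gets zeroed: backprop zeroes positions where the forward activation was negative, while the deconvnet zeroes positions where the backward signal is negative.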
Method For each layer, randomly select a subset of feature maps. For each feature map, find the top 9 neurons that have the highest activations. Projecting each separately down to pixel space with the deconvnet reveals the different structures that excite a given feature map.
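Selecting the top-activating neurons can be sketched as follows, assuming the activations of a single feature map over a dataset are stored as an (N, H, W) array (a simplification of the paper's setup):

```python
import numpy as np

def top9_positions(feature_map_batch):
    """Given activations of one feature map across a dataset,
    shape (N, H, W), return the 9 (image, y, x) locations with
    the highest activation values, strongest first."""
    flat = feature_map_batch.ravel()
    idx = np.argpartition(flat, -9)[-9:]          # unordered top 9
    idx = idx[np.argsort(flat[idx])[::-1]]        # sort descending
    return [np.unravel_index(i, feature_map_batch.shape) for i in idx]
```

Each returned location is then projected down to pixel space by the deconvnet, and the matching crop of the input image is shown alongside.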



Result See Fig. 4.2, 4.3, 4.4. Alongside these visualizations we show the corresponding image patches.

• Strong grouping within each feature map.

• Hierarchical nature of the features in the network (layer 2: corners and other edge/color conjunctions; layer 3: textures, mesh patterns (r1, c1), and text (r2, c4); layer 4: more class-specific, like dog faces (r1, c1) and bird’s legs (r4, c2); layer 5: entire objects, like keyboards (r1, c11) and dogs (r4)).

• Greater invariance at higher layers.

• Exaggeration of discriminative parts of the image, e.g. eyes and noses of dogs (layer 4, r1, c1).

Feature Evolution During Training The lower layers of the model can be seen to converge within a few epochs. However, the upper layers only develop after a considerable number of epochs (40-50), demonstrating the need to let the models train until fully
converged. 

Feature Invariance Small transformations have a dramatic effect in the first layer of the model, but a lesser impact at the top feature layer, being quasi-linear for translation and scaling. However, the output is not invariant to rotation.
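One hypothetical way to quantify this is to compare a layer's feature vector for an image against transformed copies; `features` below stands in for any layer's forward pass, and horizontal shift stands in for the transformations (this is a sketch, not the paper's exact protocol):

```python
import numpy as np

def invariance_curve(image, features, shifts):
    """Distance between the feature vector of the original image and of
    horizontally shifted copies; `features` is any image -> vector map.
    A flat curve near zero indicates invariance at that layer."""
    base = features(image)
    return [np.linalg.norm(features(np.roll(image, s, axis=1)) - base)
            for s in shifts]
```

Repeating this for the first layer and the top feature layer reproduces the qualitative finding: the curve is steep for raw, low-level features but much flatter (quasi-linear for translation and scaling) near the top.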
