Paper Notes | Going deeper with convolutions
2016-06-27 16:12
Authors
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich
3 Motivation and High-Level Considerations
Bigger size (depth: the number of levels; width: the number of units at each level) has two main drawbacks:
1. A larger number of parameters, which makes the network more prone to overfitting (a major bottleneck).
2. A dramatically increased use of computational resources.
The fundamental way of solving both issues would be to ultimately move from fully connected to sparsely connected architectures, even inside the convolutions. Besides mimicking biological systems, this would also have the advantage of firmer theoretical underpinnings due to the groundbreaking work of Arora et al.:
Sanjeev Arora, Aditya Bhaskara, Rong Ge, and Tengyu Ma. Provable bounds for learning some deep representations. CoRR, abs/1310.6343, 2013.
Their main result states that if the probability distribution of the dataset is representable by a large, very sparse deep neural network, then the optimal network topology can be constructed layer by layer by analyzing the correlation statistics of the activations of the last layer and clustering neurons with highly correlated outputs. This statement resonates with the well-known Hebbian principle (neurons that fire together, wire together) and suggests that the underlying idea is applicable in practice even under less strict conditions.
http://wenku.baidu.com/link?url=I2V5PaiYh5pziD8kwE6AYMnqQOenj08SwJx0_A1udOh9Vlsv6yGfR8otU3-Nw-oF0EMNG3MoOueaP8hOBFRxZKhpG0lFMKiBhC3afmU7uPC
4 Architectural Details
The main idea of the Inception architecture is based on finding out how an optimal local sparse structure in a convolutional vision network can be approximated and covered by readily available dense components. Note that assuming translation invariance means that our network will be built from convolutional building blocks. We can assume that each unit from an earlier layer corresponds to some region of the input image, and these units are grouped into filter banks. We can also expect that there will be a smaller number of more spatially spread-out clusters that can be covered by convolutions over larger patches, and that there will be a decreasing number of patches over larger and larger regions. For convenience, the Inception architecture is restricted to filter sizes 1x1, 3x3, and 5x5. Additionally, since pooling operations have been essential for the success of current state-of-the-art convolutional networks, a pooling path is added in each stage.
One big problem with the above modules is that even a modest number of 5x5 convolutions can be prohibitively expensive on top of a convolutional layer with a large number of filters.
This leads to the second idea of the proposed architecture: judiciously applying 1x1 convolutions as dimension reductions before the expensive 3x3 and 5x5 convolutions. Besides being used as reductions, the 1x1 convolutions also include the use of rectified linear activation, which makes them dual-purpose.
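A minimal sketch of such a dimension-reduction module, assuming PyTorch (the class name and argument names are my own, not the paper's):

```python
import torch
import torch.nn as nn

class Inception(nn.Module):
    """Sketch of an Inception module with dimension reductions.
    c3r/c5r are the 1x1 reduction widths placed before the
    expensive 3x3 and 5x5 convolutions."""
    def __init__(self, in_ch, c1, c3r, c3, c5r, c5, pool_proj):
        super().__init__()
        # 1x1 convolution branch
        self.b1 = nn.Sequential(nn.Conv2d(in_ch, c1, 1), nn.ReLU(inplace=True))
        # 1x1 reduction followed by 3x3 convolution
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, c3r, 1), nn.ReLU(inplace=True),
                                nn.Conv2d(c3r, c3, 3, padding=1), nn.ReLU(inplace=True))
        # 1x1 reduction followed by 5x5 convolution
        self.b5 = nn.Sequential(nn.Conv2d(in_ch, c5r, 1), nn.ReLU(inplace=True),
                                nn.Conv2d(c5r, c5, 5, padding=2), nn.ReLU(inplace=True))
        # 3x3 max pooling followed by a 1x1 projection
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, pool_proj, 1), nn.ReLU(inplace=True))

    def forward(self, x):
        # All branches preserve spatial size, so their outputs can be
        # concatenated along the channel dimension.
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)
```

With the filter counts from the paper's Table 1, the inception (3a) stage would be Inception(192, 64, 96, 128, 16, 32, 32), giving a 64+128+32+32 = 256-channel output.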
For technical reasons (memory efficiency during training), it seemed beneficial to start using Inception modules only at higher layers, keeping the lower layers in traditional convolutional fashion.
Beneficial aspects:
1. It allows for increasing the number of units at each stage significantly, without an uncontrolled blow-up in computational complexity;
2. It aligns with the intuition that visual information should be processed at various scales and then aggregated.
It was found that a move from fully connected layers to average pooling improved the top-1 accuracy by about 0.6%; however, the use of dropout remained essential even after removing the fully connected layers.
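A minimal sketch of this replacement, assuming PyTorch (the 1024-channel, 7x7 input matches the last Inception stage; the 40% dropout rate is the paper's main-classifier setting):

```python
import torch.nn as nn

# Global average pooling in place of fully connected layers; dropout
# is kept before the final linear classifier (1000 ImageNet classes).
classifier = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),   # 7x7x1024 feature map -> 1x1x1024
    nn.Flatten(),
    nn.Dropout(p=0.4),         # dropout remained essential
    nn.Linear(1024, 1000),     # single linear layer + softmax loss
)
```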
Given the depth of the network, how do we keep the ability to propagate gradients back through all the layers in an effective manner?
By adding auxiliary classifiers connected to these intermediate layers, we would expect to: 1) encourage discrimination in the lower stages of the classifier, 2) increase the gradient signal that gets propagated back, and 3) provide additional regularization.
These classifiers take the form of smaller convolutional networks put on top of the output of the Inception (4a) and (4d) modules. During training, their loss gets added to the total loss of the network with a discount weight (the losses of the auxiliary classifiers were weighted by 0.3). At inference time, these auxiliary networks are discarded.
Structure of the auxiliary classifier network (see the sketch after this list):
1. An average pooling layer with 5x5 filter size and stride 3, resulting in a 4x4x512 output for the (4a) stage, and 4x4x528 for the (4d) stage.
2. A 1x1 convolution with 128 filters for dimension reduction and rectified linear activation.
3. A fully connected layer with 1024 units and rectified linear activation.
4. A dropout layer with 70% ratio of dropped outputs.
5. A linear layer with softmax loss as the classifier.
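Put together, a minimal PyTorch sketch of this auxiliary head (the class name is my own; in_ch would be 512 for (4a) and 528 for (4d)):

```python
import torch.nn as nn

class AuxClassifier(nn.Module):
    """Sketch of the auxiliary classifier head listed above; the input
    is the 14x14 feature map of the (4a) or (4d) stage."""
    def __init__(self, in_ch, num_classes=1000):
        super().__init__()
        self.head = nn.Sequential(
            nn.AvgPool2d(kernel_size=5, stride=3),  # 14x14 -> 4x4
            nn.Conv2d(in_ch, 128, kernel_size=1),   # 1x1 dimension reduction
            nn.ReLU(inplace=True),
            nn.Flatten(),
            nn.Linear(128 * 4 * 4, 1024),           # fully connected, 1024 units
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.7),                      # 70% dropout
            nn.Linear(1024, num_classes),           # linear layer + softmax loss
        )

    def forward(self, x):
        return self.head(x)  # logits; softmax is applied inside the loss function
```

During training, the total loss would then be combined as loss = main_loss + 0.3 * (aux1_loss + aux2_loss), matching the 0.3 discount weight mentioned above.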
6 Training Methodology
Our networks were trained using the DistBelief distributed machine learning system:
Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Marc'Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, Quoc V. Le, and Andrew Y. Ng. Large scale distributed deep networks. In Advances in Neural Information Processing Systems 25, pages 1232–1240, 2012.
We found that the photometric distortions of Andrew Howard were useful to combat overfitting:
Andrew G. Howard. Some improvements on deep convolutional neural network based image classification. CoRR, abs/1312.5402, 2013.
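A minimal sketch of such training-time augmentation using torchvision (the jitter ranges here are my own illustrative values, not taken from either paper; the crop scale and aspect-ratio ranges follow the GoogLeNet paper's patch sampling):

```python
from torchvision import transforms

train_transform = transforms.Compose([
    # Patches of 8%-100% of the image area, aspect ratio between 3/4 and 4/3.
    transforms.RandomResizedCrop(224, scale=(0.08, 1.0), ratio=(3/4, 4/3)),
    transforms.RandomHorizontalFlip(),
    # Photometric distortions in the spirit of Howard (2013):
    # brightness, contrast, and color jitter applied per image.
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    transforms.ToTensor(),
])
```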
7 2014 Classification
An ensemble of 7 versions of the same GoogLeNet model was used; they differ only in sampling methodologies and the random order in which they see input images. A more aggressive cropping approach was adopted: the image is resized to 4 scales (shorter dimension 256, 288, 320, 352), the left, center, and right squares of each resized image are taken, and for each square the 4 corner and center 224x224 crops plus the square resized to 224x224 are used, along with their mirrored versions. That gives 4 scales x 3 squares x 6 crops x 2 mirrors = 144 crops per image.
Predictions are combined by simple averaging of the softmax probabilities over all crops and over the 7 models.
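A minimal sketch of this crop-and-average scheme, assuming PyTorch (function and argument names are my own):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def multi_crop_predict(model, image, scales=(256, 288, 320, 352), crop=224):
    """Sketch of the aggressive cropping scheme: 4 scales x 3 squares
    x 6 crops x 2 mirrors = 144 crops, averaged after softmax.
    `image` is a 3xHxW float tensor; `model` maps crops to logits."""
    crops = []
    for s in scales:
        # Resize so that the shorter dimension equals s.
        _, h, w = image.shape
        size = (s, max(s, int(w * s / h))) if h <= w else (max(s, int(h * s / w)), s)
        resized = F.interpolate(image[None], size=size, mode="bilinear",
                                align_corners=False)[0]
        _, rh, rw = resized.shape
        # Take the left, center, and right (or top, center, bottom) squares.
        if rw >= rh:
            squares = [resized[:, :, o:o + s] for o in (0, (rw - s) // 2, rw - s)]
        else:
            squares = [resized[:, o:o + s, :] for o in (0, (rh - s) // 2, rh - s)]
        for sq in squares:
            d = s - crop
            # 4 corner crops + center crop + the square resized to crop size.
            views = [sq[:, y:y + crop, x:x + crop]
                     for y, x in ((0, 0), (0, d), (d, 0), (d, d), (d // 2, d // 2))]
            views.append(F.interpolate(sq[None], size=(crop, crop), mode="bilinear",
                                       align_corners=False)[0])
            for v in views:
                crops.append(v)
                crops.append(torch.flip(v, dims=[2]))  # horizontal mirror
    batch = torch.stack(crops)                    # 144 x 3 x 224 x 224
    probs = torch.softmax(model(batch), dim=1)    # per-crop class probabilities
    return probs.mean(dim=0)                      # simple averaging
```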
8 2014 Detection
GoogLeNet's improvement from ensembling is obvious, but a single Deep Insight model is more powerful than a single GoogLeNet.
9 Conclusion
Our results seem to yield solid evidence that approximating the expected optimal sparse structure by readily available dense building blocks is a viable method for improving neural networks for computer vision.