
Regularization techniques (classification and recognition): PatchShuffle Regularization paper reading notes

2018-01-08
PatchShuffle Regularization paper: https://arxiv.org/abs/1707.07103

Paper details



Overfitting essentially means that the model learns noise rather than the key factors of variation underlying the data: because of limited data diversity or an overly complex model, learning is misled by irrelevant local information and useless noise is memorized. Consider how humans perceive images: as long as the global structure is preserved, moderate local blurring reduces the influence of local noise while producing diverse local variations, and forcing the model to attend to these variations helps training and learning. Motivated by this, the paper introduces a new stochastic regularization method, PatchShuffle, to improve the robustness of trained models to noise and occlusion. It is complementary to other regularization methods and can be combined with them for better results. The operation is simple and effective, can be applied to images themselves or to feature maps, and reduces overfitting during training. To some extent it acts like data augmentation (it generates new images or feature maps in which the elements inside each patch are randomly shuffled), improving the generalization ability of the model.

Because PatchShuffle is applied to only a small percentage of all images or feature maps, and the generated images or feature maps share the global structure of the originals (only the rows and columns of pixels within local patches are permuted, which also induces weight sharing), the authors argue it is more appropriate to view the method as a regularizer. When applied to images it resembles data augmentation; when applied to feature maps it resembles model ensembling. In fact, locally shuffling the pixels within a patch is equivalent to shuffling the convolutional kernel while leaving the patch unshuffled, so PatchShuffle can also be seen as enabling weight sharing within each patch. Through shuffling, the pixel value at a given position of an image can be viewed as being sampled with equal probability from its neighboring pixels within the patch.
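To make the kernel-shuffling equivalence concrete, here is a minimal NumPy sketch (my own illustration, not code from the paper; variable names are illustrative): for a single patch-sized dot product, permuting the patch gives the same response as permuting the kernel by the inverse permutation.

```python
import numpy as np

rng = np.random.default_rng(0)
patch = rng.random(9)           # a flattened 3x3 image patch
kernel = rng.random(9)          # a flattened 3x3 convolution kernel

perm = rng.permutation(9)       # random shuffle applied to the patch
inv_perm = np.argsort(perm)     # the corresponding inverse permutation

# Convolving the shuffled patch with the original kernel equals convolving
# the original patch with the inversely shuffled kernel.
assert np.isclose(np.dot(patch[perm], kernel), np.dot(patch, kernel[inv_perm]))
```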

The goal is to improve the robustness of CNNs to data that is noisy or partially missing (e.g., salt-and-pepper noise and occlusion). PatchShuffle relates to two kinds of regularization: model ensembling and weight sharing.

Work most closely related to the authors':

Xu Shen, Xinmei Tian, Shaoyan Sun, and Dacheng Tao. Patch reordering: A novel way to achieve rotation and translation invariance in convolutional neural networks. In AAAI, 2017. (That paper reorders whole patches, which breaks the global image structure, uses heuristic search for ranking, and focuses on rotation and translation invariance; this differs from the present work.)

Other regularization methods:

A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012

L. Wan, M. Zeiler, S. Zhang, Y. L. Cun, and R. Fergus. Regularization of neural networks using DropConnect. In ICML, 2013

J. Ba and B. Frey. Adaptive dropout for training deep neural networks. In NIPS, 2013

M. D. Zeiler and R. Fergus. Stochastic pooling for regularization of deep convolutional neural networks. In ICLR, 2013

L. Xie, J. Wang, Z. Wei, M. Wang, and Q. Tian. DisturbLabel: Regularizing CNN on the loss layer. In CVPR, 2016

Saurabh Singh, Derek Hoiem, and David A. Forsyth. Swapout: Learning an ensemble of deep architectures. In NIPS, 2016. (Swapout)

Gao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, and Kilian Q. Weinberger. Deep networks with stochastic depth. In ECCV, 2016. (stochastic depth)

Steven J. Nowlan and Geoffrey E. Hinton. Simplifying neural networks by soft weight-sharing. Neural Computation, 4(4):473–493, 1992. (weight sharing)

Anders Krogh and John A. Hertz. A simple weight decay can improve generalization. In NIPS, 1991. (weight decay)

Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv:1207.0580, 2012. (Dropout)

Nitish Srivastava, Geoffrey E. Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. JMLR, 15:1929–1958, 2014. (Dropout)

Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.(BatchNormalization)

References on data augmentation (flipping, translation, cropping, etc.):

Zhe Gan, Ricardo Henao, David Carlson, and Lawrence Carin. Learning deep sigmoid belief networks with data augmentation. In Artificial Intelligence and Statistics, pages 268–276, 2015.

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.

Min Lin, Qiang Chen, and Shuicheng Yan. Network in network. In ICLR, 2014

Transformation-equivariant and -invariant networks:

Robert Gens and Pedro M Domingos. Deep symmetry networks. In NIPS, 2014

Sander Dieleman, Jeffrey De Fauw, and Koray Kavukcuoglu. Exploiting cyclic symmetry in convolutional neural networks. In ICML, 2016

Siamak Ravanbakhsh, Jeff Schneider, and Barnabas Poczos. Deep learning with sets and point clouds. 2016

Method: PatchShuffle, random shuffling of the elements within each patch

The difference between data augmentation and regularization is that the former aims to enlarge the size and diversity of the dataset, while the latter does not increase the amount of data and instead focuses on transforming the data during training.

Stochastic operations have been shown to be useful for regularizing CNN training through implicit model averaging.



To shuffle the pixels within a patch, the authors exploit the geometric meaning of pre- and post-multiplying a matrix by permutation matrices (row and column permutations), and model the operation as follows:
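A minimal sketch of this formulation, with notation of my own (the symbols T, P_r, P_c are assumptions, not necessarily the paper's): for a patch T, the shuffled patch is obtained by applying a random row permutation and a random column permutation,

```latex
\tilde{T} = P_r \, T \, P_c ,
```

where P_r and P_c are randomly sampled permutation matrices acting on the rows and columns of the patch, respectively.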



Both images and feature maps are matrices, so patch-wise random shuffling of elements can be applied to either.

PatchShuffle on images: patch-wise shuffling applied to the input images.
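As a concrete illustration, below is a minimal NumPy sketch of PatchShuffle on a single-channel image; the function name, default values, and looping strategy are my own and are not taken from the paper's code.

```python
import numpy as np

def patch_shuffle(img, patch_size=2, shuffle_prob=0.05, rng=np.random):
    """Randomly permute rows and columns inside each non-overlapping patch.

    With probability (1 - shuffle_prob) the image is returned unchanged,
    mirroring the requirement that only a small fraction of images be shuffled.
    """
    out = img.copy()
    if rng.rand() >= shuffle_prob:
        return out
    h, w = img.shape
    for i in range(0, h - patch_size + 1, patch_size):
        for j in range(0, w - patch_size + 1, patch_size):
            patch = out[i:i + patch_size, j:j + patch_size]
            rows = rng.permutation(patch_size)   # random row permutation
            cols = rng.permutation(patch_size)   # random column permutation
            out[i:i + patch_size, j:j + patch_size] = patch[rows][:, cols]
    return out
```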



PatchShuffle on feature maps

Because lower layers preserve more of the spatial structure, PatchShuffle is applied as much as possible to feature maps of randomly selected lower layers. For higher-layer feature maps, PatchShuffle allows neighboring pixels to share weight parameters, which benefits neighboring pixels whose receptive fields, mapped back to the original image, overlap more heavily.

PatchShuffle creates new images and feature maps, which increases the variety of the training data. However, it also introduces additional bias into the CNN, so PatchShuffle should be applied to only a small percentage of images or feature maps.

The regularization is formulated mathematically as follows:

where the gating variable follows a Bernoulli distribution:
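A minimal sketch of the overall formulation, with notation of my own (X, PS(·), r, and ε are shorthand, not necessarily the paper's symbols): each image or feature map X is replaced, with a small probability, by its patch-shuffled version,

```latex
\tilde{X} = (1 - r)\,X + r\,\mathrm{PS}(X), \qquad r \sim \mathrm{Bernoulli}(\epsilon),
```

where PS(·) denotes shuffling within each non-overlapping patch and ε is the shuffle probability.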







In practice, this shuffle probability should not be too large; that is, the probability that an image is shuffled should remain small.

Training procedure, taking the operation on feature maps as an example:
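A rough sketch of how this could look in code, reusing the `patch_shuffle` function from the earlier snippet; the per-channel sampling and the training/testing switch are my reading of the description above, not the paper's reference implementation.

```python
import numpy as np

def patch_shuffle_feature_maps(feature_maps, patch_size=2, shuffle_prob=0.05,
                               training=True, rng=np.random):
    """Apply PatchShuffle independently to each channel of a (C, H, W) array.

    During training each feature map is shuffled with probability shuffle_prob
    (handled inside patch_shuffle); at test time the maps pass through unchanged.
    """
    if not training:
        return feature_maps
    return np.stack([patch_shuffle(fm, patch_size, shuffle_prob, rng)
                     for fm in feature_maps])
```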



Experiments:

Four image classification datasets are used: CIFAR-10, SVHN, STL-10, and MNIST.

Influence and choice of the hyper-parameters, patch size Hp × Wp and the shuffle probability:

In fact, within a certain range, increasing either parameter improves the variety of the training samples without introducing too much bias. At larger values, however, the benefit brought by diversity is gradually outweighed by the classifier bias, and the error rate increases.

According to the figure, the experiments below adopt the following settings:








Classification performance with/without PatchShuffle

On the CIFAR-10 dataset



On the SVHN dataset



On the STL-10 dataset

The five-bit binary code denotes the stages at which PatchShuffle is applied: the first bit denotes the input layer, and the other four bits correspond to the four residual stages. For example, 10000 means PatchShuffle is applied only to the input images.



On the MNIST dataset & robustness to noise

Salt-and-pepper noise is added by flipping each pixel to white or black with probability τ1. For occlusion, each pixel is chosen with probability τ2 to be covered by a black block of a certain size centered on it; the block size used in the experiments is 3 × 3.
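A small NumPy sketch of these two corruption models as described above (assuming pixel values in [0, 1]; the sampling details are my own approximation, not the paper's code):

```python
import numpy as np

def salt_and_pepper(img, tau1, rng=np.random):
    """Set each pixel to white (1.0) or black (0.0) with probability tau1."""
    out = img.copy()
    mask = rng.rand(*img.shape) < tau1
    out[mask] = rng.choice([0.0, 1.0], size=int(mask.sum()))
    return out

def random_occlusion(img, tau2, block=3, rng=np.random):
    """Cover a black block of side `block`, centered on each pixel chosen with probability tau2."""
    out = img.copy()
    half = block // 2
    ys, xs = np.where(rng.rand(*img.shape) < tau2)
    for y, x in zip(ys, xs):
        out[max(0, y - half):y + half + 1, max(0, x - half):x + half + 1] = 0.0
    return out
```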



These results indicate that PatchShuffle improves the robustness of CNNs against common image corruptions such as noise and occlusion.