Crowd Density Estimation -- CrowdNet: A Deep Convolutional Network for Dense Crowd Counting
2017-09-28 14:26
CrowdNet: A Deep Convolutional Network for Dense Crowd Counting
published in the proceedings of ACM Conference on Multimedia (ACMMM) - 2016
http://val.serc.iisc.ernet.in/CrowdNet/
Caffe: https://github.com/davideverona/deep-crowd-counting_crowdnet
For crowd density estimation, this paper combines two fully convolutional networks, one deep and one shallow, to handle large scale variations.
Together they capture both high-level semantic information (face/body detectors) and low-level features (blob detectors).
The network architecture is described below:
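Deep Network: mainly captures high-level semantics. It uses a VGG-like architecture with the fully connected layers removed, which makes the network fully convolutional. The original VGG network has 5 max-pool layers each with a stride of 2, so its final feature map is only 1/32 of the input image size. Because a pixel-level crowd density map is needed here, the authors set the stride of the fourth max-pool layer to 1 and remove the fifth pooling layer, so the final feature map is 1/8 of the input image size.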
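The 1/32 and 1/8 figures follow from multiplying the pooling strides. A minimal sketch of that arithmetic is below; the function name `downsample_factor` is made up for illustration and is not part of the paper or its Caffe code.

```python
def downsample_factor(pool_strides):
    """Return the total spatial downsampling produced by a chain of pooling strides."""
    factor = 1
    for stride in pool_strides:
        factor *= stride
    return factor

# Original VGG: five max-pool layers, each with stride 2.
vgg_strides = [2, 2, 2, 2, 2]

# Deep network described above: fourth pool has stride 1, fifth pool is removed.
deep_net_strides = [2, 2, 2, 1]

print("VGG output is 1/%d of the input" % downsample_factor(vgg_strides))
print("Deep network output is 1/%d of the input" % downsample_factor(deep_net_strides))
```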
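Setting the stride of the fourth max-pool layer to 1 causes a receptive-field mismatch in the convolution layers that follow it. To compensate, the paper uses the dilated convolution (holes) technique from reference [4], so that the later filters see the same receptive field as they would if the fourth max-pool layer still had a stride of 2.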
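The effect of the dilation can be checked with the standard receptive-field recurrence. The sketch below assumes a VGG-16-style layer stack (3x3 convolutions, 2x2 pooling, three convolutions after the fourth pool); the exact layer configuration is an assumption for illustration, not taken from these notes.

```python
def receptive_field(layers):
    """layers: list of (kernel, stride, dilation) tuples, ordered input to output."""
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent outputs
    for kernel, stride, dilation in layers:
        rf += (kernel - 1) * dilation * jump
        jump *= stride
    return rf

conv = (3, 1, 1)          # 3x3 convolution, stride 1, no dilation
pool = (2, 2, 1)          # 2x2 max-pool, stride 2
pool_stride1 = (2, 1, 1)  # same pooling window, stride set to 1
conv_dilated = (3, 1, 2)  # 3x3 convolution with dilation 2

# Assumed VGG-16-style stack up to (not including) the fourth pooling layer.
trunk = [conv] * 2 + [pool] + [conv] * 2 + [pool] + [conv] * 3 + [pool] + [conv] * 3

original = trunk + [pool] + [conv] * 3                  # stride-2 fourth pool
no_fix = trunk + [pool_stride1] + [conv] * 3            # stride 1, no dilation
with_holes = trunk + [pool_stride1] + [conv_dilated] * 3  # stride 1, dilation 2

for name, net in [("original", original), ("no fix", no_fix), ("with holes", with_holes)]:
    print(name, receptive_field(net))
```

Under these assumptions the dilated variant recovers the receptive field of the original stride-2 network, while the undilated stride-1 variant sees a smaller region.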
Shallow Network
A shallow convolutional network is used mainly to detect the heads of people far from the camera, that is, for the detection of small head-blobs.
Combination of Deep and Shallow Networks
The outputs of the Deep and Shallow Networks, both at 1/8 of the input image size, are concatenated and passed through a 1x1 convolution layer. The result is then upsampled to the size of the input image using bilinear interpolation to obtain the final crowd density prediction.
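The fusion step can be sketched in plain Python on tiny nested-list feature maps. This is only an illustration: the helper names `conv1x1` and `bilinear_upsample`, the single-channel inputs, the weights, and the pixel-centre sampling convention are all assumptions, and the real model does this with Caffe layers on multi-channel tensors.

```python
def conv1x1(feature_maps, weights, bias=0.0):
    """Mix C feature maps (each an H x W nested list) into one map, pixel by pixel."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    return [[bias + sum(wt * fm[y][x] for wt, fm in zip(weights, feature_maps))
             for x in range(w)] for y in range(h)]

def bilinear_upsample(img, factor):
    """Upsample an H x W nested list by an integer factor with bilinear interpolation."""
    h, w = len(img), len(img[0])
    out = []
    for oy in range(h * factor):
        # Map the output pixel centre back to input coordinates, clamped to the image.
        sy = min(max((oy + 0.5) / factor - 0.5, 0.0), h - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        row = []
        for ox in range(w * factor):
            sx = min(max((ox + 0.5) / factor - 0.5, 0.0), w - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out

# Toy 2x2 outputs standing in for the 1/8-resolution deep and shallow features.
deep = [[[1.0, 2.0], [3.0, 4.0]]]
shallow = [[[3.0, 2.0], [1.0, 0.0]]]

concatenated = deep + shallow                # channel-wise concatenation
fused = conv1x1(concatenated, [0.5, 0.5])    # hypothetical 1x1 conv weights
prediction = bilinear_upsample(fused, 8)     # back to input resolution (x8)
print(len(prediction), len(prediction[0]))
```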
3.2 Ground Truth
The ground truth is generated by simply blurring each head annotation using a Gaussian kernel normalized to sum to one, so the density map sums to the number of people in the image.
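A minimal sketch of this ground-truth construction follows. The value of `sigma`, the 3-sigma truncation, and the renormalization of kernels clipped at the image border are assumptions chosen for illustration; the notes above do not specify them.

```python
import math

def gaussian_kernel(sigma, radius):
    """Return a (2*radius+1)^2 Gaussian kernel normalized to sum to one."""
    kernel = [[math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
               for dx in range(-radius, radius + 1)]
              for dy in range(-radius, radius + 1)]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]

def density_map(height, width, heads, sigma=2.0):
    """heads: list of (x, y) integer head positions. Returns an H x W nested list."""
    radius = int(3 * sigma)  # assumed truncation radius
    kernel = gaussian_kernel(sigma, radius)
    dmap = [[0.0] * width for _ in range(height)]
    for hx, hy in heads:
        # Keep only the kernel cells that fall inside the image.
        cells = []
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                y, x = hy + dy, hx + dx
                if 0 <= y < height and 0 <= x < width:
                    cells.append((y, x, kernel[dy + radius][dx + radius]))
        mass = sum(v for _, _, v in cells)
        # Renormalize so each head still contributes exactly one to the count.
        for y, x, v in cells:
            dmap[y][x] += v / mass
    return dmap

heads = [(5, 5), (0, 0), (20, 10)]
dmap = density_map(32, 32, heads)
print(round(sum(sum(row) for row in dmap), 6))
```

Summing the map recovers the head count, which is what makes a density map usable as a counting target.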
3.3 Data Augmentation
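Two types of augmentation are primarily performed:
1) To handle scale variations, training patches are sampled at multiple scales.
2) Samples the network tends to get wrong are trained on more often, by sampling high density patches more often.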
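Both augmentations can be sketched as a patch-sampling routine. Everything concrete here is a hypothetical choice for illustration: the patch size, the overlap, the scale list, and the `count + 1` weighting rule are not given in these notes.

```python
import random

def patch_origins(height, width, patch=225, overlap=0.5):
    """Top-left corners of patch x patch crops with the given fractional overlap."""
    step = max(1, int(patch * (1 - overlap)))
    ys = range(0, max(height - patch, 0) + 1, step)
    xs = range(0, max(width - patch, 0) + 1, step)
    return [(y, x) for y in ys for x in xs]

def multiscale_patches(height, width, scales, patch=225):
    """List (scale, y, x) crops taken from a rescaled pyramid of one image."""
    crops = []
    for s in scales:
        sh, sw = int(round(height * s)), int(round(width * s))
        if sh < patch or sw < patch:
            continue  # this pyramid level is too small to hold one patch
        crops.extend((s, y, x) for y, x in patch_origins(sh, sw, patch))
    return crops

def sample_by_density(patches, counts, k, rng):
    """Draw k patches, favouring those with higher head counts (assumed weighting)."""
    weights = [c + 1.0 for c in counts]  # +1 keeps empty patches reachable
    return rng.choices(patches, weights=weights, k=k)

crops = multiscale_patches(450, 450, scales=[0.5, 1.0])
print(len(crops), "crops from a 450x450 image")

rng = random.Random(0)
picked = sample_by_density(["sparse", "dense"], counts=[0, 1000], k=100, rng=rng)
print(picked.count("dense"), "of 100 draws are the dense patch")
```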
4 EXPERIMENTS