
Part-aligned paper series: 1707.ICCV. Pedestrian Alignment Network for Large-scale Person Re-identification (paper notes)

2018-01-06 23:02
Pedestrian Alignment Network for Large-scale Person Re-identification

GitHub source code: https://github.com/layumi/Pedestrian_Alignment



This paper focuses on the scale and pose variations caused by detection errors and missing body parts (illustrated in the figure above). The former may leave too much background in the detected image, while the latter means part of the body is missing; the resulting misalignment has a significant impact on re-ID accuracy. To address this, the authors propose the Pedestrian Alignment Network (PAN), which simultaneously aligns pedestrians and learns discriminative feature descriptors without requiring extra annotations. The network exploits the observation that CNN feature-map activations concentrate on the body, and uses this attention-like cue to adaptively localize and align the pedestrian.

The authors validate the method on Market-1501, CUHK03 (detected), CUHK03 (labeled), and DukeMTMC-reID. They also report results combined with re-ranking (Zhong et al., 2017), which further improves performance.

Weakness: although the idea is novel, the alignment results are not entirely satisfactory; the method has certain limitations and only improves re-ID performance to a limited extent.

Model architecture



The model contains two branches, both of which are classification networks: a base branch and an alignment branch. The two branches do not share weights. Between them sits an affine estimation module, built on the spatial transformer network (STN), a differentiable localization network, which supplies aligned feature maps to the alignment branch. With the STN, the model can:

1. crop the detected images which may contain too much background;

2. pad zeros to the borders of images with missing parts.

ResNet-50 is used as the base model.

Affine estimation branch: this branch consists of a bilinear sampler and a grid network (one Res-Block followed by an average pooling layer). The Res4 feature maps are passed through the grid network, which regresses the six parameters of an affine transformation; these parameters are used to generate the sampling (image) grid. Bilinear interpolation fills in the pixel values after the transformation, and positions that fall outside the original image borders are padded with zeros.
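To make this concrete, below is a minimal PyTorch sketch of the affine estimation idea described above. The official repository linked at the top is not PyTorch, so this is only an illustration under my own assumptions, not the authors' implementation: the module name AffineEstimation, the channel count, and the exact layer layout inside the grid network are assumed. The essential steps are regressing six affine parameters from the Res4 feature maps, building a sampling grid, and warping with bilinear interpolation while padding out-of-image regions with zeros.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AffineEstimation(nn.Module):
    """Hypothetical sketch of the affine estimation branch: a small grid
    network (Res-Block + average pooling) regresses 6 affine parameters
    from the Res4 feature maps, and a bilinear sampler warps the maps."""

    def __init__(self, in_channels=1024):  # 1024 = Res4 channels in ResNet-50
        super().__init__()
        # "Grid network": one residual-style block followed by pooling and
        # a linear regressor that outputs the 6 affine parameters (theta).
        self.res_block = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1),
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, in_channels, 3, padding=1),
            nn.BatchNorm2d(in_channels),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(in_channels, 6)
        # Start from the identity transform ("no warping"), the usual STN init.
        nn.init.zeros_(self.fc.weight)
        self.fc.bias.data.copy_(torch.tensor([1., 0., 0., 0., 1., 0.]))

    def forward(self, feat):
        # feat: Res4 feature maps, shape (N, 1024, H, W)
        x = F.relu(self.res_block(feat) + feat)
        theta = self.fc(self.pool(x).flatten(1)).view(-1, 2, 3)
        # Build the sampling grid and warp with bilinear interpolation;
        # padding_mode="zeros" pads regions outside the source with zeros.
        grid = F.affine_grid(theta, feat.size(), align_corners=False)
        aligned = F.grid_sample(feat, grid, mode="bilinear",
                                padding_mode="zeros", align_corners=False)
        return aligned, theta
```

Which tensor is actually warped (earlier feature maps vs. the Res4 maps used here) follows the paper; this sketch simply warps its own input to show the mechanism.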

Training and testing: in the training phase, the model minimizes two identification (classification) losses, one per branch. In the test phase, the two 1 × 1 × 2048 FC embeddings are concatenated to form a 4096-dim pedestrian descriptor for retrieval.
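For illustration, here is a minimal sketch of that test-time descriptor, assuming f_base and f_align are the flattened 2048-dim embeddings from the two branches; the L2-normalization and dot-product ranking below are common retrieval choices, not necessarily the paper's exact evaluation protocol.

```python
import torch
import torch.nn.functional as F

def pedestrian_descriptor(f_base, f_align):
    """Concatenate the two branch embeddings (each 1x1x2048, flattened to
    2048-dim) into a single 4096-dim descriptor, then L2-normalize it so
    that retrieval can use a simple inner product."""
    f = torch.cat([f_base.flatten(1), f_align.flatten(1)], dim=1)  # (N, 4096)
    return F.normalize(f, p=2, dim=1)

# Hypothetical retrieval: rank gallery images by similarity to the query.
query = pedestrian_descriptor(torch.randn(1, 2048), torch.randn(1, 2048))
gallery = pedestrian_descriptor(torch.randn(100, 2048), torch.randn(100, 2048))
ranking = (gallery @ query.t()).squeeze(1).argsort(descending=True)
```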

Experiments show that the feature descriptors produced by the two branches are complementary to some extent.

The authors mainly build on the following papers:

He et al., 2016. He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In CVPR.

Jaderberg et al., 2015. Jaderberg, M., Simonyan, K., Zisserman, A., et al. (2015). Spatial transformer networks. In NIPS.

Johnson et al., 2015. Johnson, J., Karpathy, A., and Fei-Fei, L. (2015). DenseCap: Fully convolutional localization networks for dense captioning. arXiv:1511.07571. (This paper combines Faster R-CNN, an RNN, and an STN to localize image regions and generate natural-language descriptions.)

State-of-the-art re-ranking method: Zhong et al., 2017. Zhong, Z., Zheng, L., Cao, D., and Li, S. (2017). Re-ranking person re-identification with k-reciprocal encoding. In CVPR.

Related papers on deeply-learned models

Papers that align via patch-matching strategies (these generally assume that the matched local structures lie within the same horizontal stripe):

Li et al., 2014. Li, W., Zhao, R., Xiao, T., and Wang, X. (2014). DeepReID: Deep filter pairing neural network for person re-identification. In CVPR.

Yi et al., 2014. Yi, D., Lei, Z., Liao, S., and Li, S. Z. (2014). Deep metric learning for person re-identification. In ICPR.

Zhao et al., 2013a. Zhao, R., Ouyang, W., and Wang, X. (2013a). Person re-identification by salience matching. In ICCV.

Zhao et al., 2014. Zhao, R., Ouyang, W., and Wang, X. (2014). Learning mid-level filters for person re-identification. In CVPR.

Liao et al., 2015. Liao, S., Hu, Y., Zhu, X., and Li, S. Z. (2015). Person re-identification by local maximal occurrence representation and metric learning. In CVPR.

Cheng et al., 2016. Cheng, D., Gong, Y., Zhou, S., Wang, J., and Zheng, N. (2016). Person re-identification by multi-channel parts-based CNN with improved triplet loss function. In CVPR.

Ahmed et al., 2015. Ahmed, E., Jones, M., and Marks, T. K. (2015). An improved deep learning architecture for person re-identification. In CVPR.

Papers using other strategies, e.g., attention mechanisms:

Liu et al., 2016a. Liu, H., Feng, J., Qi, M., Jiang, J., and Yan, S. (2016a). End-to-end comparative attention networks for person re-identification. arXiv:1606.04404.

Varior et al., 2016a. Varior, R. R., Haloi, M., and Wang, G. (2016a). Gated siamese convolutional neural network architecture for human re-identification. In ECCV.

Varior et al., 2016b. Varior, R. R., Shuai, B., Lu, J., Xu, D., and Wang, G. (2016b). A siamese long short-term memory architecture for human re-identification. In ECCV.

Others:

PoseBox, based on Pictorial Structures: Zheng et al., 2017a. Zheng, L., Huang, Y., Lu, H., and Yang, Y. (2017a). Pose invariant embedding for deep person re-identification. arXiv:1701.07732.

Other papers

LSTM: Hochreiter and Schmidhuber, 1997. Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8):1735-1780.

STN papers:

Jaderberg et al., 2015. Jaderberg, M., Simonyan, K., Zisserman, A., et al. (2015). Spatial transformer networks. In NIPS.

Johnson et al., 2015. Johnson, J., Karpathy, A., and Fei-Fei, L. (2015). DenseCap: Fully convolutional localization networks for dense captioning. arXiv:1511.07571. (This paper combines Faster R-CNN, an RNN, and an STN to localize image regions and generate natural-language descriptions.)

Papers related to re-ranking:

Ye et al., 2015. Ye, M., Liang, C., Wang, Z., Leng, Q., and Chen, J. (2015). Ranking optimization for person re-identification via similarity and dissimilarity. In ACM Multimedia.

Qin et al., 2011. Qin, D., Gammeter, S., Bossard, L., Quack, T., and Van Gool, L. (2011). Hello neighbor: Accurate object retrieval with k-reciprocal nearest neighbors. In CVPR.

Zhong et al., 2017. Zhong, Z., Zheng, L., Cao, D., and Li, S. (2017). Re-ranking person re-identification with k-reciprocal encoding. In CVPR. [currently the best]

Experiments

Visualize the Res4 feature maps in the base branch



Base branch vs. alignment branch, and the complementarity of the two branches



Parameter sensitivity experiment: setting the fusion parameter:



Results on the Market-1501 dataset:



Results on the DukeMTMC-reID dataset:



Results on the CUHK03 dataset, using the new and more challenging evaluation protocol, compared with other methods:



Visualize some retrieval results on the three datasets



Visual comparison of pedestrians before and after alignment:
