您的位置:首页 > Web前端

[深度学习论文笔记][Video Classification] Learning Spatiotemporal Features with 3D Convolutional Networks

2016-11-16 11:04 811 查看
Tran, Du, et al. “Learning spatiotemporal features with 3d convolutional networks.” 2015 IEEE International Conference on Computer Vision (ICCV). IEEE, 2015. (Citations: 101).

1 Architecture

This model is 3D VGGNet, basically. It contains 3 × 3 × 3 conv, 2 × 2 × 2 pool. An illustration of 3d convolution can be seen in Fig. 3D convolution preserves the temporal
information of the input signals resulting in an output volume.



2 Results

By using deconv approach, we observe that C3D starts by focusing on appearance in the first few frames and tracks the salient motion in the subsequent frames. Thus 3d CNN

differs from stadard 2d CNN in that it selectively attends to both motion and appearance. Like standard 2d CNN, we can extract video features from 3d CNN. We use fc6 features

in our experiments.

3 References

[1]. http://web.cs.hacettepe.edu.tr/ ̃aykut/classes/spring2016/bil722/slides/w07-conv3d.pdf.
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
相关文章推荐