2017-06-11 10:16
Video captioning with recurrent networks based on frame- and video-level features and visual content classification
Rakshith Shetty, Jorma Laaksonen
(Submitted on 9 Dec 2015)
In this paper, we describe a system for generating textual descriptions of short video clips using recurrent neural networks (RNN), which we used while participating in the Large Scale Movie Description Challenge 2015 at ICCV 2015. Our work builds on static image captioning systems with RNN-based language models and extends this framework to videos, utilizing both static image features and video-specific features. In addition, we study the usefulness of visual content classifiers as a source of additional information for caption generation. Our experimental results show that utilizing keyframe-based features, dense trajectory video features and content classifier outputs together gives better performance than any one of them individually.
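As a rough illustration of the idea in the abstract, the sketch below fuses three feature sources (keyframe features, dense trajectory features and content classifier outputs) into one video vector and uses it to initialize a small recurrent language model that decodes greedily. This is not the authors' implementation: the function names, the L2 normalization, the concatenation-based fusion, the plain tanh RNN cell, the dimensions and the random weights are all assumptions made for the example.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale a feature vector to (approximately) unit L2 norm."""
    x = np.asarray(x, dtype=np.float64)
    return x / (np.linalg.norm(x) + eps)

def fuse_features(keyframe_feat, trajectory_feat, classifier_scores):
    """Assumed fusion scheme: normalize each source, then concatenate."""
    parts = [l2_normalize(keyframe_feat),
             l2_normalize(trajectory_feat),
             l2_normalize(classifier_scores)]
    return np.concatenate(parts)

def make_random_params(input_dim, hidden_dim, vocab_size, embed_dim=5, seed=0):
    """Hypothetical untrained weights, only to make the sketch runnable."""
    rng = np.random.default_rng(seed)
    return {
        "W_init": rng.normal(scale=0.1, size=(hidden_dim, input_dim)),
        "E": rng.normal(scale=0.1, size=(vocab_size, embed_dim)),
        "W_xh": rng.normal(scale=0.1, size=(hidden_dim, embed_dim)),
        "W_hh": rng.normal(scale=0.1, size=(hidden_dim, hidden_dim)),
        "W_hy": rng.normal(scale=0.1, size=(vocab_size, hidden_dim)),
    }

def greedy_caption(video_vec, params, vocab, max_len=10):
    """Greedy decoding with a plain tanh RNN whose initial hidden state
    is computed from the fused video vector (an assumption)."""
    h = np.tanh(params["W_init"] @ video_vec)
    token = vocab.index("<start>")
    words = []
    for _ in range(max_len):
        x = params["E"][token]                       # embed previous token
        h = np.tanh(params["W_xh"] @ x + params["W_hh"] @ h)
        token = int(np.argmax(params["W_hy"] @ h))   # most likely next word
        if vocab[token] == "<end>":
            break
        words.append(vocab[token])
    return words

# Toy usage with made-up dimensions and vocabulary.
vocab = ["<start>", "<end>", "a", "man", "walks"]
fused = fuse_features(np.ones(8), np.ones(6), np.ones(4))
params = make_random_params(fused.shape[0], hidden_dim=7, vocab_size=len(vocab))
caption = greedy_caption(fused, params, vocab)
```

With untrained random weights the decoded words are meaningless; the point is only how the three feature sources can enter a single caption generator.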
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:1512.02949 [cs.CV] (or arXiv:1512.02949v1 [cs.CV] for this version)
Submission history
From: Rakshith Shetty
[v1] Wed, 9 Dec 2015 17:17:29 GMT (86kb,D)