Deep Reinforcement Learning-based Image Captioning with Embedding Reward
2017-10-09 11:58
Zhou Ren, Xiaoyu Wang, Ning Zhang, Xutao Lv, Li-Jia Li
(Submitted on 12 Apr 2017)
Image captioning is a challenging problem owing to the complexity in understanding the image content and diverse ways of describing it in natural language. Recent advances in deep neural networks have substantially improved the performance of this task. Most
state-of-the-art approaches follow an encoder-decoder framework, which generates captions using a sequential recurrent prediction model. However, in this paper, we introduce a novel decision-making framework for image captioning. We utilize a "policy network"
and a "value network" to collaboratively generate captions. The policy network serves as a local guidance by providing the confidence of predicting the next word according to the current state. Additionally, the value network serves as a global and lookahead
guidance by evaluating all possible extensions of the current state. In essence, it adjusts the goal of predicting the correct words towards the goal of generating captions similar to the ground truth captions. We train both networks using an actor-critic
reinforcement learning model, with a novel reward defined by visual-semantic embedding. Extensive experiments and analyses on the Microsoft COCO dataset show that the proposed framework outperforms state-of-the-art approaches across different evaluation metrics.
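The abstract names three mechanisms: next-word selection guided jointly by a policy network (local confidence) and a value network (global lookahead), a reward computed in a visual-semantic embedding space, and actor-critic training. The Python sketch below illustrates each one with plain numbers standing in for network outputs. It rests on several assumptions that do not come from the abstract. The policy and value scores are combined by a linear blend with a hypothetical weight `lam`. The embedding reward is modeled as cosine similarity between an image vector and a caption vector that already live in a shared space. The critic is used as a baseline in a REINFORCE-style update. All function names are hypothetical, and the sketch is not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def embedding_reward(image_emb: Sequence[float], caption_emb: Sequence[float]) -> float:
    """Assumed reward: similarity between a finished caption and its image,
    both already projected into a shared visual-semantic embedding space."""
    return cosine_similarity(image_emb, caption_emb)


def lookahead_score(log_prob: float, value: float, lam: float) -> float:
    """Hypothetical linear blend: `lam` weights the policy network's local
    confidence, (1 - lam) weights the value network's lookahead estimate."""
    return lam * log_prob + (1.0 - lam) * value


def pick_next_word(candidates: Dict[str, Tuple[float, float]], lam: float) -> str:
    """candidates maps word -> (policy log-prob, value score of the state extended by that word)."""
    return max(candidates, key=lambda w: lookahead_score(*candidates[w], lam=lam))


def actor_critic_losses(token_log_probs: List[float], value_estimate: float,
                        reward: float) -> Tuple[float, float]:
    """Scalar losses for one sampled caption.

    The advantage (reward minus the critic's estimate) scales the policy term;
    in an autodiff framework it would be treated as a constant (no gradient).
    """
    advantage = reward - value_estimate
    policy_loss = -advantage * sum(token_log_probs)
    value_loss = (reward - value_estimate) ** 2  # critic regresses toward the observed reward
    return policy_loss, value_loss


if __name__ == "__main__":
    # Illustrative numbers only: the policy slightly prefers "dog",
    # while the value network rates the "cat" continuation higher.
    candidates = {"dog": (math.log(0.5), 0.2), "cat": (math.log(0.4), 0.6)}
    print(pick_next_word(candidates, lam=1.0))  # policy only -> dog
    print(pick_next_word(candidates, lam=0.4))  # policy + value lookahead -> cat
    r = embedding_reward([0.6, 0.8], [0.8, 0.6])
    print(actor_critic_losses([math.log(0.5), math.log(0.9)], 0.6, r))
```

With `lam=1.0` the choice reduces to the policy network alone and picks "dog". Giving part of the weight to the value network's score flips the choice to "cat", which is the kind of global, lookahead correction the abstract attributes to the value network.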
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as: arXiv:1704.03899 [cs.CV]
(or arXiv:1704.03899v1 [cs.CV] for this version)
Submission history
From: Xiaoyu Wang
[v1] Wed, 12 Apr 2017 18:55:03 GMT (5922kb,D)