Deep Reinforcement Learning-based Image Captioning with Embedding Reward
2017-04-14 11:09
801 查看
https://arxiv.org/abs/1704.03899
Image captioning is a challenging problem owing to the complexity in understanding the image content and diverse ways of describing
it in natural language. Recent advances in deep neural networks have substantially improved the performance of this task. Most state-of-the-art approaches follow an encoder-decoder framework, which generates captions using a sequential recurrent prediction
model. However, in this paper, we introduce a novel decision-making framework for image captioning. We utilize a "policy network" and a "value network" to collaboratively generate captions. The policy network serves as a local guidance by providing the confidence
of predicting the next word according to the current state. Additionally, the value network serves as a global and lookahead guidance by evaluating all possible extensions of the current state. In essence, it adjusts the goal of predicting the correct words
towards the goal of generating captions similar to the ground truth captions. We train both networks using an actor-critic reinforcement learning model, with a novel reward defined by visual-semantic embedding. Extensive experiments and analyses on the Microsoft
COCO dataset show that the proposed framework outperforms state-of-the-art approaches across different evaluation metrics.
Image captioning is a challenging problem owing to the complexity in understanding the image content and diverse ways of describing
it in natural language. Recent advances in deep neural networks have substantially improved the performance of this task. Most state-of-the-art approaches follow an encoder-decoder framework, which generates captions using a sequential recurrent prediction
model. However, in this paper, we introduce a novel decision-making framework for image captioning. We utilize a "policy network" and a "value network" to collaboratively generate captions. The policy network serves as a local guidance by providing the confidence
of predicting the next word according to the current state. Additionally, the value network serves as a global and lookahead guidance by evaluating all possible extensions of the current state. In essence, it adjusts the goal of predicting the correct words
towards the goal of generating captions similar to the ground truth captions. We train both networks using an actor-critic reinforcement learning model, with a novel reward defined by visual-semantic embedding. Extensive experiments and analyses on the Microsoft
COCO dataset show that the proposed framework outperforms state-of-the-art approaches across different evaluation metrics.
相关文章推荐
- Deep Reinforcement Learning-based Image Captioning with Embedding Reward
- Paper-[acmi 2015]Image based Static Facial Expression Recognition with Multiple Deep Network Learning
- 论文笔记:Learning Social Image Embedding with Deep Multimodal Attention Networks
- Paper Reading - Playing Atari with Deep Reinforcement Learning
- PR10.10:#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
- Playing FPS Games with Deep Reinforcement Learning
- (Deep Reinforcement Learning with Double Q-learning, H. van Hasselt et al., arXiv, 2015)(dqn)练习
- Image classification with deep learning常用模型
- Paper Reading 1 - Playing Atari with Deep Reinforcement Learning
- 论文笔记之:Deep Reinforcement Learning with Double Q-learning
- Continuous control with deep reinforcement learning
- [转]Deep Reinforcement Learning Based Trading Application at JP Morgan Chase
- learning to communicate with deep multi-agent reinforcement learning
- 论文笔记之:Playing Atari with Deep Reinforcement Learning
- Continuous control with deep reinforcement learning(DDPG,深度确定策略梯度)练习
- 解读continuous control with deep reinforcement learning(DDPG)
- Playing Atari with Deep Reinforcement Learning算法解读
- 深度学习用于基于内容的图像检索 Deep Learning for Content-Based Image Retrieval
- Bit-Scalable Deep Hashing with Regularized Similarity Learning for Image Retrieval and Person Re-ide
- Playing Atari with Deep Reinforcement Learning