您的位置:首页 > 其它

Deep Reinforcement Learning-based Image Captioning with Embedding Reward

2017-04-14 11:09 801 查看
https://arxiv.org/abs/1704.03899

Image captioning is a challenging problem owing to the complexity in understanding the image content and diverse ways of describing
it in natural language. Recent advances in deep neural networks have substantially improved the performance of this task. Most state-of-the-art approaches follow an encoder-decoder framework, which generates captions using a sequential recurrent prediction
model. However, in this paper, we introduce a novel decision-making framework for image captioning. We utilize a "policy network" and a "value network" to collaboratively generate captions. The policy network serves as a local guidance by providing the confidence
of predicting the next word according to the current state. Additionally, the value network serves as a global and lookahead guidance by evaluating all possible extensions of the current state. In essence, it adjusts the goal of predicting the correct words
towards the goal of generating captions similar to the ground truth captions. We train both networks using an actor-critic reinforcement learning model, with a novel reward defined by visual-semantic embedding. Extensive experiments and analyses on the Microsoft
COCO dataset show that the proposed framework outperforms state-of-the-art approaches across different evaluation metrics.
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: