Incentivizing exploration in reinforcement learning with deep predictive models

2017-08-13 19:00

Stadie, Bradly C., Sergey Levine, and Pieter Abbeel. "Incentivizing exploration in reinforcement learning with deep predictive models." arXiv preprint arXiv:1507.00814 (2015).

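The authors model the uncertainty of (state, action) pairs and use it to modify the reward, which helps the agent explore. They say that with their method random exploration is no longer needed. The approach is fairly general and applies to many RL models, but it requires training an auto-encoder, so it is also somewhat cumbersome.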
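To make the reward modification concrete, here is a minimal sketch of a prediction-error exploration bonus. It is not the authors' implementation: the class name `PredictionErrorBonus`, the linear dynamics model, and the parameters `beta`, `decay_c`, and `lr` are all hypothetical, and the inputs `z` and `z_next` are assumed to be state encodings that an auto-encoder has already produced. The bonus is the model's squared prediction error, normalised by the largest error seen so far and decayed over time; that normalisation and decay are assumptions of this sketch.

```python
import numpy as np


class PredictionErrorBonus:
    """Hypothetical helper: a linear model predicts the next encoded state."""

    def __init__(self, state_dim, n_actions, beta=1.0, decay_c=1.0, lr=0.1):
        self.n_actions = n_actions
        # Weights map [encoded state, one-hot action] to the next encoded state.
        self.W = np.zeros((state_dim + n_actions, state_dim))
        self.beta = beta          # scale of the bonus (assumed hyperparameter)
        self.decay_c = decay_c    # decay constant (assumed hyperparameter)
        self.lr = lr              # SGD step size for the dynamics model
        self.max_error = 1e-8     # running maximum, avoids division by zero
        self.t = 0                # number of transitions seen so far

    def _features(self, z, action):
        one_hot = np.zeros(self.n_actions)
        one_hot[action] = 1.0
        return np.concatenate([z, one_hot])

    def bonus(self, z, action, z_next):
        """Return the bonus for one transition, then update the model."""
        self.t += 1
        x = self._features(z, action)
        residual = x @ self.W - z_next
        error = float(residual @ residual)
        self.max_error = max(self.max_error, error)
        normalized = error / self.max_error
        # One SGD step on the squared prediction error.
        self.W -= self.lr * np.outer(x, residual)
        return self.beta * normalized / (self.t * self.decay_c)

    def shaped_reward(self, reward, z, action, z_next):
        """Environment reward plus the exploration bonus."""
        return reward + self.bonus(z, action, z_next)


model = PredictionErrorBonus(state_dim=2, n_actions=2)
z, z_next = np.array([1.0, 0.0]), np.array([0.0, 1.0])
first = model.bonus(z, 0, z_next)
second = model.bonus(z, 0, z_next)
```

Repeating the same transition makes the model more accurate, so `second` is smaller than `first`: poorly predicted (novel) transitions earn a larger bonus than familiar ones. The shaped reward can then be passed to any RL algorithm in place of the raw reward.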

Practicality: 3 stars

Theory: 1 star

Novelty: 4 stars
