[Reinforcement Learning] DQN (Deep Q-Network) Basics
2017-08-18 10:49
1 DQN’s architecture
![](https://oscdn.geek-share.com/Uploads/Images/Content/202012/08/e500ce510ccc654ef1cf6bfaa0d4f657)
【input】
The input to the neural network is an 84×84×4 tensor of image pixels, produced by the preprocessing map φ (four stacked, preprocessed frames).
【hidden layer】
The first hidden layer convolves 32 filters of 8×8 with stride 4 with the input image and applies a rectifier nonlinearity. The second hidden layer convolves 64 filters of 4×4 with stride 2, again followed by a rectifier nonlinearity. This is followed by a third convolutional layer that convolves 64 filters of 3×3 with stride 1, followed by a rectifier. The final hidden layer is fully connected and consists of 512 rectifier units.
【output】
The output layer is a fully-connected linear layer with a single output for each valid action. The number of valid actions varied between 4 and 18 on the games we considered.
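A quick sanity check of these layer shapes, assuming "valid" convolutions (a sketch, not the actual network):

```python
def conv_out(size, kernel, stride):
    # Output size of a 'valid' convolution: floor((size - kernel) / stride) + 1
    return (size - kernel) // stride + 1

size = 84
for kernel, stride in [(8, 4), (4, 2), (3, 1)]:  # the three conv layers above
    size = conv_out(size, kernel, stride)

print(size)              # 7: the final feature maps are 7x7
print(size * size * 64)  # 3136 inputs feeding the 512-unit fully-connected layer
```

So the flattened third conv layer (7×7×64 = 3136 units) feeds the 512-unit fully-connected layer, which in turn feeds one linear output per action.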
【loss function】
The loss function (objective function) of DQN is
![](https://oscdn.geek-share.com/Uploads/Images/Content/202012/08/ac788d9afaa4d7974bf700dc6509cbb0)
in which γ is the discount factor determining the agent’s horizon, θ_i are the parameters of the Q-network at iteration i, and θ_i⁻ are the network parameters used to compute the target at iteration i. The target network parameters θ_i⁻ are only updated with the Q-network parameters θ_i every C steps and are held fixed between individual updates.
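Written out in the standard Nature-DQN form (symbols as defined above; (s, a, r, s′) are transitions drawn uniformly from the replay memory D):

```latex
L_i(\theta_i) = \mathbb{E}_{(s,a,r,s') \sim U(D)}\!\left[
    \left( r + \gamma \max_{a'} Q(s', a'; \theta_i^{-}) - Q(s, a; \theta_i) \right)^{2}
\right]
```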
![](https://oscdn.geek-share.com/Uploads/Images/Content/202012/08/7ff83888f982fae03b662ddaf96d4532)
2 Algorithm
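The training procedure combines experience replay with the periodically synced target network described above. A minimal sketch of those two pieces, with the Q-function itself left abstract (the buffer capacity, GAMMA, and C values here are illustrative, not the paper's):

```python
import random
from collections import deque

GAMMA = 0.99  # discount factor
C = 100       # sync target parameters every C updates (hypothetical value)

class ReplayBuffer:
    """Uniform experience replay memory D."""
    def __init__(self, capacity=10000):
        self.buf = deque(maxlen=capacity)  # old transitions are evicted FIFO

    def push(self, s, a, r, s2, done):
        self.buf.append((s, a, r, s2, done))

    def sample(self, n):
        # Uniform sampling breaks the correlation between consecutive frames
        return random.sample(self.buf, n)

def td_target(r, s2, done, q_target, actions):
    # y = r                               if s' is terminal
    # y = r + gamma * max_a' Q-(s', a')   otherwise, using the frozen target net
    if done:
        return r
    return r + GAMMA * max(q_target(s2, a) for a in actions)
```

In a full implementation, each gradient step fits Q(s, a; θ) toward y on a sampled minibatch, and every C steps θ⁻ is overwritten with the current θ, which keeps the regression target from moving at every update.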
3 Conclusion
DQN uses a DNN to represent the policy π, a mapping from states to actions.
3.2 Why use a DNN?
It solves the problem of high-dimensional input: raw pixel observations are far too high-dimensional for a tabular Q-function.