[CS229 Lecture 20] Policy Search
2016-03-12 21:54
The final lecture on reinforcement learning
Agenda
-POMDPs (partially observable MDPs)
-Policy search (the main topic for today will be the policy search algorithm; specifically, I'll talk about two algorithms named REINFORCE and Pegasus)
-REINFORCE
-Pegasus
-Conclusion
Recapping the last lecture: I actually started to talk about one specific example of a POMDP, which was this sort of linear dynamical system (s_{t+1} = A*s_t + B*a_t + w_t). This is the LQR, linear quadratic regulation, problem, but I changed it and asked: what if we only have observations y_t…
The formal definition of a POMDP (in general, the POMDP problem is NP-hard) and policy search (I think it is among the most effective classes of reinforcement learning algorithms, both for MDPs and for POMDPs. Today we first apply the policy search algorithm to MDPs, i.e., with full observations, and then see how to apply it to POMDPs. When applying it to a POMDP, it is hard to guarantee that the policy you find is globally optimal, since POMDPs are NP-hard in general; but I believe policy search algorithms are the most effective for both MDPs and POMDPs):
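As a reminder of the formal object (following the usual CS229 notation; the exact symbols may differ slightly from the figure in the original notes), a POMDP augments an MDP with an observation space and per-state observation distributions:

```latex
% A POMDP is a tuple (S, A, Y, \{P_{sa}\}, \{O_s\}, T, R), where
%   S: set of states,  A: set of actions,  Y: set of observations,
%   P_{sa}: state transition distributions,
%   O_s: observation distributions, so that y_t \sim O_{s_t},
%   T: horizon,  R: reward function.
\left(S,\; A,\; Y,\; \{P_{sa}\},\; \{O_s\},\; T,\; R\right)
```

The key difference from an MDP is that the agent never sees s_t directly, only a sample y_t drawn from O_{s_t}.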
So, here is our first policy search algorithm: the REINFORCE algorithm.
Let's give one specific example to present the algorithm (the inverted pendulum).
(The part above the horizontal line in the figure below answers a student's question about what happens when there are multiple actions.)
The detailed derivation:
Proof:
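The update rule the derivation arrives at can be sketched in code. This is a toy one-step problem of my own (action 1 pays reward 1, action 0 pays 0; the policy is logistic in a single scalar theta), not an example from the lecture, but the update line is exactly the REINFORCE score-function step: theta += alpha * grad(log pi_theta(a)) * payoff.

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def reinforce(steps=2000, alpha=0.5, seed=0):
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        p1 = sigmoid(theta)                 # P(action = 1)
        a = 1 if rng.random() < p1 else 0   # sample from the policy
        reward = 1.0 if a == 1 else 0.0
        # d/dtheta log pi_theta(a): (1 - p1) for a = 1, (-p1) for a = 0
        grad_log = (1.0 - p1) if a == 1 else -p1
        theta += alpha * grad_log * reward  # unbiased gradient ascent step
    return theta

theta = reinforce()
```

On average each step moves theta in the direction of the gradient of the expected payoff, so P(action = 1) climbs toward 1; individual steps, however, are noisy, which is the weakness noted below.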
Which is better: the value function approximation approach to finding a policy, or the policy search algorithm we just covered?
For instinctive, reflex-like low-level decisions, such as balancing the inverted pendulum, there is very likely a logistic function mapping states to actions, so use the latter (policy search).
For high-level decisions that require looking ahead, such as the game of Go, use the former (value approximation).
The latter can also be applied to POMDPs: even though the states are only partially observed, feeding in estimated states works fine.
It turns out the REINFORCE algorithm is effective, but its gradient estimates are noisy.
Another policy search algorithm: Pegasus (we've used it for years on autonomous helicopter flight).
Pegasus stands for Policy Evaluation-of-Goodness And Search Using Scenarios.
That's the Pegasus policy search algorithm. We used it on the helicopter, and it also works very well on large-scale problems.
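The core Pegasus trick can be sketched as follows. By sampling the "scenarios" (the random numbers the simulator would consume) once in advance and reusing them for every policy, the stochastic simulator becomes a deterministic function of the policy parameters, so the estimated value V(theta) can be optimized with ordinary deterministic search. The toy one-step simulator here is my own illustration, not the helicopter model.

```python
import random

def simulate(theta, scenario):
    """Toy one-step stochastic dynamics: next state = theta + noise,
    where the noise comes from the pre-sampled scenario, not a live RNG."""
    noise = scenario[0]
    s_next = theta + noise
    return -(s_next ** 2)          # payoff: stay near the origin

def estimate_value(theta, scenarios):
    """Average payoff over m FIXED scenarios -> deterministic in theta."""
    return sum(simulate(theta, sc) for sc in scenarios) / len(scenarios)

# Draw the scenarios once, then reuse them for every policy we evaluate.
rng = random.Random(42)
scenarios = [[rng.gauss(0.0, 1.0)] for _ in range(100)]
```

Because the same scenarios are reused, evaluating the same theta twice gives the exact same number, which removes the evaluation noise that plagues naive Monte Carlo policy search.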
In closing, let me just say this class has been really fun…
Thank you!
And with that, the course comes to an end…