An Investigation of Model-Free Planning

Published: 2019 (ICML 2019)
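Key points: This paper is mainly an empirical study of what should count as planning. Prior work on planning usually commits to a specific planning algorithm, such as Monte Carlo rollouts or MCTS, or embeds a planning-like structure in the network, such as VIN or ATreeC. The authors argue that none of this explicit design is necessary: a network with recurrent, temporal structure, such as an LSTM, can already exhibit the characteristics of planning.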
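Specifically, the authors first define what planning is. The classical definition requires something like a look-ahead mechanism; the authors instead hold that what matters is not the mechanism itself but the foresight that planning provides. In their view, a planning agent generalizes well ("First, an effective planning algorithm should be able to generalize with relative ease to different situations"), learns efficiently from a small number of samples ("Second, a planning agent should be able to learn efficiently from relatively small amounts of data"), and makes good use of extra thinking time ("Third, an effective planning algorithm should be able to make good use of additional thinking time."). Put bluntly: as long as you perform well, you are planning, and I do not care at all whether you actually have an explicit planning mechanism.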
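The authors then simply build a ConvLSTM-based structure, the Deep Repeated ConvLSTM (DRC) network architecture, and run experiments by stacking this recurrent structure to different depths and widths (in the paper's notation, DRC(D, N): D stacked layers, repeated N times per environment step). They then claim that this, too, is planning, and that it works well.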
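To make the "stack and repeat" idea concrete, here is a minimal sketch of the control flow of a DRC(D, N)-style network. It is not the authors' implementation: the real model uses convolutional LSTM cells over spatial feature maps, whereas `toy_lstm_cell` below is a hypothetical scalar stand-in with fixed weights, and the wiring (the top layer's hidden state from the previous tick is fed back to the bottom layer) is my reading of the architecture, stated here as an assumption.

```python
import math
from typing import List, Tuple

State = Tuple[float, float]  # (hidden h, cell c) for one layer


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def toy_lstm_cell(x: float, below: float, state: State, w: float = 0.5) -> State:
    """Hypothetical scalar stand-in for a ConvLSTM cell (fixed weight w)."""
    h, c = state
    z = w * (x + below + h)          # mix input, lower-layer output, own hidden
    gate = sigmoid(z)                # one shared gate value, for brevity
    c_new = gate * c + gate * math.tanh(z)
    h_new = gate * math.tanh(c_new)
    return h_new, c_new


def drc_step(x: float, states: List[State], n_ticks: int) -> Tuple[float, List[State]]:
    """One environment step: run the D-layer stack n_ticks times on the same input."""
    states = list(states)            # do not mutate the caller's list
    for _ in range(n_ticks):
        # Assumption: the top layer's hidden state from the previous tick
        # is what the bottom layer sees as its "lower layer" input.
        below = states[-1][0]
        for d in range(len(states)):
            states[d] = toy_lstm_cell(x, below, states[d])
            below = states[d][0]
    return states[-1][0], states


# Usage: a DRC(3, 3)-style stack versus a single tick on the same input.
init = [(0.0, 0.0)] * 3
out_n1, _ = drc_step(1.0, init, n_ticks=1)
out_n3, final_states = drc_step(1.0, init, n_ticks=3)
```

The point of the sketch is only that "more thinking" here means more internal ticks of the same recurrent stack per observation, with no explicit model or search anywhere in the loop.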
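Summary: The title is big and appealing, but the content does not feel like it lives up to it.
Question: I am not sure what the reviewers saw in this that got it into ICML.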

posted @ 2022-05-25 23:32 initial_h

