offline RL | In-Context Reinforcement Learning Papers Collection
A collection of high-quality papers on in-context reinforcement learning:
In-context Reinforcement Learning with Algorithm Distillation
- Michael Laskin, Luyu Wang, Junhyuk Oh, Emilio Parisotto, Stephen Spencer, Richie Steigerwald, DJ Strouse, Steven Hansen, Angelos Filos, Ethan Brooks, Maxime Gazeau, Himanshu Sahni, Satinder Singh, Volodymyr Mnih
Supervised Pretraining Can Learn In-Context Reinforcement Learning
- Jonathan N. Lee, Annie Xie, Aldo Pacchiano, Yash Chandak, Chelsea Finn, Ofir Nachum, Emma Brunskill
Vintix: Action Model via In-Context Reinforcement Learning
- Andrey Polubarov, Nikita Lyubaykin, Alexander Derevyagin, Ilya Zisman, Denis Tarasov, Alexander Nikulin, Vladislav Kurenkov
Emergence of In-Context Reinforcement Learning from Noise Distillation
- Ilya Zisman, Vladislav Kurenkov, Alexander Nikulin, Viacheslav Sinii, Sergey Kolesnikov
In-Context Reinforcement Learning for Variable Action Spaces
- Viacheslav Sinii, Alexander Nikulin, Vladislav Kurenkov, Ilya Zisman, Sergey Kolesnikov
In-Context Reinforcement Learning Without Optimal Action Labels
- Juncheng Dong, Moyang Guo, Ethan X Fang, Zhuoran Yang, Vahid Tarokh
Yes, Q-learning Helps Offline In-Context RL
- Denis Tarasov, Alexander Nikulin, Ilya Zisman, Albina Klepach, Andrei Polubarov, Nikita Lyubaykin, Alexander Derevyagin, Igor Kiselev, Vladislav Kurenkov
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning
- Alexander Nikulin, Ilya Zisman, Alexey Zemtsov, Viacheslav Sinii, Vladislav Kurenkov, Sergey Kolesnikov
ReLIC: A Recipe for 64k Steps of In-Context Reinforcement Learning for Embodied AI
- Ahmad Elawady, Gunjan Chhablani, Ram Ramrakhya, Karmesh Yadav, Dhruv Batra, Zsolt Kira, Andrew Szot
Sparse Autoencoders Reveal Temporal Difference Learning in Large Language Models
- Can Demircan, Tankred Saanum, Akshay K. Jagadish, Marcel Binz, Eric Schulz
AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents
- Jake Grigsby, Linxi Fan, Yuke Zhu
AMAGO-2: Breaking the Multi-Task Barrier in Meta-Reinforcement Learning with Transformers
- Jake Grigsby, Justin Sasek, Samyak Parajuli, Daniel Adebi, Amy Zhang, Yuke Zhu
Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining
- Licong Lin, Yu Bai, Song Mei
Structured State Space Models for In-Context Reinforcement Learning
- Chris Lu, Yannick Schroecker, Albert Gu, Emilio Parisotto, Jakob Foerster, Satinder Singh, Feryal Behbahani
LLMs Are In-Context Reinforcement Learners
- Giovanni Monea, Antoine Bosselut, Kianté Brantley, Yoav Artzi
EVOLvE: Evaluating and Optimizing LLMs For Exploration
- Allen Nie, Yi Su, Bo Chang, Jonathan N. Lee, Ed H. Chi, Quoc V. Le, Minmin Chen
In-Context Decision Transformer: Reinforcement Learning via Hierarchical Chain-of-Thought
- Sili Huang, Jifeng Hu, Hechang Chen, Lichao Sun, Bo Yang
Decision Mamba: Reinforcement Learning via Hybrid Selective Sequence Modeling
- Sili Huang, Jifeng Hu, Zhejian Yang, Liwei Yang, Tao Luo, Hechang Chen, Lichao Sun, Bo Yang
Retrieval-Augmented Decision Transformer: External Memory for In-context RL
- Thomas Schmied, Fabian Paischer, Vihang Patil, Markus Hofmarcher, Razvan Pascanu, Sepp Hochreiter
Human-Timescale Adaptation in an Open-Ended Task Space
- Adaptive Agent Team
Generalization to New Sequential Decision Making Tasks with In-Context Learning
- Sharath Chandra Raparthy, Eric Hambro, Robert Kirk, Mikael Henaff, Roberta Raileanu
Large Language Models can Implement Policy Iteration
- Ethan Brooks, Logan Walls, Richard L. Lewis, Satinder Singh
Learning How to Infer Partial MDPs for In-Context Adaptation and Exploration
- Chentian Jiang, Nan Rosemary Ke, Hado van Hasselt
Towards General-Purpose In-Context Learning Agents
- Louis Kirsch, James Harrison, C. Daniel Freeman, Jascha Sohl-Dickstein, Jürgen Schmidhuber
Large Language Models as General Pattern Machines
- Suvir Mirchandani, Fei Xia, Pete Florence, Brian Ichter, Danny Driess, Montserrat Gonzalez Arenas, Kanishka Rao, Dorsa Sadigh, Andy Zeng
Cross-Episodic Curriculum for Transformer Agents
- Lucy Xiaoyang Shi, Yunfan Jiang, Jake Grigsby, Linxi "Jim" Fan, Yuke Zhu
First-Explore, then Exploit: Meta-Learning Intelligent Exploration
- Ben Norman, Jeff Clune
Meta-Reinforcement Learning Robust to Distributional Shift Via Performing Lifelong In-Context Learning
- Tengye Xu, Zihao Li, Qinyuan Ren
Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning
- Jonathan Cook, Chris Lu, Edward Hughes, Joel Z. Leibo, Jakob Foerster
Large Language Models As Evolution Strategies
- Robert Tjarko Lange, Yingtao Tian, Yujin Tang
Can Large Language Models Explore In-Context?
- Akshay Krishnamurthy, Keegan Harris, Dylan J. Foster, Cyril Zhang, Aleksandrs Slivkins
SAD: State-Action Distillation for In-Context Reinforcement Learning under Random Policies
- Weiqin Chen, Santiago Paternain
Pretraining Decision Transformers with Reward Prediction for In-Context Multi-task Structured Bandit Learning
- Subhojyoti Mukherjee, Josiah P. Hanna, Qiaomin Xie, Robert Nowak
Transformers Learn Temporal Difference Methods for In-Context Reinforcement Learning (ICLR 2025)
- Jiuqi Wang, Ethan Blaser, Hadi Daneshmand, Shangtong Zhang
- Leans heavily toward mathematical derivations; I'll go read the survey first.
OmniRL: In-Context Reinforcement Learning by Large-Scale Meta-Training in Randomized Worlds
- Fan Wang, Pengtao Shao, Yiming Zhang, Bo Yu, Shaoshan Liu, Ning Ding, Yang Cao, Yu Kang, Haifeng Wang
A survey of in-context reinforcement learning
- A. Moeini, J. Wang, J. Beck, E. Blaser
- Main content: splits ICRL into two categories, Supervised Pretraining and Reinforcement Pretraining.
- Supervised Pretraining: the network is pretrained via behavior cloning (BC); the pretraining objective is the log-likelihood \(\log\pi_{\theta}(a^{*}\mid s,c)\) or a variant of it (a minimal sketch of this objective is given after this list). The points below describe how the inputs and outputs are constructed.
- (1) Input: trajectories from multiple episodes concatenated into a cross-episode context; output: the corresponding action (as in Algorithm Distillation, AD). Common ways to build such input-output pairs (see the context-construction sketch after this list):
- Fill the context with a curriculum of trajectories: order the trajectories by task difficulty, demonstrator proficiency, or episode returns.
- Add noise to the expert demonstrations.
- Use explicit features in the trajectory to indicate whether the current episode is better than earlier ones (cross-episodic return-to-go).
- These ways of constructing input-output pairs encourage the model to implement some policy improvement algorithm during its forward pass.
- (2) Another line of methods selects trajectories from the offline dataset that resemble the current task and prepends them to the context; this encourages the model to implement some imitation learning algorithm.
- (3) Some works incorporate hindsight information, such as returns, into the context to facilitate policy improvement.
- Sample efficiency is another research focus, with approaches including n-gram induction heads, importance sampling, generating learning histories from suboptimal policies with annealed noise, and reducing environment interaction to shorten sequence lengths on real robots.
- How to encode the context is also studied, e.g. as \((s_t, a_t, r_{t+1}, s_{t+1})\) or \((a_{t-1}, r_t, s_t)\) tuples. In addition, to enable test-time adaptation to different action spaces, Sinii et al. [2023] project discrete actions onto random vector embeddings and train the network to output an embedding directly; the action whose embedding is closest to the network's output is then selected (see the last sketch below).
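The sketches below illustrate a few of the mechanisms summarized above. They are not the implementations from the cited papers; all class names, helper names, and tensor shapes are assumptions made for illustration. First, a minimal PyTorch sketch of the supervised (BC) pretraining objective, i.e. maximizing \(\log\pi_{\theta}(a^{*}\mid s,c)\) with a causal sequence model over the context:

```python
# Minimal sketch of the supervised pretraining (BC) objective, assuming a
# hypothetical ContextPolicy model and (state, one-hot action, reward) tokens.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextPolicy(nn.Module):
    def __init__(self, state_dim: int, num_actions: int, d_model: int = 64):
        super().__init__()
        # One context token = (state, one-hot previous action, reward).
        self.embed = nn.Linear(state_dim + num_actions + 1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, num_actions)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, context_len, state_dim + num_actions + 1)
        causal = nn.Transformer.generate_square_subsequent_mask(tokens.size(1)).to(tokens.device)
        h = self.backbone(self.embed(tokens), mask=causal)
        return self.head(h)  # action logits at every context position

def bc_loss(model: ContextPolicy, tokens: torch.Tensor, target_actions: torch.Tensor) -> torch.Tensor:
    # Negative log-likelihood of the demonstrator's actions given state and context,
    # i.e. -log pi_theta(a* | s, c), averaged over batch and context positions.
    logits = model(tokens)
    return F.cross_entropy(logits.flatten(0, 1), target_actions.flatten())
```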
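Next, a toy sketch of cross-episode context construction in the AD style: episodes from a learning history are ordered by return and concatenated, so the context itself exhibits improvement; a per-episode return-to-go channel is included as one example of the hindsight feature mentioned above. The episode format and helper names are illustrative assumptions.

```python
# Toy cross-episode context construction (AD-style), assuming each episode is a
# dict of equal-length 'states', 'actions', and 'rewards' numpy arrays.
from typing import Dict, List
import numpy as np

def build_cross_episode_context(episodes: List[Dict[str, np.ndarray]],
                                max_transitions: int) -> Dict[str, np.ndarray]:
    # Curriculum ordering: worst-return episodes first, best last, so that the
    # concatenated context displays policy improvement across episodes.
    ordered = sorted(episodes, key=lambda ep: ep["rewards"].sum())
    parts = {"states": [], "actions": [], "rewards": [], "rtg": []}
    for ep in ordered:
        parts["states"].append(ep["states"])
        parts["actions"].append(ep["actions"])
        parts["rewards"].append(ep["rewards"])
        # Per-episode return-to-go as an explicit hindsight feature.
        parts["rtg"].append(np.cumsum(ep["rewards"][::-1])[::-1])
    # Concatenate across episodes and keep only the most recent transitions.
    return {k: np.concatenate(v)[-max_transitions:] for k, v in parts.items()}
```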
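Finally, a minimal sketch of the variable-action-space idea attributed to Sinii et al. [2023]: each discrete action is assigned a random embedding, the network regresses an embedding vector, and the action whose embedding is nearest to the output is selected. Function names and shapes are assumptions.

```python
# Random action embeddings plus nearest-neighbor decoding for variable action spaces.
import torch
import torch.nn.functional as F

def sample_action_embeddings(num_actions: int, dim: int) -> torch.Tensor:
    # Fresh random embeddings can be drawn per task, so the number of actions at
    # test time does not have to match what was seen during pretraining.
    return F.normalize(torch.randn(num_actions, dim), dim=-1)

def decode_action(predicted_embedding: torch.Tensor, action_embeddings: torch.Tensor) -> int:
    # Pick the action whose random embedding is closest to the network's output.
    distances = torch.cdist(predicted_embedding.unsqueeze(0), action_embeddings)
    return int(distances.argmin())
```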
