XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning
- keywords: ICRL
- ICLR 2025
- dunnolab work
- Necessity: ICRL needs training data that is both sufficiently large and sufficiently complex, i.e., data containing policy-improvement histories (AD) or optimal-policy trajectories (DPT). For example, in the 9x9 Dark Key-to-Door task, the agent must experience 2,000 different goal combinations before it starts adapting to unseen ones (see the data-format sketch after this list).
- Existing datasets: (1) offline RL datasets offer too few task types; (2) large-scale supervised pretraining corpora contain many MDPs but lack learning histories.
- This paper proposes XLand-100B (326 GB) and its scaled-down version XLand-Trivial-20B (60 GB).
- Limitations: fixed scenes and identical observation spaces.
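As a concrete mental model of what AD-style "learning history" data looks like, here is a minimal sketch (field names and layout are my assumptions, not the actual XLand-100B schema): each record stores a whole multi-episode transition stream, so a sequence model trained on context windows spanning episode boundaries can pick up the improvement trend.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class LearningHistory:
    """One AD-style record: the full transition stream of a single agent
    improving on a single task (hypothetical schema, not XLand-100B's)."""
    observations: np.ndarray  # (T, *obs_shape), concatenated across episodes
    actions: np.ndarray       # (T,) discrete actions
    rewards: np.ndarray       # (T,) per-step rewards
    dones: np.ndarray         # (T,) episode boundaries inside the history


def ad_context(h: LearningHistory, start: int, ctx_len: int):
    """Crop a context window that deliberately spans several episodes;
    the AD training objective is next-action prediction over it."""
    sl = slice(start, start + ctx_len)
    return h.observations[sl], h.actions[sl], h.rewards[sl], h.dones[sl]
```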
XLand-MiniGrid: Scalable Meta-Reinforcement Learning Environments in JAX
- keywords: offline RL; Meta-RL
- NeurIPS 2024
- dunnolab work; this paper is the foundation of the previous one, i.e., the environment in which the previous dataset was collected
- Proposes the XLand-MiniGrid environment, usable for benchmarking meta-RL and in-context RL (usage sketch below)
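For reference, a minimal usage sketch in JAX, reconstructed from memory of the repo's README; treat the exact names and signatures (`xminigrid.make`, `load_benchmark`, `sample_ruleset`, the `reset`/`step` conventions) as assumptions to be checked against the actual library.

```python
import jax
import xminigrid  # JAX-native environments; API reconstructed from memory

key = jax.random.PRNGKey(0)
reset_key, ruleset_key = jax.random.split(key)

# Benchmarks are pre-generated sets of rulesets (task definitions);
# "trivial-1m" is one of the advertised benchmark names.
benchmark = xminigrid.load_benchmark(name="trivial-1m")
ruleset = benchmark.sample_ruleset(ruleset_key)

env, env_params = xminigrid.make("XLand-MiniGrid-R1-9x9")
env_params = env_params.replace(ruleset=ruleset)

# reset/step are pure functions, so they can be jit-compiled and vmapped
# across thousands of parallel environments.
timestep = jax.jit(env.reset)(env_params, reset_key)
timestep = jax.jit(env.step)(env_params, timestep, action=0)
```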
Scalable Multi-Task Reinforcement Learning for Generalizable Spatial Intelligence in Visuomotor Agents
- keywords: Generalizability; MTRL; Minecraft
- New work from Yitao Liang's group at Peking University
- arXiv
Compute-Optimal Scaling for Value-Based Deep RL
- keywords: RL theory; scaling laws; online RL
- UC Berkeley; Sergey Levine and Pieter Abbeel
- arXiv
Offline Imitation Learning upon Arbitrary Demonstrations by Pre-Training Dynamics Representations
- keywords: IL; Robotics; Representation
- New work from Na Li's group at Harvard
- arXiv
- Background: vanilla imitation learning has the fatal flaw of compounding error: the learned policy gradually drifts off the behavior policy's distribution. BC only minimizes the per-step action error, so the agent's deviation grows and it encounters states it has never seen. Some works therefore, under the Markov assumption, change the learning objective to minimizing the divergence between the global visitation distribution of the learned policy and that of the behavior policy. Extended to the offline setting, this led to importance-weighted DICE (DIstribution Correction Estimation) methods, which counteract the inefficient exploration that comes from being confined to a fixed offline dataset; the optimization target becomes weighted behavior cloning (objectives sketched after this list).
- Motivation papers:
  - Making Linear MDPs Practical via Contrastive Representation Learning (ICML 2022)
  - Spectral Decomposition Representation for Reinforcement Learning (ICLR 2023)
- Motivation: (1) in the offline setting, a lack of sufficient expert data causes overfitting; even though mixing in suboptimal data has been proposed, the quality demands on the mixed data remain high; (2) DICE's min-max optimization is computationally expensive, unstable, and hard to optimize (schematic form below).
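To pin down the objectives this note refers to, a hedged sketch in generic DICE notation (my summary of the general DICE literature, not necessarily this paper's exact formulation; $d^{\pi}$ and $d^{D}$ are the visitation distributions of the learned policy and the dataset, $f^{*}$ a convex conjugate):

```latex
% Plain BC: minimize the per-step action error only
\min_{\pi}\ \mathbb{E}_{(s,a)\sim d^{D}}\left[-\log \pi(a \mid s)\right]

% Distribution matching: minimize a divergence between visitation distributions
\min_{\pi}\ D_{f}\!\left(d^{\pi}\,\big\|\,d^{D}\right)

% DICE: estimate the correction ratio w(s,a) = d^{\pi}(s,a)/d^{D}(s,a),
% reducing the problem to weighted behavior cloning
\max_{\pi}\ \mathbb{E}_{(s,a)\sim d^{D}}\left[\,w(s,a)\,\log \pi(a \mid s)\,\right]

% Estimating w needs a nested saddle point with a Lagrangian critic \nu
% (schematic; signs and regularizers vary across DICE variants);
% this is the costly, unstable min-max criticized above
\min_{\pi}\ \max_{\nu}\ (1-\gamma)\,\mathbb{E}_{s_{0}\sim \mu_{0}}\!\left[\nu(s_{0})\right]
  + \mathbb{E}_{(s,a,s')\sim d^{D}}\!\left[f^{*}\!\big(\gamma\,\nu(s')-\nu(s)\big)\right]
```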
Improving Generalization Ability of Robotic Imitation Learning by Resolving Causal Confusion in Observations
- keywords: Generalizability; IL; Robotics
- Data61, CSIRO, Australia
- arXiv
posted @ 2025-08-22 21:15 霜尘FrostDust