PHD.ZJU - Post Category - 霜尘FrostDust

Quick Paper Reads | September 2025

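Summary: What Can RL Bring to VLA Generalization? An Empirical Study. arXiv. An MLP is attached to the last layer of the VLA model to produce a Q-value, so that reinforcement learning algorithms such as PPO can be used for fine-tuning. PPO outperforms DPO, GRPO, and others. RL fine-tuning improves the VLA's generalization. Sho Read more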

posted @ 2025-09-03 21:52 霜尘FrostDust Views (47) Comments (0) Recommendations (0)

offline RL | In-Context Reinforcement Learning Papers Collection

Summary: A collection of high-quality papers on in-context reinforcement learning: Awesome In-Context Reinforcement Learning. In-context Reinforcement Learning with Algorithm Distillation, Michael Laskin, Luyu Wang, J Read more

posted @ 2025-03-05 19:38 霜尘FrostDust Views (188) Comments (0) Recommendations (0)

offline RL | Reproducing Decision Transformer

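Summary: This post is mainly a record of reproducing the paper Decision Transformer: Reinforcement Learning via Sequence Modeling. Because the paper is fairly old (2021) and most reproduction work was done before 2022, changes in environments and package dependencies make it hard to get running. As an RL novice, I also suffered over environment configuration Read more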

posted @ 2025-02-25 23:08 霜尘FrostDust Views (502) Comments (0) Recommendations (0)

[Paper Reading] GROOT: Learning to Follow Instructions by Watching Gameplay Videos

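Summary: GROOT: Learning to Follow Instructions by Watching Gameplay Videos. The authors are Team CraftJarvis, the group of 梁一韬 (Yitao Liang) at Peking University; published in 2023. Background: Developing human-level embodied agents that solve open-ended tasks in an open world has long been, for artificial intelligence, Read more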

posted @ 2025-01-17 11:15 霜尘FrostDust Views (162) Comments (0) Recommendations (0)
