Dec.22-Dec.28
Reading list
- mimic-video: Video-Action Models for Generalizable Robot Control Beyond VLAs
- Unified Vision-Language-Action Model
- Large Video Planner Enables Generalizable Robot Control
- Generative Adversarial Gumbel MCTS for Abstract Visual Composition Generation
- Posterior Behavioral Cloning: Pretraining BC Policies for Efficient RL Finetuning
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
- FiLM: Visual Reasoning with a General Conditioning Layer
Posterior Behavioral Cloning: Pretraining BC Policies for Efficient RL Finetuning
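- TL;DR: When training a BC policy, do not treat all data equally; instead, moderately increase the weight of the datasets that make up a small share of the training set.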
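The reweighting idea in the TL;DR above can be illustrated with a small sketch. This is only my reading of the summary, not the paper's actual algorithm: the function names, the `alpha` exponent, and the size-based weighting rule are all assumptions made for illustration. Each dataset is sampled with probability proportional to `size ** alpha`, so `alpha = 1` treats every sample equally and smaller values of `alpha` shift weight toward the under-represented datasets.

```python
import random


def dataset_sampling_probs(sizes, alpha=0.5):
    """Return per-dataset sampling probabilities proportional to size ** alpha.

    Hypothetical helper (not from the paper).
    alpha = 1.0 -> every sample is equally likely (no reweighting).
    alpha = 0.0 -> every dataset is equally likely, regardless of size.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    scaled = {name: n ** alpha for name, n in sizes.items()}
    total = sum(scaled.values())
    return {name: s / total for name, s in scaled.items()}


def sample_batch(datasets, probs, batch_size, rng):
    """Draw a batch: pick a dataset by its probability, then a uniform sample from it."""
    names = list(datasets)
    weights = [probs[name] for name in names]
    chosen = rng.choices(names, weights=weights, k=batch_size)
    return [(name, rng.choice(datasets[name])) for name in chosen]


if __name__ == "__main__":
    # Toy data: a large dataset and a small one (placeholder demonstrations).
    datasets = {
        "large": [f"large_demo_{i}" for i in range(900)],
        "small": [f"small_demo_{i}" for i in range(100)],
    }
    sizes = {name: len(items) for name, items in datasets.items()}
    probs = dataset_sampling_probs(sizes, alpha=0.5)
    batch = sample_batch(datasets, probs, batch_size=8, rng=random.Random(0))
    print(probs)
    print(batch)
```

With the toy sizes above (900 vs. 100), the small dataset accounts for 10% of draws when `alpha = 1` and 25% when `alpha = 0.5`, which is the "moderate" boost the note describes, as opposed to fully equalizing the datasets.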
RT-Sketch: Goal-Conditioned Imitation Learning from Hand-Drawn Sketches
- ICLR 2024 (rejected)
- project
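- TL;DR: A variant of RT-1 with a different goal input. The paper compares language-conditioned, target-image-conditioned, and target-sketch-conditioned Robotic Transformer models. It finds that results are better when the goal image is converted to a sketch outline by GAN-based style transfer and used as the goal input, with the Transformer pretrained using sketch-outline augmentation.
- Note: Using plain text as the goal requires the base model to have strong semantic understanding, and the RT series evidently was not pretrained on large-scale text. RT-Image uses an image as the manipulation goal, but the image carries too much redundant information. A sketch can serve as a simplified version of the goal image, reducing the visual load.
- Cons: Training a general-purpose style-transfer network is itself fairly difficult, and the method requires a hand-drawn sketch as the goal, which is a significant limitation.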
Generative Adversarial Gumbel MCTS for Abstract Visual Composition Generation
posted @ 2025-12-22 10:32 霜尘FrostDust