Understanding π0 (pai0) as a Whole - Embodied Intelligence - 01

Overall framework

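π0, a VLA model for general robot control: one framework that controls 7 kinds of robot arms (a 3B model based on PaliGemma and flow matching). Especially important!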
https://blog.csdn.net/v_JULY_v/article/details/143472442

pai0

Embodied intelligence: pai0 and pai0.5
https://g.co/gemini/share/ba11d4091950

π0.5 is the newer Vision-Language-Action model, focused on open-world generalization.

Paper title: π0.5: a Vision-Language-Action Model with Open-World Generalization
arXiv link: https://arxiv.org/abs/2504.16054

π0 is the predecessor of π0.5; it laid the multimodal, flow-matching control foundation that π0.5 builds on.
Paper title: π0: A Vision-Language-Action Flow Model for General Robot Control
arXiv link: https://arxiv.org/html/2410.24164v1
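
Since flow matching is the part of π0 these notes keep coming back to, here is a minimal sketch of the general idea in plain Python. It is not taken from the π0 paper or the openpi code: the time convention (t=0 is noise, t=1 is the action), the 7-dimensional action vector, and every function name are assumptions made for illustration. A real model would replace `oracle_velocity` with a trained network conditioned on images, language, and robot state.

```python
import random

def interpolate(noise, action, t):
    """Point on the straight line from noise (t=0) to the action (t=1)."""
    return [(1.0 - t) * n + t * a for n, a in zip(noise, action)]

def target_velocity(noise, action):
    """Regression target for the velocity network along that line."""
    return [a - n for n, a in zip(noise, action)]

def flow_matching_loss(predicted, noise, action):
    """Mean squared error between predicted and target velocity."""
    target = target_velocity(noise, action)
    return sum((p - u) ** 2 for p, u in zip(predicted, target)) / len(target)

def euler_sample(velocity_fn, noise, num_steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with forward Euler."""
    x = list(noise)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

random.seed(0)
action = [random.uniform(-1.0, 1.0) for _ in range(7)]  # hypothetical 7-joint action
noise = [random.gauss(0.0, 1.0) for _ in range(7)]

def oracle_velocity(x, t):
    # Stand-in for a trained network: the exact velocity when the data
    # distribution is a single known action (only valid for t < 1).
    return [(a - xi) / (1.0 - t) for a, xi in zip(action, x)]

sampled = euler_sample(oracle_velocity, noise)
```

With this oracle field the velocity is constant along the path, so the Euler steps land on `action` up to floating-point error. With a learned field, the number of steps trades accuracy against inference cost.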

GitHub:
https://github.com/Physical-Intelligence/openpi

Introduction to the Transfusion model (a unified multimodal model)

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
https://arxiv.org/abs/2408.11039

VLM (Vision-Language Model) architecture

PaliGemma: A versatile 3B VLM for transfer
https://arxiv.org/abs/2407.07726

posted @ 2025-10-09 20:30  jack-chen666