VLM or VLA? Development Trends of Multimodal Large Models for Autonomous Driving, as Seen from Existing Work

WeChat Channels: sph0RgSyDYV47z6
Kuaishou: 4874645212
Douyin: dy0so323fq2w
Xiaohongshu: 95619019828
Bilibili 1: UID 3546863642871878
Bilibili 2: UID 3546955410049087
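In recent years, foundation models such as LLMs, VLMs, and VLAs have played an increasingly important role in autonomous driving decision-making and have drawn growing attention from both academia and industry. Many readers have asked whether a systematic, categorized summary exists. This post groups foundation models for decision-making by model type. Related algorithms will be sorted out further in follow-up posts and collected promptly in the 『自动驾驶之心知识星球』 community. Everyone is welcome to learn and discuss together.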
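LLM-based methods
LLM-based methods mainly use the reasoning ability of large models to describe autonomous driving. They belong to the early stage of combining autonomous driving with large models, but are still worth studying.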
Distilling Multi-modal Large Language Models for Autonomous Driving
  • Paper: https://arxiv.org/abs/2501.09757
  • Venue: arXiv
LearningFlow: Automated Policy Learning Workflow for Urban Driving with Large Language Models
  • Paper: https://arxiv.org/pdf/2501.05057
  • Venue: arXiv
CoT-Drive: Efficient Motion Forecasting for Autonomous Driving with LLMs and Chain-of-Thought Prompting
  • Paper: https://arxiv.org/abs/2503.07234
  • Venue: arXiv
PADriver: Towards Personalized Autonomous Driving
  • Paper: https://arxiv.org/pdf/2505.05240
  • Venue: arXiv
Towards Human-Centric Autonomous Driving: A Fast-Slow Architecture Integrating Large Language Model Guidance with Reinforcement Learning
  • Paper: https://arxiv.org/pdf/2505.06875
  • Project page: https://drive.google.com/drive/folders/1K0WgRw1SdJL-JufvJNaTO1ES5SOuSj6p
  • Venue: arXiv
Driving with Regulation: Interpretable Decision-Making for Autonomous Vehicles with Retrieval-Augmented Reasoning via LLM
  • Paper: https://arxiv.org/abs/2410.04759
  • Venue: arXiv
Empowering Autonomous Driving with Large Language Models: A Safety Perspective
  • Paper: https://arxiv.org/abs/2312.00812
  • Venue: ICLR 2024
Drive Like a Human: Rethinking Autonomous Driving with Large Language Models
  • Paper: https://arxiv.org/pdf/2307.07162.pdf
  • Code: https://github.com/PJLab-ADG/DriveLikeAHuman
  • Venue: arXiv
Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving
  • Paper: https://arxiv.org/abs/2310.01957
  • Code: https://github.com/wayveai/Driving-with-LLMs
  • Venue: ICRA 2024
A Language Agent for Autonomous Driving
  • Paper: https://arxiv.org/abs/2311.10813
  • Project page: https://usc-gvl.github.io/Agent-Driver/
  • Venue: arXiv
LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving
  • Paper: https://arxiv.org/abs/2310.03026
  • Project page: https://sites.google.com/view/llm-mpc
  • Venue: arXiv
Receive, Reason, and React: Drive as You Say with Large Language Models in Autonomous Vehicles
  • Paper: https://arxiv.org/abs/2310.08034v1
  • Venue: MITS 2024
DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models
  • Paper: https://arxiv.org/abs/2309.16292
  • Project page: https://pjlab-adg.github.io/DiLu/
  • Code: https://github.com/PJLab-ADG/DiLu
  • Venue: ICLR 2024
DSDrive: Distilling Large Language Model for Lightweight End-to-End Autonomous Driving with Unified Reasoning and Planning
  • Paper: https://arxiv.org/pdf/2505.05360
  • Venue: arXiv
TeLL-Drive: Enhancing Autonomous Driving with Teacher LLM-Guided Deep Reinforcement Learning
  • Paper: https://arxiv.org/abs/2502.01387
  • Project page: https://perfectxu88.github.io/TeLL-Drive.github.io/
  • Venue: arXiv
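VLM-based methods
VLM- and VLA-based algorithms are the current mainstream paradigm, because vision is the sensing modality that autonomous driving relies on most. This section collects the latest work for reference and study.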
Drive-R1: Bridging Reasoning and Planning in VLMs for Autonomous Driving with Reinforcement Learning
  • Paper: https://arxiv.org/abs/2506.18234
  • Venue: arXiv
FutureSightDrive: Visualizing Trajectory Planning with Spatio-Temporal CoT for Autonomous Driving
  • Paper: https://arxiv.org/abs/2505.17685
  • Code: https://github.com/MIV-XJTU/FSDrive
  • Venue: arXiv
Generative Planning with 3D-vision Language Pre-training for End-to-End Autonomous Driving
  • Paper: https://arxiv.org/abs/2501.08861
  • Venue: arXiv
ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation
  • Paper: https://arxiv.org/abs/2503.19755
  • Code: https://github.com/xiaomi-mlab/Orion
  • Venue: arXiv
Training-Free Open-Ended Object Detection and Segmentation via Attention as Prompts
  • Paper: https://arxiv.org/abs/2410.05963
  • Venue: NeurIPS 2024
LingoQA: Visual Question Answering for Autonomous Driving
  • Paper: https://arxiv.org/abs/2312.14115
  • Code: https://github.com/wayveai/LingoQA/
  • Venue: ECCV 2024
DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models
  • Paper: https://arxiv.org/abs/2402.12289
  • Project page: https://tsinghua-mars-lab.github.io/DriveVLM/
  • Venue: arXiv
Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving
  • Paper: https://arxiv.org/abs/2405.15324
  • Code: https://github.com/PJLab-ADG/LeapAD
  • Venue: NeurIPS 2024
ADAPT: Action-aware Driving Caption Transformer
  • Paper: https://arxiv.org/abs/2302.00673
  • Code: https://github.com/jxbbb/ADAPT
  • Venue: ICRA 2023
DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model
  • Paper: https://arxiv.org/abs/2310.01412
  • Project page: https://tonyxuqaq.github.io/projects/DriveGPT4/
  • Venue: RAL 2024
LightEMMA: Lightweight End-to-End Multimodal Model for Autonomous Driving
  • Paper: https://arxiv.org/abs/2505.00284
  • Code: https://github.com/michigan-traffic-lab/LightEMMA
  • Venue: arXiv
TS-VLM: Text-Guided SoftSort Pooling for Vision-Language Models in Multi-View Driving Reasoning
  • Paper: https://arxiv.org/abs/2505.12670
  • Venue: arXiv
VLM-AD: End-to-End Autonomous Driving through Vision-Language Model Supervision
  • Paper: https://arxiv.org/pdf/2412.14446
  • Venue: arXiv
OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving
  • Paper: https://arxiv.org/pdf/2412.15208
  • Code: https://github.com/taco-group/OpenEMMA
  • Venue: WACV 2025
CALMM-Drive: Confidence-Aware Autonomous Driving with Large Multimodal Model
  • Paper: https://arxiv.org/pdf/2412.04209
  • Venue: arXiv
WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model
  • Paper: https://arxiv.org/abs/2412.09951
  • Project page: https://wyddmw.github.io/WiseAD_demo/
  • Code: https://github.com/wyddmw/WiseAD
  • Venue: arXiv
VLM-Assisted Continual Learning for Visual Question Answering in Self-Driving
  • Paper: https://arxiv.org/abs/2502.00843
  • Venue: arXiv
VLM-E2E: Enhancing End-to-End Autonomous Driving with Multimodal Driver Attention Fusion
  • Paper: https://arxiv.org/abs/2502.18042
  • Venue: arXiv
VLM-MPC: Vision Language Foundation Model (VLM)-Guided Model Predictive Controller (MPC) for Autonomous Driving
  • Paper: https://arxiv.org/abs/2408.04821
  • Venue: ICML 2025
Sce2DriveX: A Generalized MLLM Framework for Scene-to-Drive Learning
  • Paper: https://arxiv.org/abs/2502.14917
  • Venue: arXiv
AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning
  • Paper: https://arxiv.org/pdf/2503.07608
  • Code: https://github.com/hustvl/AlphaDrive
  • Venue: arXiv
X-Driver: Explainable Autonomous Driving with Vision-Language Models
  • Paper: https://arxiv.org/pdf/2505.05098
  • Venue: arXiv
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving
  • Paper: https://arxiv.org/pdf/2505.08725
  • Code: https://arxiv.org/pdf/2505.08725
  • Venue: arXiv
VLA-based methods
AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning
  • Paper: https://arxiv.org/abs/2506.13757
  • Project page: https://autovla.github.io/
  • Code: https://github.com/ucla-mobility/AutoVLA
  • Venue: arXiv
DiffVLA: Vision-Language Guided Diffusion Planning for Autonomous Driving
  • Paper: https://arxiv.org/abs/2505.19381
  • Venue: arXiv
Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models
  • Paper: https://arxiv.org/abs/2505.23757
  • Project page: http://impromptu-vla.c7w.tech/
  • Code: https://github.com/ahydchh/Impromptu-VLA
  • Venue: arXiv
DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving
  • Paper: https://arxiv.org/abs/2505.16278
  • Project page: https://thinklab-sjtu.github.io/DriveMoE/
  • Venue: arXiv
OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model
  • Paper: https://arxiv.org/pdf/2503.23463
  • Code: https://github.com/DriveVLA/OpenDriveVLA
  • Venue: arXiv
 
posted @ 2025-08-21 10:40  吴建明wujianming