VLM or VLA? Trends in Multimodal Large Models for Autonomous Driving, as Seen from Existing Work

WeChat Channels ID: sph0RgSyDYV47z6

Kuaishou ID: 4874645212

Douyin ID: dy0so323fq2w

Xiaohongshu ID: 95619019828

Bilibili 1: UID: 3546863642871878

Bilibili 2: UID: 3546955410049087

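In recent years, foundation models such as LLMs, VLMs, and VLAs have played an increasingly important role in autonomous driving decision-making and have drawn growing attention from both academia and industry. Many readers have asked whether a systematic, categorized summary exists. This article groups foundation models for decision-making by model category. We will continue to survey related algorithms and post updates to the 『自动驾驶之心知识星球』 community as soon as they are available. Everyone is welcome to learn and discuss together.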

LLM-based methods

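LLM-based methods mainly use the reasoning ability of large models to describe driving scenarios. They belong to the early stage of combining autonomous driving with large models, but they are still worth studying.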

Distilling Multi-modal Large Language Models for Autonomous Driving

Paper link: https://arxiv.org/abs/2501.09757
Venue: arXiv

LearningFlow: Automated Policy Learning Workflow for Urban Driving with Large Language Models

Paper link: https://arxiv.org/pdf/2501.05057
Venue: arXiv

CoT-Drive: Efficient Motion Forecasting for Autonomous Driving with LLMs and Chain-of-Thought Prompting

Paper link: https://arxiv.org/abs/2503.07234
Venue: arXiv

PADriver: Towards Personalized Autonomous Driving

Paper link: https://arxiv.org/pdf/2505.05240
Venue: arXiv

Towards Human-Centric Autonomous Driving: A Fast-Slow Architecture Integrating Large Language Model Guidance with Reinforcement Learning

Paper link: https://arxiv.org/pdf/2505.06875
Project page: https://drive.google.com/drive/folders/1K0WgRw1SdJL-JufvJNaTO1ES5SOuSj6p
Venue: arXiv

Driving with Regulation: Interpretable Decision-Making for Autonomous Vehicles with Retrieval-Augmented Reasoning via LLM

Paper link: https://arxiv.org/abs/2410.04759
Venue: arXiv

Empowering Autonomous Driving with Large Language Models: A Safety Perspective

Paper link: https://arxiv.org/abs/2312.00812
Venue: ICLR 2024

Drive Like a Human: Rethinking Autonomous Driving with Large Language Models

Paper link: https://arxiv.org/pdf/2307.07162.pdf
Code: https://github.com/PJLab-ADG/DriveLikeAHuman
Venue: arXiv

Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving

Paper link: https://arxiv.org/abs/2310.01957
Code: https://github.com/wayveai/Driving-with-LLMs
Venue: ICRA 2024

A Language Agent for Autonomous Driving

Paper link: https://arxiv.org/abs/2311.10813
Project page: https://usc-gvl.github.io/Agent-Driver/
Venue: arXiv

LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving

Paper link: https://arxiv.org/abs/2310.03026
Project page: https://sites.google.com/view/llm-mpc
Venue: arXiv

Receive, Reason, and React: Drive as You Say with Large Language Models in Autonomous Vehicles

Paper link: https://arxiv.org/abs/2310.08034v1
Venue: MITS 2024

DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models

Paper link: https://arxiv.org/abs/2309.16292
Project page: https://pjlab-adg.github.io/DiLu/
Code: https://github.com/PJLab-ADG/DiLu
Venue: ICLR 2024

DSDrive: Distilling Large Language Model for Lightweight End-to-End Autonomous Driving with Unified Reasoning and Planning

Paper link: https://arxiv.org/pdf/2505.05360
Venue: arXiv

TeLL-Drive: Enhancing Autonomous Driving with Teacher LLM-Guided Deep Reinforcement Learning

Paper link: https://arxiv.org/abs/2502.01387
Project page: https://perfectxu88.github.io/TeLL-Drive.github.io/
Venue: arXiv

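VLM-based methods

Algorithms based on VLMs and VLAs are the current mainstream paradigm, because vision is the sensing modality that autonomous driving relies on most. This section collects the latest work for reference and study.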

Drive-R1: Bridging Reasoning and Planning in VLMs for Autonomous Driving with Reinforcement Learning

Paper link: https://arxiv.org/abs/2506.18234
Venue: arXiv

FutureSightDrive: Visualizing Trajectory Planning with Spatio-Temporal CoT for Autonomous Driving

Paper link: https://arxiv.org/abs/2505.17685
Code: https://github.com/MIV-XJTU/FSDrive
Venue: arXiv

Generative Planning with 3D-vision Language Pre-training for End-to-End Autonomous Driving

Paper link: https://arxiv.org/abs/2501.08861
Venue: arXiv

ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation

Paper link: https://arxiv.org/abs/2503.19755
Code: https://github.com/xiaomi-mlab/Orion
Venue: arXiv

Training-Free Open-Ended Object Detection and Segmentation via Attention as Prompts

Paper link: https://arxiv.org/abs/2410.05963
Venue: NeurIPS 2024

LingoQA: Visual Question Answering for Autonomous Driving

Paper link: https://arxiv.org/abs/2312.14115
Code: https://github.com/wayveai/LingoQA/
Venue: ECCV 2024

DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models

Paper link: https://arxiv.org/abs/2402.12289
Project page: https://tsinghua-mars-lab.github.io/DriveVLM/
Venue: arXiv

Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving

Paper link: https://arxiv.org/abs/2405.15324
Code: https://github.com/PJLab-ADG/LeapAD
Venue: NeurIPS 2024

ADAPT: Action-aware Driving Caption Transformer

Paper link: https://arxiv.org/abs/2302.00673
Code: https://github.com/jxbbb/ADAPT
Venue: ICRA 2023

DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model

Paper link: https://arxiv.org/abs/2310.01412
Project page: https://tonyxuqaq.github.io/projects/DriveGPT4/
Venue: RAL 2024

LightEMMA: Lightweight End-to-End Multimodal Model for Autonomous Driving

Paper link: https://arxiv.org/abs/2505.00284
Code: https://github.com/michigan-traffic-lab/LightEMMA
Venue: arXiv

TS-VLM: Text-Guided SoftSort Pooling for Vision-Language Models in Multi-View Driving Reasoning

Paper link: https://arxiv.org/abs/2505.12670
Venue: arXiv

VLM-AD: End-to-End Autonomous Driving through Vision-Language Model Supervision

Paper link: https://arxiv.org/pdf/2412.14446
Venue: arXiv

OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving

Paper link: https://arxiv.org/pdf/2412.15208
Code: https://github.com/taco-group/OpenEMMA
Venue: WACV 2025

CALMM-Drive: Confidence-Aware Autonomous Driving with Large Multimodal Model

Paper link: https://arxiv.org/pdf/2412.04209
Venue: arXiv

WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model

Paper link: https://arxiv.org/abs/2412.09951
Project page: https://wyddmw.github.io/WiseAD_demo/
Code: https://github.com/wyddmw/WiseAD
Venue: arXiv

VLM-Assisted Continual Learning for Visual Question Answering in Self-Driving

Paper link: https://arxiv.org/abs/2502.00843
Venue: arXiv

VLM-E2E: Enhancing End-to-End Autonomous Driving with Multimodal Driver Attention Fusion

Paper link: https://arxiv.org/abs/2502.18042
Venue: arXiv

VLM-MPC: Vision Language Foundation Model (VLM)-Guided Model Predictive Controller (MPC) for Autonomous Driving

Paper link: https://arxiv.org/abs/2408.04821
Venue: ICML 2025

Sce2DriveX: A Generalized MLLM Framework for Scene-to-Drive Learning

Paper link: https://arxiv.org/abs/2502.14917
Venue: arXiv

AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning

Paper link: https://arxiv.org/pdf/2503.07608
Code: https://github.com/hustvl/AlphaDrive
Venue: arXiv

X-Driver: Explainable Autonomous Driving with Vision-Language Models

Paper link: https://arxiv.org/pdf/2505.05098
Venue: arXiv

Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving

Paper link: https://arxiv.org/pdf/2505.08725
Code: https://arxiv.org/pdf/2505.08725
Venue: arXiv

VLA-based methods

AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning

Paper link: https://arxiv.org/abs/2506.13757
Project page: https://autovla.github.io/
Code: https://github.com/ucla-mobility/AutoVLA
Venue: arXiv

DiffVLA: Vision-Language Guided Diffusion Planning for Autonomous Driving

Paper link: https://arxiv.org/abs/2505.19381
Venue: arXiv

Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models

Paper link: https://arxiv.org/abs/2505.23757
Project page: http://impromptu-vla.c7w.tech/
Code: https://github.com/ahydchh/Impromptu-VLA
Venue: arXiv

DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving

Paper link: https://arxiv.org/abs/2505.16278
Project page: https://thinklab-sjtu.github.io/DriveMoE/
Venue: arXiv

OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model

Paper link: https://arxiv.org/pdf/2503.23463
Code: https://github.com/DriveVLA/OpenDriveVLA
Venue: arXiv

Reference link

VLM or VLA? Trends in Multimodal Large Models for Autonomous Driving, as Seen from Existing Work

Posted 2025-08-21 10:40 by 吴建明 (wujianming)

