Current Vision-Language-Action (VLA) paradigms in autonomous driving primarily rely on Imitation Learning (IL), which introduces inherent challenges such as distribution shift and causal confusion. Online reinforcement learning offers a promising pathway to address these issues through trial-and-error learning. However, applying online reinforcement learning to VLA models in autonomous driving is hindered by inefficient exploration in continuous action spaces. To overcome this limitation, we propose MindDrive, a VLA framework comprising a large language model (LLM) with two distinct sets of LoRA parameters. With one set of LoRA parameters, the LLM serves as a Decision Expert for scenario reasoning and driving decision-making; with the other, it acts as an Action Expert that dynamically maps linguistic decisions into feasible trajectories. By feeding trajectory-level rewards back into the reasoning space, MindDrive enables trial-and-error learning over a finite set of discrete linguistic driving decisions, instead of operating directly in a continuous action space. This approach effectively balances optimal decision-making in complex scenarios, human-like driving behavior, and efficient exploration in online reinforcement learning. MindDrive achieves strong closed-loop performance on the challenging Bench2Drive benchmark, with a Driving Score (DS) of 78.04 and a Success Rate (SR) of 55.09\%. To the best of our knowledge, this is the first work to demonstrate the effectiveness of online reinforcement learning for VLA models in autonomous driving.
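To make the core idea concrete, the sketch below illustrates online RL over a finite set of discrete linguistic driving decisions rather than a continuous action space, as the abstract describes. It is a minimal, hypothetical example, not the authors' implementation: the decision vocabulary, the toy policy head standing in for the Decision Expert, the placeholder trajectory-level reward, and the REINFORCE-style update are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical finite vocabulary of linguistic driving decisions (assumption,
# not the paper's actual decision set).
DECISIONS = ["keep lane", "change lane left", "change lane right", "brake", "accelerate"]

class ToyDecisionExpert(nn.Module):
    """Stand-in for the LoRA-adapted Decision Expert: maps a scene feature
    vector to a categorical distribution over discrete linguistic decisions."""
    def __init__(self, feat_dim: int = 32, num_decisions: int = len(DECISIONS)):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 64),
            nn.ReLU(),
            nn.Linear(64, num_decisions),
        )

    def forward(self, scene_feat: torch.Tensor) -> torch.distributions.Categorical:
        logits = self.head(scene_feat)
        return torch.distributions.Categorical(logits=logits)

def trajectory_reward(decision_idx: int) -> float:
    """Placeholder for the trajectory-level reward: in the actual framework the
    Action Expert would map the linguistic decision to a trajectory and the
    simulator would score it. Here we return a dummy scalar."""
    return 1.0 if DECISIONS[decision_idx] == "keep lane" else 0.1

policy = ToyDecisionExpert()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

# One REINFORCE-style update: sample a discrete decision, observe the
# trajectory-level reward, and push the reward back into the reasoning space
# by weighting the log-probability of the sampled decision.
scene_feat = torch.randn(1, 32)          # stand-in for encoded scene context
dist = policy(scene_feat)
decision = dist.sample()                  # exploration over a finite decision set
reward = trajectory_reward(decision.item())
loss = -(dist.log_prob(decision) * reward).mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Because exploration happens over a handful of discrete decisions instead of a continuous trajectory space, each sampled action is semantically meaningful and the policy gradient signal concentrates on far fewer alternatives, which is the efficiency argument the abstract makes.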
@article{fu2025minddrive,
  title={MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning},
  author={Haoyu Fu and Diankun Zhang and Zongchuang Zhao and Jianfeng Cui and Dingkang Liang and Hongwei Xie and Bing Wang and Xiang Bai},
  journal={},
  year={2025}
}