LEAD: Minimizing Learner-Expert Asymmetry in End-to-End Driving
Overview
Simulators can generate virtually unlimited driving data, yet imitation learning policies still struggle to achieve robust closed-loop performance. We show that this gap is largely driven by systematic misalignment between privileged expert demonstrations and sensor-based learner observations in CARLA. We further demonstrate that commonly used navigation signals, when injected late or treated in isolation, hinder a policy’s ability to jointly reason about goals and scene dynamics. By explicitly minimizing learner–expert asymmetries and redesigning how navigation intent is specified in end-to-end driving policies, LEAD achieves significantly improved alignment and robust closed-loop behavior.
TL;DR: LEAD closes the gap between expert and learner behavior and sets a new closed-loop SOTA on every major CARLA benchmark.
Paper Summary
Learning by Cheating (LBC) distills a privileged teacher into a sensor student. Our paper points out several issues that limit the effectiveness of LBC in CARLA.
Figure 1: Left: The expert operates on privileged inputs. Right: TransFuser v6, the sensor student, replicates the expert's actions from sensory inputs alone.
Figure 2: The first issue, visibility asymmetry, produces inconsistent demonstrations. Left: the expert stops for invisible oncoming traffic. Right: after filtering invisible actors, the expert takes the gap and is forced into an unsafe situation that requires a recovery maneuver.
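A rough way to picture this filtering step: drop every privileged actor that fails a line-of-sight test from the ego vehicle, so the expert only reacts to traffic the student could plausibly perceive. Below is a minimal, self-contained Python sketch; the 2D point-sampling occlusion test and the Actor fields are illustrative assumptions, not the paper's implementation.

from dataclasses import dataclass

@dataclass
class Actor:
    x: float        # center x (m), bird's-eye-view frame (illustrative)
    y: float        # center y (m)
    half_w: float   # half width of the bounding box (m)
    half_l: float   # half length of the bounding box (m)

def blocks_ray(box, start, end, steps=64):
    # Coarse occlusion test: sample points along the segment start -> end
    # and check whether any of them falls inside the axis-aligned box.
    for i in range(1, steps):
        t = i / steps
        px = start[0] + t * (end[0] - start[0])
        py = start[1] + t * (end[1] - start[1])
        if abs(px - box.x) < box.half_w and abs(py - box.y) < box.half_l:
            return True
    return False

def visible_actors(ego_xy, actors):
    # Keep only actors whose center is not occluded by any other actor's box.
    kept = []
    for a in actors:
        others = (o for o in actors if o is not a)
        if not any(blocks_ray(o, ego_xy, (a.x, a.y)) for o in others):
            kept.append(a)
    return kept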
Figure 3: The second issue, uncertainty asymmetry, produces unsafe demonstrations. Left: the expert stops for the invisible emergency vehicle and resumes driving prematurely, resulting in a low safety margin. Right: camera-grounded braking delays the stop but extends the braking duration, increasing the safety margin.
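The camera-grounded braking on the right can be approximated with a simple hysteresis rule: brake only once a hazard is camera-visible, then hold the brake for a minimum duration after it disappears. A minimal sketch; the GroundedBrake class and the 2.0 s hold time are illustrative assumptions, not the paper's values.

class GroundedBrake:
    # Brake on camera-visible hazards; release only after a hold period.
    def __init__(self, hold_s=2.0):
        self.hold_s = hold_s    # extra braking time after the hazard vanishes
        self.last_seen = None   # timestamp of the last visible hazard

    def step(self, t, hazard_visible):
        if hazard_visible:
            self.last_seen = t
        # Keep braking while a hazard was seen within the hold window.
        return self.last_seen is not None and (t - self.last_seen) <= self.hold_s

brake = GroundedBrake(hold_s=2.0)
for t, vis in [(0.0, False), (0.5, True), (1.0, False), (2.5, False), (4.0, False)]:
    print(f"t={t:.1f}s brake={brake.step(t, vis)}")  # True from 0.5 s through 2.5 s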
Figure 4: The third issue, intent asymmetry, leads to avoidable infractions. Left: with sparse goal information, the policy lacks foresight, reacts at the last moment, oversteers, and crashes. Right: a clearer picture of the intended path allows the policy to position itself early and handle the maneuver cleanly.
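Giving the policy this clearer picture of the intended path can be as simple as resampling the sparse route into equally spaced path points and conditioning on all of them, rather than on a single distant target point. A minimal NumPy sketch under that assumption; densify_route and the 2 m spacing are illustrative, not the paper's exact design.

import numpy as np

def densify_route(route_xy, spacing=2.0):
    # Resample a polyline of sparse route points (N, 2) at a fixed
    # arc-length spacing, yielding a dense sequence of path points.
    seg_len = np.linalg.norm(np.diff(route_xy, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg_len)])  # cumulative arc length
    s_new = np.arange(0.0, s[-1], spacing)
    x = np.interp(s_new, s, route_xy[:, 0])
    y = np.interp(s_new, s, route_xy[:, 1])
    return np.stack([x, y], axis=1)

# Example: a 90-degree turn described by only three sparse points.
sparse = np.array([[0.0, 0.0], [20.0, 0.0], [20.0, 20.0]])
dense = densify_route(sparse, spacing=2.0)  # 20 evenly spaced path points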
Target point bias arises when the driving policy is overly focused on a single target point and ignores traffic context. Our paper addresses this issue with simple interventions.
Figure 5 (interactive demonstration): Click anywhere on the image to place a target point (red dot). The blue trajectory shows the predicted path of each policy given the same target point.
Quantitative Results
We mitigate these issues with simple alignment interventions on both the expert and the student, which significantly improves performance and shifts the performance frontier.
Figure 6: Traffic rule violations on Longest6 v2 after each isolated fix.
Figure 7: TFv6 outperforms the second-best method by 9 DS (Driving Score) on Bench2Drive.
Closed-Loop Demonstrations
Corner Case Handling
We evaluate the policy under severely degraded visibility conditions. These scenarios are deliberately chosen to stress the failure modes that motivate LEAD: some failures are driven by state mismatch (e.g., heavy occlusions, noisy motion estimates), others by intent mismatch (e.g., lane-switch maneuvers). Many scenarios thus require both an aligned expert and proper intent conditioning for the driving policy to succeed.
Video 2: A compilation of short clips, each highlighting a single critical situation that stress-tests both the quality of the expert's demonstrations and the policy's ability to follow complex navigation intent.
Video 3: A highly occluded route with narrow, curved roads and obstacles that force the policy to repeatedly stop, wait for safe gaps, and temporarily use the opposite lane to navigate safely.
Tackling Longer Routes
We provide extended qualitative results on the Longest6 v2 benchmark, featuring uninterrupted driving sequences spanning several minutes. These routes contain numerous sharp lane changes that are deliberately difficult to execute safely. This challenges two critical aspects: (1) the policy's ability to stay on route while handling dynamic traffic and (2) its ability to recover safely and get back on track after deviating from the route.
Video 4: Baseline demonstration of driving in clear conditions on a single-lane road with sparse traffic. Note that previous state-of-the-art methods often struggle even in such simple scenarios, with frequent pedestrian collisions or red-light violations.
Video 5: Driving on multi-lane roads with denser, faster traffic. This route features many artificially sharp lane-change maneuvers. Policies with poor navigation conditioning struggle heavily and are prone to collisions with dynamic traffic.
BibTeX
@article{Nguyen2025ARXIV,
title={LEAD: Minimizing Learner-Expert Asymmetry in End-to-End Driving},
author={Nguyen, Long and Fauth, Micha and Jaeger, Bernhard and Dauner, Daniel and Igl, Maximilian and Geiger, Andreas and Chitta, Kashyap},
journal={arXiv preprint arXiv:2512.20563},
year={2025}
}
Acknowledgements
Bernhard Jaeger and Andreas Geiger were supported by the ERC Starting Grant LEGO-3D (850533) and the DFG EXC number 2064/1 - project number 390727645. Daniel Dauner was supported by the German Federal Ministry for Economic Affairs and Climate Action within the project NXT GEN AI METHODS (19A23014S). We thank the International Max Planck Research School for Intelligent Systems (IMPRS-IS) for supporting Bernhard Jaeger, Daniel Dauner, and Kashyap Chitta. This research used compute resources at the Tübingen Machine Learning Cloud, DFG FKZ INST 37/1057-1 FUGG as well as the Training Center for Machine Learning (TCML). We also thank Lara Pollehn and Simon Gerstenecker for helpful discussions.