This is the official project repository for DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation (CVPR 2025)
In autonomous driving, vision-centric 3D detection aims to identify 3D objects from images. However, high data collection costs and diverse real-world scenarios limit the scale of training data. Once distribution shifts occur between training and test data, existing methods often suffer from performance degradation, known as Out-of-Distribution (OOD) problems. To address this, controllable Text-to-Image (T2I) diffusion offers a potential solution for training data enhancement, which is required to generate diverse OOD scenarios with precise 3D object geometry. Nevertheless, existing controllable T2I approaches are restricted by the limited scale of training data or struggle to preserve all annotated 3D objects. In this paper, we present DriveGEN, a method designed to improve the robustness of 3D detectors in Driving via Training-Free Controllable Text-to-Image Diffusion Generation. Without extra diffusion model training, DriveGEN consistently preserves objects with precise 3D geometry across diverse OOD generations, consisting of 2 stages: 1) Self-Prototype Extraction: We empirically find that self-attention features are semantic-aware but require accurate region selection for 3D objects. Thus, we extract precise object features via layouts to capture 3D object geometry, termed self-prototypes. 2) Prototype-Guided Diffusion: To preserve objects across various OOD scenarios, we perform semantic-aware feature alignment and shallow feature alignment during denoising. Extensive experiments demonstrate the effectiveness of DriveGEN in improving 3D detection.
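To make the two stages more concrete, below is a minimal, illustrative PyTorch sketch, not the released DriveGEN implementation: the masked average pooling, the cosine-similarity alignment term, and all function/tensor names are our assumptions. The actual method also performs shallow feature alignment during denoising; see the paper and the scripts below for details.

```python
import torch
import torch.nn.functional as F

def extract_self_prototypes(attn_feats, layout_masks):
    """Stage 1 (sketch): masked average pooling of self-attention features
    inside each annotated object's layout gives one prototype per object.

    attn_feats:   (C, H, W) self-attention features from a chosen layer.
    layout_masks: (N, H, W) binary masks projected from the labeled 3D boxes.
    Returns:      (N, C) self-prototypes.
    """
    C = attn_feats.shape[0]
    feats = attn_feats.reshape(C, -1)                     # (C, H*W)
    masks = layout_masks.reshape(layout_masks.shape[0], -1).float()
    area = masks.sum(dim=1, keepdim=True).clamp(min=1.0)  # avoid divide-by-zero
    return (masks @ feats.t()) / area                     # (N, C)

def prototype_alignment_loss(cur_attn_feats, prototypes, layout_masks):
    """Stage 2 (sketch): during denoising, pull the current features inside
    each layout toward the stored prototypes (semantic-aware alignment)."""
    cur = extract_self_prototypes(cur_attn_feats, layout_masks)
    return (1.0 - F.cosine_similarity(cur, prototypes, dim=1)).mean()
```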
- 1️⃣ Download the KITTI dataset from the official website
- 2️⃣ Download the splits (the ImageSets folder) from MonoTTA
Then, link the dataset and move the splits into place:

```bash
mkdir data && cd data
ln -s /your_path_KITTI .
mv ./ImageSets ./your_path_KITTI
```
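If the commands above succeed, the data directory should look roughly like the standard KITTI layout below (the exact sub-folders depend on which KITTI archives you downloaded):

```
data
└── your_path_KITTI          # symlink created above
    ├── ImageSets            # splits from MonoTTA
    ├── training
    │   ├── image_2
    │   ├── calib
    │   └── label_2
    └── testing
        ├── image_2
        └── calib
```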
- 1️⃣ Download the nuScenes dataset from the official website
- 2️⃣ (Optional) Download the nuScenes-C dataset from the Robo3D benchmark
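For reference, a standard nuScenes devkit layout looks roughly like the sketch below (this is an assumption based on the official devkit structure; keep nuScenes-C in its own folder if you download it):

```
data
└── nuscenes
    ├── samples              # keyframe sensor data (e.g. CAM_FRONT)
    ├── sweeps
    ├── maps
    └── v1.0-trainval        # annotation tables (.json)
```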
You can also download all generated images on Hugging Face 🤗
Build the conda environment via:

```bash
conda env create -f environment.yml
conda activate driveGEN
pip install -r requirements.txt
```
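After installation, an optional sanity check that PyTorch and the GPU are visible from the new environment. This is a generic snippet, not part of the DriveGEN repo:

```python
# check_env.py — optional, generic environment check (not part of DriveGEN).
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```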
#### KITTI: self-prototype extraction

```bash
python KITTI_s1_self_prototype_extraction.py
```

#### KITTI: prototype-guided image generation

```bash
python KITTI_s2_image_generation.py
```
#### nuScenes: self-prototype extraction

```bash
python nus_s1_self_prototype_extraction.py
```

#### nuScenes: prototype-guided image generation

```bash
python nus_s2_image_generation.py
```
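The generation scripts write the augmented images to disk; the output path below is a hypothetical placeholder (check the scripts' configuration for the actual location). A small snippet to preview a few results:

```python
# Preview a few generated images (the folder below is a hypothetical
# placeholder — check the generation script/config for the real path).
from pathlib import Path
from PIL import Image

out_dir = Path("outputs/generated_images")   # hypothetical
for img_path in sorted(out_dir.glob("*.png"))[:3]:
    print(img_path.name)
    Image.open(img_path).show()
```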
If our DriveGEN method is helpful in your research, please consider citing our paper:

```bibtex
@inproceedings{lin2025drivegen,
  title={Drivegen: Generalized and robust 3d detection in driving via controllable text-to-image diffusion generation},
  author={Lin, Hongbin and Guo, Zilu and Zhang, Yifan and Niu, Shuaicheng and Li, Yafeng and Zhang, Ruimao and Cui, Shuguang and Li, Zhen},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={27497--27507},
  year={2025}
}
```
The code is heavily based on FreeControl 🔗.
Please contact Hongbin Lin at linhongbinanthem@gmail.com if you have any questions. 📬