UniDrive: Towards Universal Driving Perception Across Camera Configurations

Ye Li¹    Wenzhao Zheng²    Xiaonan Huang¹    Kurt Keutzer²
¹University of Michigan, Ann Arbor    ²University of California, Berkeley

About

UniDrive is a novel framework designed to address the challenge of generalizing perception models across multi-camera configurations.

  • To the best of our knowledge, UniDrive presents the first comprehensive framework designed to generalize vision-centric 3D perception models across diverse camera configurations.
  • We introduce a novel strategy that transforms images into a unified virtual camera space, enhancing robustness to camera parameter variations.
  • We propose a virtual configuration optimization strategy that minimizes projection error, improving model generalization with minimal performance degradation.
  • We contribute a systematic data generation platform along with a 160,000-frame multi-camera dataset, and a benchmark for evaluating perception models across varying camera configurations.

Visit our project page to explore more examples. 🚙

🛠️ UniDrive Pipeline

We transform the input images into a unified virtual camera space to achieve universal driving perception. To estimate the depth of pixels in the virtual view for projection, we propose a ground-aware depth assumption strategy. To obtain the most effective virtual camera space for multiple real camera configurations, we propose a data-driven optimization strategy to minimize projection error.
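
Below is a minimal, hypothetical sketch of this projection idea, assuming a pinhole camera model and a flat ground plane at z = 0 in the world frame. The function names, fallback depth, and coordinate conventions are illustrative assumptions, not the official implementation.

```python
# Minimal sketch (NOT the official UniDrive code) of re-projecting a pixel from a
# real camera into a virtual camera using a ground-plane depth assumption.
import numpy as np

def ground_aware_depth(ray_dir_world, cam_height, fallback_depth=50.0):
    """Depth along the ray: intersect the ground plane z = 0 if the ray points
    downward, otherwise fall back to a fixed depth (illustrative assumption)."""
    dz = ray_dir_world[2]
    if dz < -1e-6:
        return cam_height / -dz
    return fallback_depth

def reproject_pixel(uv, K_real, T_real2world, K_virt, T_world2virt):
    """Lift a real-camera pixel (u, v) to 3D with the assumed depth, then project
    it into the virtual camera. K_* are 3x3 intrinsics, T_* are 4x4 extrinsics."""
    # Back-project the pixel to a unit ray in the world frame.
    ray_cam = np.linalg.inv(K_real) @ np.array([uv[0], uv[1], 1.0])
    ray_world = T_real2world[:3, :3] @ ray_cam
    ray_world /= np.linalg.norm(ray_world)

    cam_center = T_real2world[:3, 3]             # camera position in the world frame
    depth = ground_aware_depth(ray_world, cam_height=cam_center[2])
    point_world = cam_center + depth * ray_world

    # Project the 3D point into the virtual camera.
    point_virt = T_world2virt @ np.append(point_world, 1.0)
    uv_virt = K_virt @ point_virt[:3]
    return uv_virt[:2] / uv_virt[2]              # assumes the point lies in front of the camera
```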

Updates

  • [2025.01] - UniDrive is accepted at ICLR 2025.
  • [2024.10] - Our paper is available on arXiv.


♨️ Data Preparation

The UniDrive dataset consists of eight Camera Configurations, inspired by the camera setups used by existing autonomous vehicle companies.

Each Camera Configuration contains 4 to 8 cameras. Its sub-dataset consists of 20,000 frames, comprising 10,000 samples for training and 10,000 samples for validation.

Map previews: Town 1 · Town 3 · Town 4 · Town 6

We choose four maps (Towns 1, 3, 4, and 6) in CARLA v0.9.10 to collect point cloud data and generate ground-truth information. For each map, we manually set 6 ego-vehicle routes that cover all roads without overlap. The simulation frequency is set to 20 Hz.
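
As a concrete reference, here is a minimal CARLA 0.9.10 sketch of such a synchronous 20 Hz capture setup. The sensor placement, resolution, spawn point, and output path are illustrative assumptions and differ from the actual data generation scripts.

```python
# Minimal CARLA 0.9.10 sketch of a synchronous simulation stepped at 20 Hz
# on one of the four towns; values are illustrative, not the UniDrive settings.
import carla

client = carla.Client('localhost', 2000)
client.set_timeout(10.0)
world = client.load_world('Town01')            # also Town03 / Town04 / Town06

settings = world.get_settings()
settings.synchronous_mode = True
settings.fixed_delta_seconds = 1.0 / 20.0      # 20 Hz simulation step
world.apply_settings(settings)

blueprint_library = world.get_blueprint_library()
ego_bp = blueprint_library.filter('vehicle.*')[0]
ego = world.spawn_actor(ego_bp, world.get_map().get_spawn_points()[0])

cam_bp = blueprint_library.find('sensor.camera.rgb')
cam_bp.set_attribute('fov', '95')              # per-configuration field of view
camera = world.spawn_actor(
    cam_bp,
    carla.Transform(carla.Location(x=1.5, z=1.8)),  # illustrative roof mount
    attach_to=ego)
camera.listen(lambda image: image.save_to_disk(f'out/{image.frame:06d}.png'))

for _ in range(200):                           # advance the simulation
    world.tick()
```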

Our datasets are hosted by OpenDataLab.


OpenDataLab is a pioneering open data platform for the large AI model era, making datasets accessible. Through OpenDataLab, researchers can obtain free, formatted datasets in various fields.

Kindly refer to DATA_PREPARE.md for details on preparing the UniDrive dataset.

🚙 Camera Configuration

Configuration previews: 4x95 · 5x75 · 6x60 · 6x70 · 6x80a · 6x80b · 5x70+110 · 8x50
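
Each label follows a count-times-FOV convention (for example, 6x60 denotes six cameras with a 60° horizontal field of view each, and 5x70+110 mixes 70° and 110° lenses). A hypothetical sketch of how such a configuration could be specified is shown below; the field names, yaw angles, and mounting positions are illustrative and may differ from the released configuration files.

```python
# Hypothetical sketch: a "6x60" surround configuration as a list of camera specs.
# Field names, yaw spacing, and mounting height are illustrative assumptions,
# not the values shipped with the UniDrive dataset.
CONFIG_6X60 = [
    {
        "name": f"CAM_{i}",
        "fov_deg": 60.0,             # horizontal field of view
        "yaw_deg": i * 60.0,         # evenly spaced around the ego vehicle
        "position": (0.0, 0.0, 1.6)  # (x, y, z) in the ego frame, metres
    }
    for i in range(6)
]
```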

📊 UniDrive Benchmark

📝 TODO List

  • Initial release. 🚀
  • Add Camera Configuration benchmarks.
  • Add more 3D perception models.

Citation

If you find this work helpful for your research, please kindly consider citing our paper:

@inproceedings{li2024unidrive,
  title={UniDrive: Towards Universal Driving Perception Across Camera Configurations},
  author={Li, Ye and Zheng, Wenzhao and Huang, Xiaonan and Keutzer, Kurt},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2025}
}

License

This work is released under the MIT License, while some specific implementations in this codebase may be subject to other licenses.

Acknowledgements

This work is developed based on the MMDetection3D codebase.


MMDetection3D is an open-source, PyTorch-based toolbox serving as a next-generation platform for general 3D perception. It is part of the OpenMMLab project developed by MMLab.
