This repository provides the official implementation of "MonoCLUE: Object-Aware Clustering Enhances Monocular 3D Object Detection" based on the excellent work MonoDGP. In this work, we propose a DETR-based monocular 3D detection framework that strengthens visual reasoning by leveraging clustering and scene memory, enabling robust performance under occlusion and limited visibility.
Note that training randomness in monocular detection can cause a variance of about ±1 AP3D|R40 on KITTI.
The official results (Val, AP3D|R40):

| Models | Easy | Mod. | Hard | Logs | Ckpts |
| --- | --- | --- | --- | --- | --- |
| MonoCLUE | 33.7426% | 24.1090% | 20.5883% | log | ckpt |
| | 31.5802% | 23.5648% | 20.2746% | log | ckpt |
The test result:
- Clone this project and create a conda environment:

  ```bash
  git clone https://github.com/SungHunYang/MonoCLUE.git
  cd MonoCLUE
  conda create -n monoclue python=3.8
  conda activate monoclue
  ```
- Install PyTorch and torchvision matching your CUDA version:

  ```bash
  pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu121
  ```
- Install the requirements and compile the deformable attention ops:

  ```bash
  pip install -r requirements.txt
  cd lib/models/monoclue/ops/
  bash make.sh
  cd ../../../..
  ```
- Download the KITTI dataset and prepare the directory structure as:

  ```
  │MonoCLUE/
  ├──...
  │data/kitti/
  ├──ImageSets/
  ├──training/
  │  ├──image_2
  │  ├──label_2
  │  ├──calib
  ├──testing/
  │  ├──image_2
  │  ├──calib
  ```

  Note that if you need the Waymo dataset, please follow DEVIANT.
- Download `sam_vit_h.pth` from the SAM repository and prepare the SAM-guided dataset:

  ```bash
  python make_sam.py
  ```

  Important: before generating the SAM-guided dataset, set `self.data_augmentation = False` in `kitti_dataset.py` and `batch_size = 1` in `monoclue.yaml`; otherwise the generated SAM labels may be misaligned. Note that if you run it with `all_category`, a folder named `label_sam_all` should be created.
- Finally, prepare the directory structure as:

  ```
  │MonoCLUE/
  ├──...
  │data/kitti/
  ├──ImageSets/
  ├──training/
  │  ├──image_2
  │  ├──label_2
  │  ├──calib
  │  ├──label_sam/
  │  │  ├──region
  │  │  └──depth
  ├──testing/
  │  ├──image_2
  │  ├──calib
  ```

  You can also change the data path at `dataset/root_dir` in `configs/monoclue.yaml`.
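To catch path mistakes before training, the expected layout above can be checked with a small script. This is a hypothetical helper, not part of the repository; point `root` at your `dataset/root_dir`:

```python
import os

# Sub-directories expected under data/kitti/, per the layout above.
EXPECTED = [
    "ImageSets",
    "training/image_2",
    "training/label_2",
    "training/calib",
    "training/label_sam/region",
    "training/label_sam/depth",
    "testing/image_2",
    "testing/calib",
]

def missing_dirs(root):
    """Return the expected KITTI sub-directories absent under root."""
    return [p for p in EXPECTED if not os.path.isdir(os.path.join(root, p))]
```

Running `missing_dirs("data/kitti")` should return an empty list once the dataset and SAM labels are in place.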
You can modify the model and training settings in `configs/monoclue.yaml` and specify the GPU in `train.sh`:

```bash
bash train.sh configs/monoclue.yaml > logs/monoclue.log
```
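To track validation accuracy over a long run, the AP numbers can be pulled out of `logs/monoclue.log` with a regex. The exact line format depends on the evaluation code, so this is only a sketch; it assumes lines shaped like `Car 3d AP(R40): 33.74, 24.11, 20.59` and must be adapted to what your log actually prints:

```python
import re

# Assumed line shape: "... 3d AP(R40): <easy>, <mod>, <hard>".
AP_RE = re.compile(r"3d\s+AP.*?:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)")

def extract_ap(text):
    """Return all (easy, mod, hard) AP triples found in the log text."""
    return [tuple(float(v) for v in m.groups()) for m in AP_RE.finditer(text)]
```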
The best checkpoint is evaluated by default. Make sure the checkpoint is located in `outputs/monoclue/`:

```bash
bash test.sh configs/monoclue.yaml
```
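The per-image result files written during testing use the standard KITTI detection format: class, truncation, occlusion, alpha, 2D box, 3D dimensions, 3D location, rotation_y, and a confidence score. A minimal reader for those files, assuming that standard 16-field layout:

```python
def parse_kitti_detections(lines):
    """Parse KITTI-format detection lines into one dict per object."""
    objects = []
    for line in lines:
        v = line.split()
        if len(v) < 16:
            continue  # detections carry a trailing confidence score
        objects.append({
            "type": v[0],
            "truncated": float(v[1]),
            "occluded": int(v[2]),
            "alpha": float(v[3]),
            "bbox": [float(x) for x in v[4:8]],        # 2D box: left, top, right, bottom
            "dimensions": [float(x) for x in v[8:11]],  # 3D size: height, width, length
            "location": [float(x) for x in v[11:14]],   # camera coords: x, y, z
            "rotation_y": float(v[14]),
            "score": float(v[15]),
        })
    return objects
```

For example, `parse_kitti_detections(open("outputs/monoclue/outputs/data/000001.txt"))` returns the predicted objects for that image.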
- After testing, prepare the directory structure as follows:

  ```
  │MonoCLUE/
  ├──outputs/
  │  ├──monoclue/
  │  │  ├──outputs/
  │  │  │  ├──data/
  │  │  │  │  ├──000001.txt
  │  │  │  │  ├──000002.txt
  │  │  │  │  ├──000003.txt
  │  │  │  │  ├──000004.txt
  ```
- Navigate to the visualization folder:

  ```bash
  cd visualize
  ```
- Run the visualization script:

  ```bash
  python draw3D_bbox.py

  # With detailed information
  python draw3D_bbox.py --print_info True
  ```

  Note that if you need LiDAR visualization, please follow the kitti_object_vis repository.
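Under the hood, drawing a 3D box on the image relies on standard KITTI geometry: build the 8 box corners in object coordinates, rotate them by `rotation_y`, and translate to the camera-frame location before projecting with the calibration matrix. A self-contained sketch of the corner computation (illustrative, not the script's actual code):

```python
import math

def box3d_corners(h, w, l, x, y, z, ry):
    """Return the 8 corners of a KITTI 3D box in camera coordinates.

    (x, y, z) is the box bottom center; ry is the yaw around the camera Y axis.
    """
    # Corners in object coordinates, origin at the bottom center of the box.
    xs = [ l/2,  l/2, -l/2, -l/2,  l/2,  l/2, -l/2, -l/2]
    ys = [ 0.0,  0.0,  0.0,  0.0,  -h,   -h,   -h,   -h ]
    zs = [ w/2, -w/2, -w/2,  w/2,  w/2, -w/2, -w/2,  w/2]
    c, s = math.cos(ry), math.sin(ry)
    corners = []
    for cx, cy, cz in zip(xs, ys, zs):
        # Rotate around the Y axis, then translate to the camera-frame location.
        corners.append((c * cx + s * cz + x, cy + y, -s * cx + c * cz + z))
    return corners
```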
Please cite this work if you find it useful:
@article{yang2025monoclue,
title={MonoCLUE: Object-Aware Clustering Enhances Monocular 3D Object Detection},
author={Yang, Sunghun and Lee, Minhyeok and Lee, Jungho and Lee, Sangyoun},
journal={arXiv preprint arXiv:2511.07862},
year={2025}
}



