MonoCLUE: Object-Aware Clustering Enhances Monocular 3D Object Detection

Introduction

This repository provides the official implementation of "MonoCLUE: Object-Aware Clustering Enhances Monocular 3D Object Detection" based on the excellent work MonoDGP. In this work, we propose a DETR-based monocular 3D detection framework that strengthens visual reasoning by leveraging clustering and scene memory, enabling robust performance under occlusion and limited visibility.

Demo

Scene 1

Scene 2

Main Result

Note that training randomness in monocular detection can cause a variance of about ±1 AP3D|R40 on KITTI.

The official results:

| Models | Easy (Val, AP3D\|R40) | Mod. | Hard | Logs | Ckpts |
| --- | --- | --- | --- | --- | --- |
| MonoCLUE | 33.7426% | 24.1090% | 20.5883% | log | ckpt |
|  | 31.5802% | 23.5648% | 20.2746% | log | ckpt |

The test results:

Installation

  1. Clone this project and create a conda environment:

    git clone https://github.com/SungHunYang/MonoCLUE.git
    cd MonoCLUE
    
    conda create -n monoclue python=3.8
    conda activate monoclue
    
  2. Install PyTorch, torchvision, and torchaudio matching your CUDA version (the command below targets CUDA 12.1):

    pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu121
  3. Install requirements and compile the deformable attention:

    pip install -r requirements.txt
    
    cd lib/models/monoclue/ops/
    bash make.sh
    
    cd ../../../..
    
  4. Download the KITTI dataset and prepare the directory structure as follows:

    │MonoCLUE/
    ├──...
    │data/kitti/
    ├──ImageSets/
    ├──training/
    │   ├──image_2
    │   ├──label_2
    │   ├──calib
    ├──testing/
    │   ├──image_2
    │   ├──calib
    

    Note that if you need the Waymo dataset, please follow the DEVIANT repository.

  5. Download sam_vit_h.pth from the SAM repository and prepare the SAM-guided dataset.

    python make_sam.py
    

    Important: Before generating the SAM-guided dataset, please set self.data_augmentation = False in kitti_dataset.py, and set batch_size = 1 in monoclue.yaml.
    Otherwise, the generated SAM labels may be misaligned.

    Note that if you run it with all_category, a folder named "label_sam_all" should be created.

  6. Finally, prepare the directory structure as:

    │MonoCLUE/
    ├──...
    │data/kitti/
    ├──ImageSets/
    ├──training/
    │   ├──image_2
    │   ├──label_2
    │   ├──calib
    │   ├──label_sam/
    │   │   ├──region
    │   │   └──depth
    ├──testing/
    │   ├──image_2
    │   ├──calib
    

    You can also change the data path at "dataset/root_dir" in configs/monoclue.yaml.
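For example, the override might look like the fragment below. Only the dataset/root_dir key comes from the note above; the surrounding structure of monoclue.yaml is an assumption and may differ:

```yaml
dataset:
  root_dir: /path/to/your/kitti   # should point at the data/kitti tree above
```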

Get Started

Train

You can modify the model and training settings in configs/monoclue.yaml and specify the GPU in train.sh:

bash train.sh configs/monoclue.yaml > logs/monoclue.log

Test

By default, the best checkpoint is evaluated. Make sure the checkpoint is located in outputs/monoclue/:

bash test.sh configs/monoclue.yaml

Visualize

  1. After testing, prepare the directory structure as follows:

    │MonoCLUE/
    ├──outputs/
    │   ├──monoclue/
    │       ├──outputs/
    │           ├──data/
    │               ├──000001.txt
    │               ├──000002.txt
    │               ├──000003.txt
    │               ├──000004.txt
    
  2. Navigate to the visualization folder:

    cd visualize
  3. Run the visualization script:

    python draw3D_bbox.py
    
    # With detailed information
    python draw3D_bbox.py --print_info True

Note that if you need LiDAR visualization, please follow the kitti_object_vis repository.

Citation

Please cite this work if you find it useful:

@article{yang2025monoclue,
  title={MonoCLUE: Object-Aware Clustering Enhances Monocular 3D Object Detection},
  author={Yang, Sunghun and Lee, Minhyeok and Lee, Jungho and Lee, Sangyoun},
  journal={arXiv preprint arXiv:2511.07862},
  year={2025}
}

Acknowledgment

This repo benefits from the excellent works MonoDETR and MonoDGP.

About

[ AAAI 2026 ] The official implementation of 'MonoCLUE: Object-Aware Clustering Enhances Monocular 3D Object Detection'
