This repository provides the official implementation of "MonoCLUE: Object-Aware Clustering Enhances Monocular 3D Object Detection" based on the excellent work MonoDGP. In this work, we propose a DETR-based monocular 3D detection framework that strengthens visual reasoning by leveraging clustering and scene memory, enabling robust performance under occlusion and limited visibility.
Note that training randomness in monocular detection can cause a variance of about ±1 AP3D|R40 on KITTI.
The official results (Val, AP3D|R40):

| Models | Easy | Mod. | Hard | Logs | Ckpts |
| --- | --- | --- | --- | --- | --- |
| MonoCLUE | 33.7426% | 24.1090% | 20.5883% | log | ckpt |
| | 31.5802% | 23.5648% | 20.2746% | log | ckpt |
The test result:
- Clone this project and create a conda environment:

  ```bash
  git clone https://github.com/SungHunYang/MonoCLUE.git
  cd MonoCLUE
  conda create -n monoclue python=3.8
  conda activate monoclue
  ```
- Install PyTorch and torchvision matching your CUDA version:

  ```bash
  pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu121
  ```
- Install the requirements and compile the deformable attention ops:

  ```bash
  pip install -r requirements.txt
  cd lib/models/monoclue/ops/
  bash make.sh
  cd ../../../..
  ```
- Download the KITTI dataset and prepare the directory structure as:

  ```
  │MonoCLUE/
  ├──...
  │data/kitti/
  ├──ImageSets/
  ├──training/
  │  ├──image_2
  │  ├──label_2
  │  ├──calib
  ├──testing/
  │  ├──image_2
  │  ├──calib
  ```

  Note that if you need the Waymo dataset, please follow DEVIANT.
- Download `sam_vit_h.pth` from the SAM repository and prepare the SAM-guided dataset:

  ```bash
  python make_sam.py
  ```

  Important: before generating the SAM-guided dataset, set `self.data_augmentation = False` in `kitti_dataset.py` and `batch_size = 1` in `monoclue.yaml`; otherwise the generated SAM labels may be misaligned. Note that if you run it with `all_category`, a folder named `label_sam_all` should be created.
- Finally, prepare the directory structure as:

  ```
  │MonoCLUE/
  ├──...
  │data/kitti/
  ├──ImageSets/
  ├──training/
  │  ├──image_2
  │  ├──label_2
  │  ├──calib
  │  ├──label_sam/
  │  │  ├──region
  │  │  └──depth
  ├──testing/
  │  ├──image_2
  │  ├──calib
  ```

  You can also change the data path at `dataset/root_dir` in `configs/monoclue.yaml`.
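To catch path mistakes before training, the expected layout above can be checked with a small script. This is a hypothetical helper, not part of the repository; point `root` at your `dataset/root_dir`:

```python
import os

# Sub-directories expected under data/kitti/, per the layout above.
EXPECTED = [
    "ImageSets",
    "training/image_2",
    "training/label_2",
    "training/calib",
    "training/label_sam/region",
    "training/label_sam/depth",
    "testing/image_2",
    "testing/calib",
]

def missing_dirs(root):
    """Return the expected KITTI sub-directories absent under root."""
    return [p for p in EXPECTED if not os.path.isdir(os.path.join(root, p))]
```

Running `missing_dirs("data/kitti")` should return an empty list once the dataset and SAM labels are in place.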
You can modify the model and training settings in `configs/monoclue.yaml` and specify the GPU in `train.sh`:

```bash
bash train.sh configs/monoclue.yaml > logs/monoclue.log
```
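To track validation accuracy over a long run, the AP numbers can be pulled out of `logs/monoclue.log` with a regex. The exact line format depends on the evaluation code, so this is only a sketch; it assumes lines shaped like `Car 3d AP(R40): 33.74, 24.11, 20.59` and must be adapted to what your log actually prints:

```python
import re

# Assumed line shape: "... 3d AP(R40): <easy>, <mod>, <hard>".
AP_RE = re.compile(r"3d\s+AP.*?:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)")

def extract_ap(text):
    """Return all (easy, mod, hard) AP triples found in the log text."""
    return [tuple(float(v) for v in m.groups()) for m in AP_RE.finditer(text)]
```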
The best checkpoint is evaluated by default. Make sure the checkpoint is located in `outputs/monoclue/`:

```bash
bash test.sh configs/monoclue.yaml
```
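The per-image result files written during testing use the standard KITTI detection format: class, truncation, occlusion, alpha, 2D box, 3D dimensions, 3D location, rotation_y, and a confidence score. A minimal reader for those files, assuming that standard 16-field layout:

```python
def parse_kitti_detections(lines):
    """Parse KITTI-format detection lines into one dict per object."""
    objects = []
    for line in lines:
        v = line.split()
        if len(v) < 16:
            continue  # detections carry a trailing confidence score
        objects.append({
            "type": v[0],
            "truncated": float(v[1]),
            "occluded": int(v[2]),
            "alpha": float(v[3]),
            "bbox": [float(x) for x in v[4:8]],        # 2D box: left, top, right, bottom
            "dimensions": [float(x) for x in v[8:11]],  # 3D size: height, width, length
            "location": [float(x) for x in v[11:14]],   # camera coords: x, y, z
            "rotation_y": float(v[14]),
            "score": float(v[15]),
        })
    return objects
```

For example, `parse_kitti_detections(open("outputs/monoclue/outputs/data/000001.txt"))` returns the predicted objects for that image.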
- After testing, prepare the directory structure as follows:

  ```
  │MonoCLUE/
  ├──outputs/
  │  ├──monoclue/
  │  │  ├──outputs/
  │  │  │  ├──data/
  │  │  │  │  ├──000001.txt
  │  │  │  │  ├──000002.txt
  │  │  │  │  ├──000003.txt
  │  │  │  │  ├──000004.txt
  ```
- Navigate to the visualization folder:

  ```bash
  cd visualize
  ```
- Run the visualization script:

  ```bash
  python draw3D_bbox.py

  # With detailed information
  python draw3D_bbox.py --print_info True
  ```

  Note that if you need LiDAR visualization, please follow the kitti_object_vis repository.
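Under the hood, drawing a 3D box on the image relies on standard KITTI geometry: build the 8 box corners in object coordinates, rotate them by `rotation_y`, and translate to the camera-frame location before projecting with the calibration matrix. A self-contained sketch of the corner computation (illustrative, not the script's actual code):

```python
import math

def box3d_corners(h, w, l, x, y, z, ry):
    """Return the 8 corners of a KITTI 3D box in camera coordinates.

    (x, y, z) is the box bottom center; ry is the yaw around the camera Y axis.
    """
    # Corners in object coordinates, origin at the bottom center of the box.
    xs = [ l/2,  l/2, -l/2, -l/2,  l/2,  l/2, -l/2, -l/2]
    ys = [ 0.0,  0.0,  0.0,  0.0,  -h,   -h,   -h,   -h ]
    zs = [ w/2, -w/2, -w/2,  w/2,  w/2, -w/2, -w/2,  w/2]
    c, s = math.cos(ry), math.sin(ry)
    corners = []
    for cx, cy, cz in zip(xs, ys, zs):
        # Rotate around the Y axis, then translate to the camera-frame location.
        corners.append((c * cx + s * cz + x, cy + y, -s * cx + c * cz + z))
    return corners
```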
Please cite this work if you find it useful:
@article{yang2025monoclue,
title={MonoCLUE: Object-Aware Clustering Enhances Monocular 3D Object Detection},
author={Yang, Sunghun and Lee, Minhyeok and Lee, Jungho and Lee, Sangyoun},
journal={arXiv preprint arXiv:2511.07862},
year={2025}
}



