Our significantly extended version of GREAT, termed GREATEN, is available at Paper, Code.
This repository contains the source code for our paper.
Paper | Paper Page | YouTube
Global Regulation and Excitation via Attention Tuning for Stereo Matching (GREAT-Stereo)
Jiahao Li, Xinhong Chen, Zhengmin Jiang, Qian Zhou, Yung-Hui Li, Jianping Wang
Stereo matching has achieved significant progress with iterative algorithms such as RAFT-Stereo and IGEV-Stereo. However, these methods struggle in ill-posed regions with occlusions, textureless surfaces, or repetitive patterns, due to a lack of the global context and geometric information needed for effective iterative refinement. To enable existing iterative approaches to incorporate global context, we propose the Global Regulation and Excitation via Attention Tuning (GREAT) framework, which encompasses three attention modules. Specifically, Spatial Attention (SA) captures the global context within the spatial dimension, Matching Attention (MA) extracts global context along epipolar lines, and Volume Attention (VA) works in conjunction with SA and MA to construct a more robust cost volume excited by global context and geometric details. To verify the universality and effectiveness of this framework, we integrate it into several representative iterative stereo-matching methods, collectively denoted GREAT-Stereo, and validate it through extensive experiments. The framework demonstrates superior performance in challenging ill-posed regions. Applied to IGEV-Stereo, our GREAT-IGEV ranks first among all published methods on the Scene Flow test set and the KITTI 2015 and ETH3D leaderboards, and second on the Middlebury benchmark.
Our main contributions are:
- We propose a universal framework that can be integrated into existing iterative stereo-matching methods to improve the performance in ill-posed regions.
- We introduce Spatial (SA), Matching (MA), and Volume (VA) Attentions, designed to mitigate ambiguities in ill-posed regions with global context information.
- Our method outperforms existing published methods on public leaderboards such as SceneFlow, KITTI, ETH3D, and Middlebury, with especially significant improvements in ill-posed regions.
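The core idea of exciting the cost volume with global context can be illustrated with a minimal NumPy sketch. This is a hypothetical toy (plain correlation volume, a fixed sigmoid gate, made-up shapes), not the actual GREAT modules, which use learned SA/MA/VA attention in PyTorch:

```python
import numpy as np

def build_cost_volume(feat_l, feat_r, max_disp):
    """Correlation cost volume: for each candidate disparity d, correlate
    left features with right features shifted by d along the width axis."""
    C, H, W = feat_l.shape
    volume = np.zeros((max_disp, H, W), dtype=np.float32)
    for d in range(max_disp):
        if d == 0:
            volume[d] = (feat_l * feat_r).mean(axis=0)
        else:
            volume[d, :, d:] = (feat_l[:, :, d:] * feat_r[:, :, :-d]).mean(axis=0)
    return volume

def excite(volume, attention_logits):
    """Gate the cost volume with a sigmoid attention map, broadcast over the
    disparity dimension (a stand-in for the learned VA excitation)."""
    gate = 1.0 / (1.0 + np.exp(-attention_logits))  # (H, W), values in (0, 1)
    return volume * gate[None]

rng = np.random.default_rng(0)
feat_l = rng.standard_normal((8, 4, 6)).astype(np.float32)
feat_r = rng.standard_normal((8, 4, 6)).astype(np.float32)
vol = build_cost_volume(feat_l, feat_r, max_disp=3)
out = excite(vol, rng.standard_normal((4, 6)).astype(np.float32))
print(out.shape)  # (3, 4, 6): (disparity, height, width)
```

The gating leaves the volume's shape untouched, so a module like this can be dropped in front of any iterative refinement loop that consumes a cost volume.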
- The real-time version of the GREAT Framework. (Hint: this TODO list will be implemented in our significant extension of GREAT, termed GREATEN.)
- The GPU-memory-friendly implementation of the Matching Attention. (Hint: see the GREATEN repository.)
- The Foundation-Model-based experiments.
- The solid and robust version of the GREAT Framework.
- The accelerated training and evaluation pipeline.
We now propose a solid and robust version of our GREAT Framework, which achieves better performance on the SceneFlow and public KITTI 2012/2015 benchmarks, especially in ill-posed regions such as occlusions. Meanwhile, the Foundation-Model version of our GREAT-IGEV also achieves performance comparable to current SOTA Foundation-Model-based architectures.
We merge the solid and robust version of GREAT-Stereo into great-stereo folder.
Our main modifications are:
- We simplify the implementation of Volume Attention.
- We extend the application of Spatial Attention.
- We remove the redundant implementation of receptive augmentation.
- We modify the cost volume construction pipeline with combined cost volume.
- We implement a Foundation-Model (DepthAny) based GREAT-IGEV, named GREAT-IGEV-DepthAny, by replacing the MobileNetV2 backbone with DepthAnythingV2 (following the implementation in Monster), and conduct the Foundation-Model-based experiments.
- We accelerate the training and evaluation with DistributedDataParallel settings.
The benchmark results and corresponding checkpoints are:
| Models | SceneFlow EPE | SceneFlow D3 | SceneFlow Occ-EPE | SceneFlow Occ-D3 | SceneFlow Non-Occ-EPE | SceneFlow Non-Occ-D3 | KITTI2012 Out-Noc (2px) | KITTI2012 Out-All (2px) | KITTI2012 Out-Noc (3px) | KITTI2012 Out-All (3px) | KITTI2015 D1-All | KITTI2015 D1-bg | KITTI2015 Noc-D1-All | KITTI2015 Noc-D1-bg | Params | Run Time | Checkpoints |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Light-Weight Model | |||||||||||||||||
| LEA-Stereo | 0.78 | - | - | - | - | - | 1.90 | 2.39 | 1.13 | 1.45 | 1.65 | 1.40 | 1.51 | 1.29 | 1.81M | 0.30s | - |
| ACVNet | 0.48 | - | - | - | - | - | 1.83 | 2.34 | 1.13 | 1.47 | 1.65 | 1.37 | 1.52 | 1.26 | 6.20M | 0.20s | - |
| IGEV-Stereo | 0.48 | - | 1.65 | - | 0.19 | - | 1.71 | 2.17 | 1.12 | 1.44 | 1.59 | 1.38 | 1.49 | 1.27 | 12.60M | 0.32s | - |
| Selective-IGEV | 0.45 | - | 1.57 | - | 0.17 | - | 1.59 | 2.05 | 1.07 | 1.38 | 1.55 | 1.33 | 1.44 | 1.22 | 13.14M | 0.24s | - |
| IGEV++ | 0.43 | - | - | - | - | - | 1.56 | 2.03 | 1.04 | 1.36 | 1.51 | 1.31 | 1.42 | 1.20 | 14.53M | 0.28s | - |
| GREAT-IGEV (Ours) | 0.41 | 2.20 | 1.51 | 10.12 | 0.14 | 0.49 | 1.51 | 2.00 | 1.02 | 1.37 | 1.50 | 1.28 | 1.37 | 1.14 | 14.44M | 0.33s | Google Drive |
| GREAT-Selective (Ours) | 0.42 | 2.19 | 1.52 | 10.11 | 0.15 | 0.48 | 1.48 | 1.94 | 1.00 | 1.31 | 1.49 | 1.27 | 1.40 | 1.16 | 14.98M | 0.43s | Google Drive |
| GREAT-IGEV-Solid (Ours) | 0.39 | 2.13 | 1.48 | 9.08 | 0.12 | 0.47 | 1.47 | 1.98 | 0.95 | 1.32 | 1.47 | 1.25 | 1.37 | 1.14 | 18.4M | 0.33s | Google Drive |
| GREAT-Selective-Solid (Ours) | 0.38 | 2.07 | 1.46 | 8.85 | 0.11 | 0.46 | - | - | - | - | - | - | - | - | 18.9M | 0.43s | Google Drive |
| Foundation Model | |||||||||||||||||
| ViTA-Stereo | 0.34 | - | - | - | - | - | 1.46 | 1.80 | 0.93 | 1.16 | 1.50 | 1.21 | 1.41 | 1.12 | - | - | - |
| AIO-Stereo | - | - | - | - | - | - | 1.58 | 1.94 | 1.05 | 1.29 | 1.54 | 1.34 | 1.43 | 1.22 | - | - | - |
| Foundation-Stereo | 0.34 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| DEFOM-Stereo | 0.42 | - | - | - | - | - | 1.43 | 1.79 | 0.94 | 1.18 | 1.41 | 1.25 | 1.33 | 1.15 | - | 0.30s | - |
| IGEV++ (DepthAny) | - | - | - | - | - | - | 1.36 | 1.74 | 0.89 | 1.13 | 1.43 | 1.15 | 1.36 | 1.07 | 348M | 0.48s | - |
| Monster | 0.37 | 2.00 | 1.35 | 9.18 | 0.14 | 0.44 | 1.36 | 1.75 | 0.84 | 1.09 | 1.41 | 1.13 | 1.33 | 1.05 | 388M | 0.45s | - |
| GREAT-IGEV-DepthAny (Ours) | 0.36 | 2.03 | 1.41 | 8.70 | 0.11 | 0.45 | 1.34 | 1.76 | 0.85 | 1.13 | 1.43 | 1.15 | 1.36 | 1.07 | 386M | 0.43s | Google Drive |
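For reference, the EPE and outlier metrics reported above can be computed along these lines. This is a hedged sketch with made-up toy arrays; note that the official KITTI D1 metric additionally requires the error to exceed 5% of the ground-truth disparity, which is omitted here:

```python
import numpy as np

def epe(pred, gt, valid):
    """End-Point Error: mean absolute disparity error over valid pixels."""
    return np.abs(pred - gt)[valid].mean()

def outlier_rate(pred, gt, valid, thresh=3.0):
    """Fraction of valid pixels with error > thresh px (e.g. D3 / Out (3px)).
    The official KITTI D1 also requires the error to exceed 5% of gt."""
    return (np.abs(pred - gt)[valid] > thresh).mean()

gt = np.array([[10.0, 20.0], [30.0, 0.0]])
pred = np.array([[10.5, 24.0], [30.0, 7.0]])
valid = gt > 0  # ignore pixels without ground truth
print(epe(pred, gt, valid))           # (0.5 + 4.0 + 0.0) / 3 = 1.5
print(outlier_rate(pred, gt, valid))  # 1 of 3 valid pixels exceeds 3 px
```

Restricting `valid` to occluded or non-occluded masks yields the Occ-/Non-Occ- variants in the table.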
The zero-shot results for Foundation Models are:
| Models | SceneFlow (EPE) | KITTI2012 (D3) | KITTI2015 (D3) | Middlebury (D2) | ETH3D (D1) |
|---|---|---|---|---|---|
| StereoAnywhere | - | 3.90 | 3.93 | 6.96 | 1.66 |
| FoundationStereo | 0.34 | - | - | 5.5 | 1.8 |
| DEFOM-Stereo | 0.42 | 3.76 | 4.99 | 5.91 | 2.35 |
| Monster | 0.38 | 3.37 | 3.44 | 3.67 | 1.10 |
| Monster* | 0.39 | 4.82 | 5.98 | 4.66 | 9.15 |
| GREAT-IGEV-DepthAny (Ours) | 0.36 | 4.31 | 5.48 | 3.35 | 5.82 |
| GREAT-IGEV-DepthAny* (Ours) | 0.39 | 4.34 | 5.56 | 3.26 | 2.48 |
PS: Monster* is the result of our SceneFlow reproduction experiment using the official Monster code; see issue #28 in the official repository for more information.
PS: GREAT-IGEV-DepthAny* is the result of the SceneFlow experiment after zero-shot checkpoint selection, following issue #23 in the official Monster repository.
RAFT Demo | IGEV Demo | Selective Demo
Qualitative results of GREAT-IGEV on the Scene Flow test set of occlusion (Row 1), textureless (Row 2), and repetitive texture (Row 3) regions.
Comparisons with state-of-the-art stereo methods on different public benchmarks and ablation study of the cross-model transferability of the proposed GREAT framework on the Scene Flow test set.
- NVIDIA RTX 3090 or 4090
- python 3.8
```shell
conda create -n great python=3.8
conda activate great
pip install torch torchvision torchaudio xformers==0.0.22.post3+cu118 --index-url https://download.pytorch.org/whl/cu118
pip install tqdm==4.67.1
pip install scipy==1.10.1
pip install opencv-python==4.11.0.86
pip install scikit-image==0.21.0
pip install tensorboard==2.12.0
pip install matplotlib==3.7.5
pip install timm==0.5.4
pip install numpy==1.24.1
pip install einops==0.8.1
pip install open3d==0.19.0
```
- SceneFlow
- KITTI
- ETH3D
- Middlebury
- TartanAir
- VKITTI2
- CREStereo Dataset
- FallingThings
- InStereo2K
- Sintel Stereo
- HR-VS
- Download the checkpoints from Google Drive.
- Change the following parameters in the script located at `launchers/stereo_matching/test_launcher/`:
  - `dataset` - Choices => [sceneflow, kitti, eth3d, middlebury_(Q | H | F)]
  - `dataset_root` - your/path/to/corresponding/dataset
  - `restore_ckpt` - your/path/to/checkpoint
  - `max_disp` (Optional) - `768` for Middlebury and `192` for others
- Run the evaluation (e.g., evaluation of GREAT-IGEV on the Scene Flow test set):
  `./launchers/stereo_matching/test_launcher/great_igev_evaluator.sh`
- (Optional) You can also change the `eval_mode` in the evaluation script to get different evaluation results:
  - `metric` generates quantitative evaluation results (default).
  - `pcgen` generates a point cloud of the predicted disparity for visualization.
  - `cvvis` generates a visualization of the cost volume.
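The point-cloud generation behind an evaluation mode like `pcgen` is standard stereo back-projection. The sketch below uses hypothetical camera parameters (`fx`, `baseline`, `cx`, `cy`) and is not the repository's exact implementation:

```python
import numpy as np

def disparity_to_points(disp, fx, baseline, cx, cy):
    """Back-project a disparity map to 3-D points using the pinhole model:
    Z = fx * baseline / disp, X = (u - cx) * Z / fx, Y = (v - cy) * Z / fx."""
    H, W = disp.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    valid = disp > 0  # zero disparity has no finite depth
    Z = np.zeros_like(disp, dtype=np.float64)
    Z[valid] = fx * baseline / disp[valid]
    X = (u - cx) * Z / fx
    Y = (v - cy) * Z / fx
    return np.stack([X, Y, Z], axis=-1), valid

disp = np.full((2, 3), 2.0)  # toy map: every pixel has disparity 2 px
pts, valid = disparity_to_points(disp, fx=100.0, baseline=0.5, cx=1.0, cy=0.5)
print(pts[0, 1, 2])  # Z = 100 * 0.5 / 2 = 25.0
```

The resulting (H, W, 3) array can be flattened over `valid` pixels and handed to open3d for visualization.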
- Change the following parameters in the script located at `launchers/stereo_matching/train_launcher/`:
  - `logdir` - your/path/to/save/training/information
  - `train_datasets` - Choices => [sceneflow, vkitti2, kitti, eth3d_train, eth3d_finetune, middlebury_train, middlebury_finetune]
  - `train_datasets_root` - your/path/to/corresponding/dataset
  - `restore_ckpt` (Optional) - your/path/to/checkpoint/for/finetuning
- Run the training (e.g., training of GREAT-IGEV on Scene Flow):
  `./launchers/stereo_matching/train_launcher/great_igev_trainer.sh`
- (Optional) You can also change the trainer in the script from `stereo_trainer.py` to `stereo_resumable_trainer.py`, which can resume training if the process has been accidentally shut down. The `stereo_resumable_trainer.py` saves checkpoints for the model, optimizer, and learning-rate scheduler for resuming.
- (Optional) Thanks to the IGEV-Stereo repository, we also provide a choice of data type for mixed-precision training. You can change this data type with `precision_dtype` in the script. Choices are `float32`, `float16`, and `bfloat16`; the default is `float16`. NOTE: our provided checkpoints are trained with `float16` and `float32`.
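A quick illustration of why the `precision_dtype` choice matters. This shows generic float16 behaviour, not anything specific to this repository: float16 saturates above 65504 and keeps only ~10 mantissa bits, which is why `bfloat16` or `float32` can be safer choices for training:

```python
import numpy as np

# float16 saturates: its largest finite value is 65504, so 1e5 overflows to inf.
print(np.isinf(np.float16(1e5)))           # True

# float16 has ~3 decimal digits of precision: a small update to 1.0 rounds away.
print(np.float16(1.0) + np.float16(1e-4))  # 1.0 (the update is lost)

# Keeping master values in float32 preserves the same small update.
print(np.float32(1.0) + np.float32(1e-4))
```

Mixed-precision schemes typically avoid both failure modes with loss scaling and float32 master weights.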
For submission to the KITTI benchmark (e.g., GREAT-IGEV):

```shell
python3 save_disp_kitti.py --name great-igev-stereo --restore_ckpt your/path/to/checkpoint --left_imgs your/path/to/left/imgs --right_imgs your/path/to/right/imgs --output_directory your/path/to/save/submission/results
```

For submission to the ETH3D benchmark (e.g., GREAT-IGEV):

```shell
python3 save_disp_eth3d.py --name great-igev-stereo --restore_ckpt your/path/to/checkpoint --left_imgs your/path/to/left/imgs --right_imgs your/path/to/right/imgs --output_directory your/path/to/save/submission/results
```

For submission to the Middlebury benchmark (e.g., GREAT-IGEV):

```shell
python3 save_disp_middlebury.py --name great-igev-stereo --restore_ckpt your/path/to/checkpoint --left_imgs your/path/to/left/imgs --right_imgs your/path/to/right/imgs --output_directory your/path/to/save/submission/results
```

If you find our work useful in your research, please consider citing our paper.
```
@inproceedings{li2025global,
  title={Global regulation and excitation via attention tuning for stereo matching},
  author={Li, Jiahao and Chen, Xinhong and Jiang, Zhengmin and Zhou, Qian and Li, Yung-Hui and Wang, Jianping},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={25539--25549},
  year={2025}
}
```

This project is based on RAFT-Stereo, IGEV-Stereo, and Selective-Stereo. Meanwhile, the core attention modules of this project are modified from CoEx, VOLO, and Swin-Transformer. We thank the original authors for their excellent work.






