Official repo for the paper "Multi-view Pyramid Transformer: Look Coarser to See Broader"
[Mar 2026] MVP optionally supports Flash Attention 4 for faster attention computation. If flash-attn-4 is installed, it will be used automatically; otherwise the code falls back to the standard F.scaled_dot_product_attention.
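The fallback behavior can be sketched as follows (a minimal illustration, not the repo's actual dispatch code; `flash_attn` as the importable module name is an assumption):

```python
from importlib.util import find_spec

def pick_attention_backend(prefer_flash: bool = True) -> str:
    """Choose the attention backend: Flash Attention if the package is
    importable, otherwise fall back to PyTorch's
    F.scaled_dot_product_attention ("sdpa")."""
    # `flash_attn` as the module name is an assumption for illustration.
    if prefer_flash and find_spec("flash_attn") is not None:
        return "flash-attn-4"
    return "sdpa"

print(pick_attention_backend())
```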
| Views | H100 + FA3 (s) | B200 + FA4 (s) | Speedup |
|---|---|---|---|
| 16 | 0.09 | 0.05 | 1.8× |
| 32 | 0.17 | 0.10 | 1.7× |
| 64 | 0.36 | 0.20 | 1.8× |
| 128 | 0.77 | 0.43 | 1.8× |
| 192 | 1.23 | 0.70 | 1.8× |
| 256 | 1.84 | 1.08 | 1.7× |
Reconstruction time (seconds) at 960x540. H100 numbers are from the original paper.
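The speedup column is simply the ratio of the two timings, rounded to one decimal; a quick check against a few rows of the table:

```python
# Speedup = H100 time / B200 time, using values from the table above
timings = {16: (0.09, 0.05), 32: (0.17, 0.10), 256: (1.84, 1.08)}
for views, (h100, b200) in timings.items():
    print(f"{views} views: {round(h100 / b200, 1)}x")
```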
[Mar 2026] We've updated the codebase with a CUDA implementation of opacity with SH coefficients, reducing both training time and memory consumption. Kudos to Hyeongbhin-Cho for the contribution. See details in the original repo.
[Mar 2026] We've released 2Xplat, a pose-free version of MVP built on a two-expert design — one for camera pose estimation, one for 3DGS generation. It outperforms prior pose-free methods and matches state-of-the-art posed approaches in under 5K training iterations.
# 1. Clone the repository
# If starting fresh (clone everything at once):
git clone --recurse-submodules https://github.com/Gynjn/MVP.git
# If already cloned (initialize submodules):
git submodule update --init --recursive
# 2. Create and activate conda environment
conda create -n mvp python=3.11 -y
conda activate mvp
# 3. Install dependencies (adjust CUDA version in requirements.txt to match your system)
pip install -r requirements.txt
# 4. Install CUDA kernels
cd rendering_cuda
pip install . --no-build-isolation
cd ../sh_cuda
pip install . --no-build-isolation
# 5. Optional
pip install flash-attn-4

The model checkpoints are hosted on HuggingFace (mvp_960x540).
For training and evaluation, we used the DL3DV dataset after applying undistortion preprocessing with this script, originally introduced in Long-LRM.
Download the DL3DV benchmark dataset from here, and apply undistortion preprocessing.
For benchmark data, we provide a preprocessed version originally sourced from the RayZer repository. You can find the preprocessed data here. Thanks to Hanwen Jiang for sharing the preprocessed data.
Update the inference.ckpt_path field in configs/inference.yaml with the pretrained model.
Update the entries in data/dl3dv_benchmark.txt to point to the correct processed dataset path.
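For reference, the entries in `data/dl3dv_benchmark.txt` are one scene directory per line; the paths below are hypothetical placeholders, so substitute your own processed dataset locations:

```text
/path/to/processed_dl3dv/scene_0001
/path/to/processed_dl3dv/scene_0002
```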
# inference
CUDA_VISIBLE_DEVICES=0 python inference.py --config configs/inference.yaml

Update configs/api_keys.yaml with your personal wandb API key.
Update the entries in data/dl3dv_train.txt to point to the correct processed dataset path.
If you have enough GPU memory, disable gradient checkpointing in the stage functions run_stage1, run_stage2, and run_stage3 in model/mvp.py.
# Example for single GPU training
CUDA_VISIBLE_DEVICES=0 python train_single.py --config configs/train_stage1.yaml
# Example for multi GPU training
torchrun --nproc_per_node 8 --nnodes 1 \
--rdzv_id 1234 --rdzv_endpoint localhost:8888 \
train.py --config configs/train_stage1.yaml

- Preprocessed Tanks&Temples and Mip-NeRF360 datasets
@article{kang2025multi,
  title={Multi-view Pyramid Transformer: Look Coarser to See Broader},
  author={Kang, Gyeongjin and Yang, Seungkwon and Nam, Seungtae and Lee, Younggeun and Kim, Jungwoo and Park, Eunbyung},
  journal={arXiv preprint arXiv:2512.07806},
  year={2025}
}
This project is built on many amazing research works, thanks a lot to all the authors for sharing!
This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (RS-2025-02653113, High-Performance Research AI Computing Infrastructure Support at the 2 PFLOPS Scale).