
Multi-view Pyramid Transformer: Look Coarser to See Broader

arXiv | Project Page

Gyeongjin Kang, Seungkwon Yang, Seungtae Nam, Younggeun Lee, Jungwoo Kim, Eunbyung Park

Official repo for the paper "Multi-view Pyramid Transformer: Look Coarser to See Broader"

News

[Mar 2026] MVP optionally supports Flash Attention 4 for faster attention computation. If flash-attn-4 is installed, it will be used automatically; otherwise the code falls back to the standard F.scaled_dot_product_attention.

| Views | H100 + FA3 (s) | B200 + FA4 (s) | Speedup |
|------:|---------------:|---------------:|--------:|
|    16 |           0.09 |           0.05 |    1.8× |
|    32 |           0.17 |           0.10 |    1.7× |
|    64 |           0.36 |           0.20 |    1.8× |
|   128 |           0.77 |           0.43 |    1.8× |
|   192 |           1.23 |           0.70 |    1.8× |
|   256 |           1.84 |           1.08 |    1.7× |

Reconstruction time (seconds) at 960x540. H100 numbers are from the original paper.
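The automatic fallback described above can be sketched as follows. This is a minimal illustration, not the codebase's actual dispatch logic; `flash_attn_func` is the public entry point of the flash-attn package, and its exact import path can vary by version.

```python
import torch
import torch.nn.functional as F

try:
    # flash-attn, if installed, exposes a fused attention kernel
    from flash_attn import flash_attn_func
    HAS_FLASH_ATTN = True
except ImportError:
    HAS_FLASH_ATTN = False

def attention(q, k, v):
    """Use FlashAttention when available, else PyTorch's SDPA.

    q, k, v: (batch, heads, seq, head_dim), the layout expected by
    F.scaled_dot_product_attention.
    """
    if HAS_FLASH_ATTN and q.is_cuda:
        # flash_attn_func expects (batch, seq, heads, head_dim)
        out = flash_attn_func(q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2))
        return out.transpose(1, 2)
    # fallback: standard scaled dot-product attention
    return F.scaled_dot_product_attention(q, k, v)
```

On machines without flash-attn (or on CPU tensors), the function silently takes the SDPA path, so the same code runs everywhere.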

[Mar 2026] We've updated the codebase with a CUDA implementation of Opacity with SH coefficients, reducing both training time and memory consumption. Kudos to Hyeongbhin-Cho for the contribution. See details in the original repo.

[Mar 2026] We've released 2Xplat, a pose-free version of MVP built on a two-expert design — one for camera pose estimation, one for 3DGS generation. It outperforms prior pose-free methods and matches state-of-the-art posed approaches in under 5K training iterations.

Installation

# 1. Clone the repository
# If starting fresh (clone everything at once):
git clone --recurse-submodules https://github.com/Gynjn/MVP.git

# If already cloned (initialize submodules):
git submodule update --init --recursive

# 2. Create and activate conda environment
conda create -n mvp python=3.11 -y
conda activate mvp

# 3. Install dependencies (adjust CUDA version in requirements.txt to match your system)
pip install -r requirements.txt

# 4. Install CUDA kernels
cd rendering_cuda
pip install . --no-build-isolation
cd ../sh_cuda
pip install . --no-build-isolation

# 5. Optional
pip install flash-attn-4

Checkpoints

The model checkpoints are hosted on HuggingFace (mvp_960x540).

For training and evaluation, we used the DL3DV dataset after applying undistortion preprocessing with this script, originally introduced in Long-LRM.

Download the DL3DV benchmark dataset from here, and apply undistortion preprocessing.

For benchmark data, we provide a preprocessed version originally sourced from the RayZer repository. You can find the preprocessed data here. Thanks to Hanwen Jiang for sharing it.

Inference

Update the inference.ckpt_path field in configs/inference.yaml to point to the pretrained model checkpoint.

Update the entries in data/dl3dv_benchmark.txt to point to the correct processed dataset path.
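For reference, the relevant part of configs/inference.yaml might look like the fragment below. Only the inference.ckpt_path field is named above; the surrounding layout and the path value are placeholders, not the repo's actual config.

```yaml
inference:
  ckpt_path: /path/to/mvp_960x540.ckpt  # downloaded HuggingFace checkpoint
```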

# inference
CUDA_VISIBLE_DEVICES=0 python inference.py --config configs/inference.yaml

Train

Update configs/api_keys.yaml with your personal wandb API key.

Update the entries in data/dl3dv_train.txt to point to the correct processed dataset path.

If you have enough GPU memory, disable gradient checkpointing in the stage functions run_stage1, run_stage2, and run_stage3 in model/mvp.py.
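Gradient checkpointing trades compute for memory: activations are recomputed during the backward pass instead of being stored. The toggle suggested above follows a pattern roughly like this illustrative sketch (not the actual run_stage* code; `Block` and `run_stage` are hypothetical names).

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):
    """A stand-in transformer-style block for illustration."""
    def __init__(self, dim=32):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.mlp(x)

def run_stage(blocks, x, use_checkpoint=True):
    for block in blocks:
        if use_checkpoint and x.requires_grad:
            # recompute activations in backward -> lower memory, more compute
            x = checkpoint(block, x, use_reentrant=False)
        else:
            # plain forward -> faster, but all activations stay resident
            x = block(x)
    return x
```

With enough GPU memory, passing use_checkpoint=False skips the recomputation and speeds up each training step; the forward results are identical either way.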

# Example for single GPU training
CUDA_VISIBLE_DEVICES=0 python train_single.py --config configs/train_stage1.yaml

# Example for multi GPU training
torchrun --nproc_per_node 8 --nnodes 1 \
         --rdzv_id 1234 --rdzv_endpoint localhost:8888 \
         train.py --config configs/train_stage1.yaml

TODO List

  • Preprocessed Tanks&Temples and Mip-NeRF 360 datasets

Citation

@article{kang2025multi,
  title={Multi-view Pyramid Transformer: Look Coarser to See Broader},
  author={Kang, Gyeongjin and Yang, Seungkwon and Nam, Seungtae and Lee, Younggeun and Kim, Jungwoo and Park, Eunbyung},
  journal={arXiv preprint arXiv:2512.07806},
  year={2025}
}

Related project

This project builds on many amazing research works; thanks to all the authors for sharing!

ACKNOWLEDGEMENTS

This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (RS-2025-02653113, High-Performance Research AI Computing Infrastructure Support at the 2 PFLOPS Scale).

About

[CVPR 2026] Multi-view Pyramid Transformer: Look Coarser to See Broader
