jinhong-ni/UniPano


What Makes for Text to 360-degree Panorama Generation with Stable Diffusion?

Jinhong Ni¹, Chang-Bin Zhang², Qiang Zhang³,⁴, Jing Zhang¹

¹Australian National University, ²The University of Hong Kong, ³Beijing Innovation Center of Humanoid Robotics, ⁴Hong Kong University of Science and Technology (Guangzhou)

Paper | Email

Updates

  • [07/25] Our code is released.
  • [06/25] Our paper is accepted to ICCV 2025.

Key Findings

Our paper examines the key components that enable the adaptation of pre-trained Stable Diffusion for panorama generation. In particular, we summarize the two key findings of our paper:

  • The four attention matrices ($W_{\{q,k,v,o\}}$) behave differently when fine-tuned in isolation with LoRA. Fine-tuning only $W_q$ or only $W_k$ fails to capture the spherical structure of the panoramas, whereas $W_v$ and $W_o$ succeed.

  • Jointly fine-tuned LoRA weights associated with the four attention matrices serve different functions. (a) All four LoRAs together generate panoramic images; (b) naturally, the four LoRAs trained on panoramas lose the ability to generate perspective images; (c) excluding the $W_v$ and $W_o$ LoRAs recovers the ability to generate perspective images; (d) excluding the $W_q$ and $W_k$ LoRAs preserves the fine-tuned model's ability to generate panoramas.

For more details, please refer to our paper.
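
To make findings (c) and (d) concrete, the sketch below merges LoRA updates into only a chosen subset of the four attention projections. It is a toy illustration in pure Python, not the code in this repository: the function name `merge_lora`, the projection keys `q`, `k`, `v`, `o`, the 2x2 matrices, and the `alpha / rank` scaling (the usual LoRA convention) are all assumptions made for the example.

```python
# Toy sketch (assumed names and values, not the repository's implementation).

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_scaled(a, b, scale):
    """Return a + scale * b, element-wise."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def merge_lora(base, lora, enabled, alpha=1.0):
    """Merge LoRA updates into the projections listed in `enabled` only.

    base: dict mapping projection name -> weight matrix (d_out x d_in)
    lora: dict mapping projection name -> (B, A), with B (d_out x r), A (r x d_in)
    """
    merged = {}
    for name, weight in base.items():
        if name in enabled:
            b, a = lora[name]
            rank = len(a)  # number of rows of A is the LoRA rank
            merged[name] = add_scaled(weight, matmul(b, a), alpha / rank)
        else:
            merged[name] = [row[:] for row in weight]  # keep pre-trained weight
    return merged


# Hypothetical pre-trained weights and trained rank-1 LoRA factors.
base = {n: [[1.0, 0.0], [0.0, 1.0]] for n in "qkvo"}
lora = {n: ([[1.0], [2.0]], [[0.5, 0.0]]) for n in "qkvo"}

# (d) drop the W_q / W_k LoRAs: only W_v and W_o are updated.
pano = merge_lora(base, lora, enabled={"v", "o"})
# (c) drop the W_v / W_o LoRAs: only W_q and W_k are updated.
persp = merge_lora(base, lora, enabled={"q", "k"})

print(pano["q"], pano["v"])
```

In this setup, `pano` corresponds to configuration (d) and `persp` to configuration (c); the sketch only shows which weights change, not the resulting image quality.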

Environment Setup

We use Anaconda to manage the environment. You can create the environment by running the following command:

cd UniPano
bash setup_env.sh

We use wandb to log and visualize the training process.

wandb login

Data Preparation

We follow PanFusion and MVDiffusion to download the Matterport3D skybox dataset. Please refer to their Data Preparation Section to download and prepare the dataset.

Training and Testing

To train UniPano with default settings, run the following command:

WANDB_NAME=unipano python main.py fit --data=Matterport3D --model=UniPano

Our training log can be found at wandb.

Please follow PanFusion to download the FAED checkpoint. Replace <WANDB_RUN_ID> with the wandb run ID and run the following command for testing:

WANDB_RUN_ID=<WANDB_RUN_ID> python main.py test --data=Matterport3D --model=UniPano --ckpt_path=last
WANDB_RUN_ID=<WANDB_RUN_ID> python main.py test --data=Matterport3D --model=EvalPanoGen
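
If you prefer to script the two test steps, the sketch below builds the same commands from a run ID and prints them as a dry run. The helper name `build_test_commands` and the run ID `abc123` are placeholders invented for this example; the arguments are copied from the commands above.

```python
import os
import shlex


def build_test_commands(run_id):
    """Return (env, commands) for the two test invocations shown above."""
    # WANDB_RUN_ID is passed through the environment, as in the shell commands.
    env = dict(os.environ, WANDB_RUN_ID=run_id)
    commands = [
        ["python", "main.py", "test", "--data=Matterport3D",
         "--model=UniPano", "--ckpt_path=last"],
        ["python", "main.py", "test", "--data=Matterport3D",
         "--model=EvalPanoGen"],
    ]
    return env, commands


env, commands = build_test_commands("abc123")  # placeholder run ID
for cmd in commands:
    # Dry run: print the equivalent shell line instead of executing it.
    print(f"WANDB_RUN_ID={env['WANDB_RUN_ID']} {shlex.join(cmd)}")
```

To execute rather than print, each `cmd` could be passed to `subprocess.run(cmd, env=env, check=True)` from the repository root.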

UniPano with Stable Diffusion 3

As mentioned in our paper, our uni-branch solution can be easily integrated into more advanced and memory-intensive diffusion models such as Stable Diffusion 3. We use a different codebase for Stable Diffusion 3. Please refer to the UniPano_SD3 folder for more details.

Citation

@article{ni2025makes,
  title={What Makes for Text to 360-degree Panorama Generation with Stable Diffusion?},
  author={Ni, Jinhong and Zhang, Chang-Bin and Zhang, Qiang and Zhang, Jing},
  journal={arXiv preprint arXiv:2505.22129},
  year={2025}
}

Acknowledgement

This repository is mainly developed based on PanFusion. The codebase also benefits from DiT-MoE for MoE implementation.

About

[ICCV 2025] Official implementation of "What Makes for Text to 360-degree Panorama Generation with Stable Diffusion?"
