Jinhong Ni1, Chang-Bin Zhang2, Qiang Zhang3,4, Jing Zhang1
1Australian National University 2The University of Hong Kong 3Beijing Innovation Center of Humanoid Robotics 4Hong Kong University of Science and Technology (Guangzhou)
- [07/25] Our code is released.
- [06/25] Our paper is accepted to ICCV 2025.
Our paper examines the key components that enable the adaptation of pre-trained Stable Diffusion for panorama generation. Its two key findings are:
- The four attention matrices ($W_{\{q,k,v,o\}}$) behave differently when fine-tuned in isolation with LoRA. Fine-tuning only $W_q$ or $W_k$ fails to capture the spherical structure of panoramas, whereas fine-tuning only $W_v$ or $W_o$ succeeds.
- Jointly fine-tuned LoRA weights associated with the four attention matrices serve different functions. (a) All four LoRAs together generate panoramic images; (b) naturally, the four LoRAs trained on panoramas lose the ability to generate perspective images; (c) excluding the $W_v$ and $W_o$ LoRAs recovers the ability to generate perspective images; (d) excluding the $W_q$ and $W_k$ LoRAs preserves the fine-tuned model's ability to generate panoramas.
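To make the "excluding" experiments above concrete, here is a minimal pure-Python sketch of the standard LoRA update, W + (alpha / r) * B A, with a per-projection switch. It is an illustration only, not the repository's implementation: the names `LoRALinear` and `set_active_loras`, and the toy 2x2 matrices, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (m x k) and b (k x n)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


@dataclass
class LoRALinear:
    """Hypothetical container: a frozen weight plus a low-rank LoRA update."""
    weight: Matrix        # frozen pre-trained weight, shape (out, in)
    lora_B: Matrix        # trainable factor, shape (out, r)
    lora_A: Matrix        # trainable factor, shape (r, in)
    alpha: float = 1.0    # LoRA scaling numerator
    enabled: bool = True  # whether the LoRA update is applied

    def effective_weight(self) -> Matrix:
        """Return W when disabled, else W + (alpha / r) * B @ A."""
        if not self.enabled:
            return self.weight
        rank = len(self.lora_A)
        scale = self.alpha / rank
        delta = matmul(self.lora_B, self.lora_A)
        return [
            [w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(self.weight, delta)
        ]


def set_active_loras(layers: Dict[str, LoRALinear], active: Set[str]) -> None:
    """Enable only the LoRAs whose projection name (q, k, v, o) is in `active`."""
    for name, layer in layers.items():
        layer.enabled = name in active
```

Under these assumptions, case (d) corresponds to `set_active_loras(layers, {"v", "o"})` and case (c) to `set_active_loras(layers, {"q", "k"})`, where `layers` maps the projection names `"q"`, `"k"`, `"v"`, `"o"` to their `LoRALinear` objects.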
For more details, please refer to our paper.
We use Anaconda to manage the environment. You can create the environment by running the following command:
cd UniPano
bash setup_env.sh
We use wandb to log and visualize the training process.
wandb login
We follow PanFusion and MVDiffusion to download the Matterport3D skybox dataset. Please refer to their Data Preparation sections to download and prepare the dataset.
For training UniPano with default settings, run the following command:
WANDB_NAME=unipano python main.py fit --data=Matterport3D --model=UniPano
Our training log can be found at wandb.
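If you prefer to launch training from a Python script (for example, to queue several named runs), the training command can be wrapped as sketched below. This is only a convenience sketch: the helper names `build_fit_command`, `build_env`, and `launch_training` are hypothetical and not part of the repository, and the flags are exactly those of the training command in this README.

```python
import os
import subprocess
from typing import Dict, List, Optional


def build_fit_command(data: str = "Matterport3D", model: str = "UniPano") -> List[str]:
    """Assemble the argv for the training command shown in this README."""
    return ["python", "main.py", "fit", f"--data={data}", f"--model={model}"]


def build_env(run_name: str, base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy the environment and set WANDB_NAME, which names the wandb run."""
    env = dict(os.environ if base_env is None else base_env)
    env["WANDB_NAME"] = run_name
    return env


def launch_training(run_name: str = "unipano") -> subprocess.CompletedProcess:
    """Run training from the repository root; raises if the command fails."""
    return subprocess.run(build_fit_command(), env=build_env(run_name), check=True)
```

Calling `launch_training("unipano")` from the repository root is then equivalent to running the shell command with `WANDB_NAME=unipano`.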
Please follow PanFusion to download the FAED checkpoint. Replace <WANDB_RUN_ID> with the wandb run ID and run the following command for testing:
WANDB_RUN_ID=<WANDB_RUN_ID> python main.py test --data=Matterport3D --model=UniPano --ckpt_path=last
WANDB_RUN_ID=<WANDB_RUN_ID> python main.py test --data=Matterport3D --model=EvalPanoGen
As mentioned in our paper, our uni-branch solution can be easily integrated into more advanced and memory-intensive diffusion models such as Stable Diffusion 3. We use a separate codebase for Stable Diffusion 3; please refer to the UniPano_SD3 folder for more details.
@article{ni2025makes,
title={What Makes for Text to 360-degree Panorama Generation with Stable Diffusion?},
author={Ni, Jinhong and Zhang, Chang-Bin and Zhang, Qiang and Zhang, Jing},
journal={arXiv preprint arXiv:2505.22129},
year={2025}
}
This repository is mainly developed based on PanFusion. The codebase also benefits from DiT-MoE for MoE implementation.

