Compared to prior works that rely on implicit alignment or coarse video-level routing, ProPhy introduces a progressive alignment framework. It injects learnable physical priors and employs fine-grained token-level routing, allowing specialized experts to internalize specific physical domains and improve the physical realism of generated videos.
The core of ProPhy consists of two key components:
- Semantic Expert Block: Captures high-level physical categories and initial semantic alignment.
- Refinement Expert Block: Performs fine-grained refinement to ensure precise physical dynamics.
During inference, ProPhy operates end-to-end, dynamically aligning physics categories through these blocks to produce physically consistent video content.
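The fine-grained token-level routing idea can be illustrated as a minimal top-1 mixture-of-experts layer. This is a hedged NumPy sketch under assumed shapes, not the actual ProPhy implementation; the gate and expert weights are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

def route_tokens(tokens, gate_w, expert_ws):
    """Top-1 token-level routing: each token is dispatched to its
    highest-scoring expert, so each expert can specialize in a distinct
    physical domain. (Illustrative only; not the ProPhy code.)"""
    scores = tokens @ gate_w                # (num_tokens, num_experts)
    choice = scores.argmax(axis=-1)         # expert index chosen per token
    out = np.zeros_like(tokens)
    for idx, w in enumerate(expert_ws):
        mask = choice == idx                # tokens routed to expert `idx`
        if mask.any():
            out[mask] = tokens[mask] @ w    # expert-specific transform
    return out

dim, num_experts, num_tokens = 64, 4, 16
tokens = rng.standard_normal((num_tokens, dim))
gate_w = rng.standard_normal((dim, num_experts))
expert_ws = [rng.standard_normal((dim, dim)) for _ in range(num_experts)]
print(route_tokens(tokens, gate_w, expert_ws).shape)  # (16, 64)
```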
- Python: 3.10+ recommended.
- Environment Setup (using uv):

  ```bash
  uv sync
  source .venv/bin/activate  # or .venv\Scripts\activate on Windows
  ```

- All commands below are run from the repository root.
`tools/generate_attention_map.py` produces per-video attention maps for physical phenomena and appearance using a Qwen2.5-VL–based model.

Run from the repository root:

```bash
export PYTHONPATH=$(pwd):$PYTHONPATH
python3 tools/generate_attention_map.py \
    --data_json_path /path/to/dataset.json \
    --video_base_path /path/to/videos \
    --output_dir /path/to/attention_output \
    --model_path /path/to/Qwen2.5-VL-checkpoint
```

For the JSON file passed to `--data_json_path`, each item needs:

- `video_name`: the filename of the video (e.g., `video_001.mp4`), which will be joined with the `--video_base_path` argument.
- `activate_expert`: a list that supports both:
  - integers: built-in phenomenon / appearance IDs defined in `configs/attention_map.py`
  - strings: your own physical attributes
Example with built-in IDs:

```json
[
    {
        "video_name": "video_001.mp4",
        "activate_expert": [0, 3]
    }
]
```

This will generate attention maps for expert IDs 0 and 3.
You can also mix IDs with custom strings:

```json
[
    {
        "video_name": "video_001.mp4",
        "activate_expert": [0, "surface tension", "magnetic attraction"]
    }
]
```

This will generate the default describe map, the built-in map for ID 0, and extra maps for surface tension and magnetic attraction.
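If you prefer to build the dataset JSON programmatically, a few lines of standard-library Python suffice. The entries mirror the format documented above; the filenames are placeholders:

```python
import json

# Build entries in the format expected by tools/generate_attention_map.py;
# built-in IDs and custom attribute strings can be mixed in activate_expert.
entries = [
    {"video_name": "video_001.mp4", "activate_expert": [0, 3]},
    {"video_name": "video_002.mp4",
     "activate_expert": [0, "surface tension", "magnetic attraction"]},
]

# Write the list to disk; pass this file via --data_json_path.
with open("dataset.json", "w") as f:
    json.dump(entries, f, indent=2)
```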
Pretrained backbone checkpoints are available on Hugging Face: CogVideoX and Wan. Our ProPhy checkpoints will be released soon!
CogVideoX:

```bash
export PYTHONPATH=$(pwd):$PYTHONPATH
python3 inference_cogvideox.py \
    --pretrained_checkpoint /path/to/CogVideoX-5b \
    --prophy_checkpoint /path/to/checkpoint \
    --prompt "Your prompt" \
    --output_path /path/to/output.mp4
```

Wan:

```bash
export PYTHONPATH=$(pwd):$PYTHONPATH
python3 inference_wan.py \
    --pretrained_checkpoint /path/to/Wan2.1-T2V-1.3B-Diffusers \
    --prophy_checkpoint /path/to/checkpoint \
    --prompt "Your prompt" \
    --output_path /path/to/output.mp4
```
`--output_path` can be a `.mp4` file or a directory (in which case a default filename is used).
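That behavior can be approximated as below; `default_name` is a hypothetical placeholder, not necessarily the default filename the scripts actually use:

```python
import os

def resolve_output_path(output_path, default_name="output.mp4"):
    """If output_path is an existing directory, join a default filename;
    otherwise treat it as the target .mp4 file. (Sketch of the documented
    behavior; default_name is an assumption, not taken from the repo.)"""
    if os.path.isdir(output_path):
        return os.path.join(output_path, default_name)
    return output_path
```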
We would like to thank the following projects for their contributions:
- Wan2.1 and CogVideoX for their excellent backbone models.
- WISA for providing their high-quality dataset.
If you use ProPhy in your work, please cite:
```bibtex
@misc{wang2025prophyprogressivephysicalalignment,
    title={ProPhy: Progressive Physical Alignment for Dynamic World Simulation},
    author={Zijun Wang and Panwen Hu and Jing Wang and Terry Jingchen Zhang and Yuhao Cheng and Long Chen and Yiqiang Yan and Zutao Jiang and Hanhui Li and Xiaodan Liang},
    year={2025},
    eprint={2512.05564},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2512.05564},
}
```
