
ProPhy: Progressive Physical Alignment for Dynamic World Simulation

arXiv Project Homepage

Method Overview

Compared to prior works that rely on implicit alignment or coarse video-level routing, ProPhy introduces a progressive alignment framework. It injects learnable physical priors and employs fine-grained token-level routing, allowing specialized experts to internalize specific physical domains and improve the physical realism of generated videos.

The core of ProPhy consists of two key components:

  1. Semantic Expert Block: Captures high-level physical categories and initial semantic alignment.
  2. Refinement Expert Block: Performs fine-grained refinement to ensure precise physical dynamics.

During inference, ProPhy operates end-to-end, dynamically aligning physics categories through these blocks to produce physically consistent video content.
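The two-stage expert design above can be sketched as a token-level router: a coarse semantic gate assigns each token a physical category, then a fine-grained gate dispatches the token to its top-k refinement experts. This is only an illustrative sketch with made-up weight shapes and helper names, not the actual ProPhy implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def route_tokens(tokens, semantic_w, refine_w, top_k=2):
    """Two-stage token-level routing sketch (hypothetical shapes/names).

    Stage 1 (semantic): score each token against coarse physical
    categories. Stage 2 (refinement): pick the top-k refinement
    experts per token for fine-grained physical dynamics.
    """
    # Stage 1: coarse category probabilities per token
    cat_probs = softmax(tokens @ semantic_w)      # (T, num_categories)
    category = cat_probs.argmax(axis=-1)          # (T,)

    # Stage 2: fine-grained expert scores, keep top-k experts per token
    expert_scores = softmax(tokens @ refine_w)    # (T, num_experts)
    topk = np.argsort(expert_scores, axis=-1)[:, -top_k:]
    return category, topk

tokens = np.random.randn(16, 64)   # 16 tokens, feature dim 64
sem_w = np.random.randn(64, 4)     # 4 coarse physical categories
ref_w = np.random.randn(64, 8)     # 8 refinement experts
cat, experts = route_tokens(tokens, sem_w, ref_w)
```

Each token thus receives both a category label and a small set of specialized experts, which is what allows different experts to internalize different physical domains.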

Installation

  • Python: 3.10+ recommended.

  • Environment Setup (using uv):

    uv sync
    source .venv/bin/activate   # or .venv\Scripts\activate on Windows
  • All commands below are run from the repository root.

Generate Attention Map

tools/generate_attention_map.py produces per-video attention maps for physical phenomena and appearance using a Qwen2.5-VL–based model.

Run from the repository root:

export PYTHONPATH=$(pwd):$PYTHONPATH
python3 tools/generate_attention_map.py \
  --data_json_path /path/to/dataset.json \
  --video_base_path /path/to/videos \
  --output_dir /path/to/attention_output \
  --model_path /path/to/Qwen2.5-VL-checkpoint

For the JSON file passed to --data_json_path, each item needs:

  • video_name: the filename of the video (e.g., video_001.mp4), which will be joined with the --video_base_path argument.
  • activate_expert: a list that supports both:
    • integers: built-in phenomenon / appearance IDs defined in configs/attention_map.py
    • strings: your own physical attributes

Example with built-in IDs:

[
  {
    "video_name": "video_001.mp4",
    "activate_expert": [0, 3]
  }
]

This will generate attention maps for expert IDs 0 and 3.

You can also mix IDs with custom strings:

[
  {
    "video_name": "video_001.mp4",
    "activate_expert": [0, "surface tension", "magnetic attraction"]
  }
]

This will generate the default describe map, the built-in map for ID 0, and extra maps for surface tension and magnetic attraction.
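Building the data JSON programmatically helps avoid schema mistakes. The field names below follow the description above; the helper function itself is a hypothetical convenience, not part of the repository:

```python
import json

def make_item(video_name, activate_expert):
    """Build one entry for the --data_json_path file.

    activate_expert may mix integer IDs (built-in phenomenon /
    appearance IDs from configs/attention_map.py) and free-form
    strings (custom physical attributes).
    """
    for e in activate_expert:
        if not isinstance(e, (int, str)):
            raise TypeError(
                f"activate_expert entries must be int or str, got {type(e).__name__}"
            )
    return {"video_name": video_name, "activate_expert": list(activate_expert)}

items = [
    make_item("video_001.mp4", [0, 3]),
    make_item("video_002.mp4", [0, "surface tension", "magnetic attraction"]),
]
data_json = json.dumps(items, indent=2)
```

Writing `data_json` to a file and passing its path via --data_json_path reproduces the examples shown above.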

Inference

Pretrained backbone checkpoints are available on Hugging Face: CogVideoX and Wan. Our ProPhy checkpoints will be released soon!

  • CogVideoX

    export PYTHONPATH=$(pwd):$PYTHONPATH
    python3 inference_cogvideox.py \
      --pretrained_checkpoint /path/to/CogVideoX-5b \
      --prophy_checkpoint /path/to/checkpoint \
      --prompt "Your prompt" \
      --output_path /path/to/output.mp4
  • Wan

    export PYTHONPATH=$(pwd):$PYTHONPATH
    python3 inference_wan.py \
      --pretrained_checkpoint /path/to/Wan2.1-T2V-1.3B-Diffusers \
      --prophy_checkpoint /path/to/checkpoint \
      --prompt "Your prompt" \
      --output_path /path/to/output.mp4

--output_path can be a .mp4 file or a directory (in which case a default filename is used).
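The file-vs-directory behavior of --output_path can be mimicked as follows. This is a sketch; the default filename is an assumption here, not necessarily what the inference scripts actually use:

```python
from pathlib import Path

def resolve_output_path(output_path, default_name="output.mp4"):
    """Treat a .mp4 path as the output file; treat anything else as a
    directory and append a default filename (default_name is an
    assumed placeholder)."""
    p = Path(output_path)
    if p.suffix == ".mp4":
        return p
    return p / default_name
```

For example, `resolve_output_path("results/demo.mp4")` keeps the filename as given, while `resolve_output_path("results")` appends the default name.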

Acknowledgements

We would like to thank the following projects for their contributions:

  • Wan2.1 and CogVideoX for their excellent backbone models.
  • WISA for providing their high-quality dataset.

Citation

If you use ProPhy in your work, please cite:

@misc{wang2025prophyprogressivephysicalalignment,
      title={ProPhy: Progressive Physical Alignment for Dynamic World Simulation}, 
      author={Zijun Wang and Panwen Hu and Jing Wang and Terry Jingchen Zhang and Yuhao Cheng and Long Chen and Yiqiang Yan and Zutao Jiang and Hanhui Li and Xiaodan Liang},
      year={2025},
      eprint={2512.05564},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.05564}, 
}
