
Low quality output (snow-like noise) when using Cache-DiT with Wan2.1-I2V #622

@TingxuanSix

Thank you for the amazing work on Cache-DiT! It’s a very promising acceleration framework for DiT-based models.

I am experiencing a quality issue when applying Cache-DiT to Wan2.1-I2V (14B-480P). The generated videos consist almost entirely of snow-like noise. I have tested this with both 20 and 50 inference steps, and the issue persists regardless of the step count.

Steps to Reproduce
I am using generate.py and registers.py from the examples/ folder, launched with the following command:

MODEL_NAME="wan2.1_i2v"


# --num_inference_steps 50 was also tried, with the same result.
torchrun --nproc_per_node=8 generate.py $MODEL_NAME \
--image-path "/home/cache-dit-main/examples/i2v_input.JPG" \
--model-path "/home/wan_weights/Wan2.1-I2V-14B-480P" \
--prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside." \
--height 672 \
--width 480 \
--num-frames 49 \
--num_inference_steps 20 \
--parallel ulysses \
--ulysses-anything \
--save-path "./output_wan2.1_i2v.mp4" \
--attn flash \
--cache

The input image is the example image at https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/wan_i2v_input.JPG
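As a sanity check on the settings above, the following sketch estimates how many tokens the transformer sees for this resolution and frame count, and whether that count splits evenly across the 8 ranks. The compression factors are assumptions that are not stated in this issue: an 8x spatial and 4x temporal VAE reduction and a (1, 2, 2) patch size. The helper `estimate_tokens` is a hypothetical name used only for illustration, not part of cache_dit or diffusers.

```python
# Rough token-count estimate for the command above.
# ASSUMPTIONS (not confirmed in this issue): the VAE compresses 8x spatially
# and 4x temporally, and the transformer patchifies with size (1, 2, 2).

def estimate_tokens(height, width, num_frames,
                    vae_spatial=8, vae_temporal=4, patch=(1, 2, 2)):
    """Return (patches_t, patches_h, patches_w, total_tokens)."""
    latent_frames = (num_frames - 1) // vae_temporal + 1
    patches_t = latent_frames // patch[0]
    patches_h = height // vae_spatial // patch[1]
    patches_w = width // vae_spatial // patch[2]
    return patches_t, patches_h, patches_w, patches_t * patches_h * patches_w


world_size = 8  # from --nproc_per_node=8
t, h, w, tokens = estimate_tokens(height=672, width=480, num_frames=49)
print(f"token grid: {t} x {h} x {w} -> {tokens} tokens")
print(f"tokens % world_size = {tokens % world_size}")
```

Under these assumptions the grid is 13 x 42 x 30 = 16380 tokens, which leaves a remainder of 4 when divided across 8 ranks. This does not by itself explain the noise, but it suggests the run depends on the uneven-split path, so comparing against a setting whose token count is divisible by 8 may help narrow things down.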

Environment
Hardware: 8 x NVIDIA H20 GPUs
torch: 2.6.0+cu126
diffusers: 0.36.0
flash_attn: 2.8.3
cache_dit: 1.1.9

Any guidance on how to fix this would be greatly appreciated!

Labels

bug (Something isn't working)
