Thank you for the amazing work on Cache-DiT! It’s a very promising acceleration framework for DiT-based models.
I am experiencing a quality issue when applying Cache-DiT to Wan2.1-I2V (14B-480P): the generated videos consist almost entirely of "snow" (static-like noise). The issue occurs with both 20 and 50 inference steps.
Steps to Reproduce
I am using generate.py and registers.py from the examples/ folder, launched with the following command:
MODEL_NAME="wan2.1_i2v"
torchrun --nproc_per_node=8 generate.py $MODEL_NAME \
--image-path "/home/cache-dit-main/examples/i2v_input.JPG" \
--model-path "/home/wan_weights/Wan2.1-I2V-14B-480P" \
--prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside." \
--height 672 \
--width 480 \
--num-frames 49 \
--num_inference_steps 20 \
--parallel ulysses \
--ulysses-anything \
--save-path "./output_wan2.1_i2v.mp4" \
--attn flash \
--cache
The input image is the example image at https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/wan_i2v_input.JPG
Environment
Hardware: 8 x NVIDIA H20 GPUs
torch: 2.6.0+cu126
diffusers: 0.36.0
flash_attn: 2.8.3
cache_dit: 1.1.9
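For anyone trying to reproduce this, the package versions listed above can be collected with a short script like the one below. This is only a sketch: the helper name `collect_versions` is hypothetical, and it assumes the installed distribution names match the names listed here (for example, that `flash_attn` and `cache_dit` resolve as distribution names on your Python version).

```python
from importlib.metadata import PackageNotFoundError, version


def collect_versions(packages):
    """Return a dict mapping each package name to its installed version.

    Packages that cannot be found are reported as "not installed"
    instead of raising, so the report is always complete.
    """
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


# Assumed distribution names, taken from the environment list above.
PACKAGES = ["torch", "diffusers", "flash_attn", "cache_dit"]

for name, ver in collect_versions(PACKAGES).items():
    print(f"{name}: {ver}")
```

Running this in the same environment that launches generate.py should give values directly comparable to the list above.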
Any guidance on how to fix this would be greatly appreciated!