
[Diffusion] Cache dit support parallel#15163

Merged
mickqian merged 12 commits into main from cache_dit_support_parallel on Dec 15, 2025

Conversation

@BBuf (Collaborator) commented Dec 15, 2025

Baseline (main):

 sglang generate   --model-path /nas/bbuf/Wan2.2-T2V-A14B-Diffusers   --text-encoder-cpu-offload   --pin-cpu-memory   --num-gpus 4   --ulysses-degree 4 --attention-backend sage_attn  --prompt "A cat walks on the grass, realistic" --num-frames 81 --height 720 --width 1280

[12-15 05:42:55] [InputValidationStage] started...
[12-15 05:42:55] [InputValidationStage] finished in 0.0008 seconds
[12-15 05:42:55] [TextEncodingStage] started...
[12-15 05:42:58] [TextEncodingStage] finished in 2.9263 seconds
[12-15 05:42:58] [ConditioningStage] started...
[12-15 05:42:58] [ConditioningStage] finished in 0.0001 seconds
[12-15 05:42:58] [TimestepPreparationStage] started...
[12-15 05:42:58] [TimestepPreparationStage] finished in 0.0028 seconds
[12-15 05:42:58] [LatentPreparationStage] started...
[12-15 05:42:58] [LatentPreparationStage] finished in 0.0015 seconds
[12-15 05:42:58] [DenoisingStage] started...
100%|██████████████████████████████████████████████████████████████████████████████████████████| 40/40 [05:33<00:00,  8.33s/it]
[12-15 05:48:31] [DenoisingStage] average time per step: 8.3287 seconds
[12-15 05:48:31] [DenoisingStage] finished in 333.6282 seconds
[12-15 05:48:31] [DecodingStage] started...
[12-15 05:48:42] [DecodingStage] finished in 11.0321 seconds
[12-15 05:48:52] Saved output to outputs/A_cat_walks_on_the_grass_realistic_20251215-054255_c162f999.mp4
[12-15 05:48:52] Pixel data generated successfully in 356.66 seconds
[12-15 05:48:52] Completed batch processing. Generated 1 outputs in 356.66 seconds.
This PR (cache-dit enabled):


SGLANG_CACHE_DIT_ENABLED=true \
SGLANG_CACHE_DIT_WARMUP=4 \
SGLANG_CACHE_DIT_MC=8 \
SGLANG_CACHE_DIT_RDT=0.24 \
SGLANG_CACHE_DIT_FN=1 \
SGLANG_CACHE_DIT_BN=0 \
SGLANG_CACHE_DIT_TAYLORSEER=true \
SGLANG_CACHE_DIT_SECONDARY_WARMUP=2 \
SGLANG_CACHE_DIT_SECONDARY_MC=20 \
SGLANG_CACHE_DIT_SECONDARY_RDT=0.24 \
SGLANG_CACHE_DIT_SECONDARY_FN=1 \
SGLANG_CACHE_DIT_SECONDARY_BN=0 \
SGLANG_CACHE_DIT_SECONDARY_TAYLORSEER=true \
SGLANG_CACHE_DIT_SCM_PRESET=fast \
SGLANG_CACHE_DIT_SCM_POLICY=dynamic  sglang generate   --model-path /nas/bbuf/Wan2.2-T2V-A14B-Diffusers   --text-encoder-cpu-offload   --pin-cpu-memory   --num-gpus 4   --ulysses-degree 4 --attention-backend sage_attn  --prompt "A cat walks on the grass, realistic" --num-frames 81 --height 720 --width 1280

[12-15 07:29:22] [InputValidationStage] started...
[12-15 07:29:22] [InputValidationStage] finished in 0.0008 seconds
[12-15 07:29:22] [TextEncodingStage] started...
WARNING 12-15 07:29:24 [block_adapters.py:131] pipe is None, use FakeDiffusionPipeline instead.
INFO 12-15 07:29:24 [block_adapters.py:229] Auto fill blocks_name: ['blocks', 'blocks'].
INFO 12-15 07:29:24 [block_adapters.py:162] Found transformer NOT from diffusers: sglang.multimodal_gen.runtime.models.dits.wanvideo disable check_forward_pattern by default.
INFO 12-15 07:29:24 [cache_interface.py:200] cache_config is None, using default DBCacheConfig
INFO 12-15 07:29:24 [cache_adapter.py:77] Adapting Cache Acceleration using custom BlockAdapter!
WARNING 12-15 07:29:24 [block_adapters.py:478] Skipped Forward Pattern Check: ForwardPattern.Pattern_2
WARNING 12-15 07:29:24 [block_adapters.py:478] Skipped Forward Pattern Check: ForwardPattern.Pattern_2
INFO 12-15 07:29:24 [cache_adapter.py:134] Use custom 'enable_separate_cfg' from BlockAdapter: True. Pipeline: FakeDiffusionPipeline.
INFO 12-15 07:29:24 [cache_adapter.py:307] Collected Context Config: DBCache_F1B0_W4I1M0MC8_R0.24_SCM1111111111001100000110000001100000100001_dynamic, Calibrator Config: TaylorSeer_O(1)
INFO 12-15 07:29:24 [cache_adapter.py:307] Collected Context Config: DBCache_F1B0_W2I1M0MC20_R0.24_SCM1111111111001100000110000001100000100001_dynamic, Calibrator Config: TaylorSeer_O(1)
WARNING 12-15 07:29:24 [pattern_base.py:78] Skipped Forward Pattern Check: ForwardPattern.Pattern_2
INFO 12-15 07:29:24 [pattern_base.py:70] Match Blocks: CachedBlocks_Pattern_0_1_2, for blocks, cache_context: blocks_140123378760208, context_manager: FakeDiffusionPipeline_140121768562800.
WARNING 12-15 07:29:24 [pattern_base.py:78] Skipped Forward Pattern Check: ForwardPattern.Pattern_2
INFO 12-15 07:29:24 [pattern_base.py:70] Match Blocks: CachedBlocks_Pattern_0_1_2, for blocks, cache_context: blocks_140121769868752, context_manager: FakeDiffusionPipeline_140121768562800.
[12-15 07:29:25] [TextEncodingStage] finished in 3.0674 seconds
[12-15 07:29:25] [ConditioningStage] started...
[12-15 07:29:25] [ConditioningStage] finished in 0.0002 seconds
[12-15 07:29:25] [TimestepPreparationStage] started...
[12-15 07:29:25] [TimestepPreparationStage] finished in 0.0040 seconds
[12-15 07:29:25] [LatentPreparationStage] started...
[12-15 07:29:25] [LatentPreparationStage] finished in 0.0022 seconds
[12-15 07:29:25] [DenoisingStage] started...
[12-15 07:29:25] cache-dit enabled in distributed environment (world_size=4, sp_group=True, tp_group=False)
[12-15 07:29:25] SCM: generated mask with 18 compute steps, 22 cache steps (preset=fast)
[12-15 07:29:25] Enabling cache-dit on wan2.2 dual transformers with BlockAdapter
[12-15 07:29:25]   Primary (transformer): Fn=1, Bn=0, W=4, R=0.24, MC=8, TaylorSeer=True
[12-15 07:29:25]   Secondary (transformer_2): Fn=1, Bn=0, W=2, R=0.24, MC=20, TaylorSeer=True
[12-15 07:29:25]   SCM enabled: 18 compute steps, 22 cache steps, policy=dynamic
WARNING 12-15 07:29:25 [block_adapters.py:131] pipe is None, use FakeDiffusionPipeline instead.
INFO 12-15 07:29:25 [block_adapters.py:229] Auto fill blocks_name: ['blocks', 'blocks'].
INFO 12-15 07:29:25 [block_adapters.py:162] Found transformer NOT from diffusers: sglang.multimodal_gen.runtime.models.dits.wanvideo disable check_forward_pattern by default.
INFO 12-15 07:29:25 [cache_interface.py:200] cache_config is None, using default DBCacheConfig
INFO 12-15 07:29:25 [cache_adapter.py:77] Adapting Cache Acceleration using custom BlockAdapter!
WARNING 12-15 07:29:25 [block_adapters.py:478] Skipped Forward Pattern Check: ForwardPattern.Pattern_2
WARNING 12-15 07:29:25 [block_adapters.py:478] Skipped Forward Pattern Check: ForwardPattern.Pattern_2
INFO 12-15 07:29:25 [cache_adapter.py:134] Use custom 'enable_separate_cfg' from BlockAdapter: True. Pipeline: FakeDiffusionPipeline.
INFO 12-15 07:29:25 [cache_adapter.py:307] Collected Context Config: DBCache_F1B0_W4I1M0MC8_R0.24_SCM1111111111001100000110000001100000100001_dynamic, Calibrator Config: TaylorSeer_O(1)
INFO 12-15 07:29:25 [cache_adapter.py:307] Collected Context Config: DBCache_F1B0_W2I1M0MC20_R0.24_SCM1111111111001100000110000001100000100001_dynamic, Calibrator Config: TaylorSeer_O(1)
WARNING 12-15 07:29:25 [pattern_base.py:78] Skipped Forward Pattern Check: ForwardPattern.Pattern_2
INFO 12-15 07:29:25 [pattern_base.py:70] Match Blocks: CachedBlocks_Pattern_0_1_2, for blocks, cache_context: blocks_140089688802704, context_manager: FakeDiffusionPipeline_140089116797520.
WARNING 12-15 07:29:25 [pattern_base.py:78] Skipped Forward Pattern Check: ForwardPattern.Pattern_2
INFO 12-15 07:29:25 [pattern_base.py:70] Match Blocks: CachedBlocks_Pattern_0_1_2, for blocks, cache_context: blocks_140089119995232, context_manager: FakeDiffusionPipeline_140089116797520.
[12-15 07:29:25] cache-dit enabled on dual transformers (steps=40)
  0%|                                                                                                   | 0/40 [00:00<?, ?it/s]WARNING 12-15 07:29:25 [block_adapters.py:131] pipe is None, use FakeDiffusionPipeline instead.
INFO 12-15 07:29:25 [block_adapters.py:229] Auto fill blocks_name: ['blocks', 'blocks'].
INFO 12-15 07:29:25 [block_adapters.py:162] Found transformer NOT from diffusers: sglang.multimodal_gen.runtime.models.dits.wanvideo disable check_forward_pattern by default.
INFO 12-15 07:29:25 [cache_interface.py:200] cache_config is None, using default DBCacheConfig
INFO 12-15 07:29:25 [cache_adapter.py:77] Adapting Cache Acceleration using custom BlockAdapter!
WARNING 12-15 07:29:25 [block_adapters.py:478] Skipped Forward Pattern Check: ForwardPattern.Pattern_2
WARNING 12-15 07:29:25 [block_adapters.py:478] Skipped Forward Pattern Check: ForwardPattern.Pattern_2
INFO 12-15 07:29:25 [cache_adapter.py:134] Use custom 'enable_separate_cfg' from BlockAdapter: True. Pipeline: FakeDiffusionPipeline.
INFO 12-15 07:29:25 [cache_adapter.py:307] Collected Context Config: DBCache_F1B0_W4I1M0MC8_R0.24_SCM1111111111001100000110000001100000100001_dynamic, Calibrator Config: TaylorSeer_O(1)
INFO 12-15 07:29:25 [cache_adapter.py:307] Collected Context Config: DBCache_F1B0_W2I1M0MC20_R0.24_SCM1111111111001100000110000001100000100001_dynamic, Calibrator Config: TaylorSeer_O(1)
WARNING 12-15 07:29:25 [pattern_base.py:78] Skipped Forward Pattern Check: ForwardPattern.Pattern_2
INFO 12-15 07:29:25 [pattern_base.py:70] Match Blocks: CachedBlocks_Pattern_0_1_2, for blocks, cache_context: blocks_140036139129056, context_manager: FakeDiffusionPipeline_140036140830448.
WARNING 12-15 07:29:25 [pattern_base.py:78] Skipped Forward Pattern Check: ForwardPattern.Pattern_2
INFO 12-15 07:29:25 [pattern_base.py:70] Match Blocks: CachedBlocks_Pattern_0_1_2, for blocks, cache_context: blocks_140034260318160, context_manager: FakeDiffusionPipeline_140036140830448.
WARNING 12-15 07:29:25 [block_adapters.py:131] pipe is None, use FakeDiffusionPipeline instead.
INFO 12-15 07:29:25 [block_adapters.py:229] Auto fill blocks_name: ['blocks', 'blocks'].
INFO 12-15 07:29:25 [block_adapters.py:162] Found transformer NOT from diffusers: sglang.multimodal_gen.runtime.models.dits.wanvideo disable check_forward_pattern by default.
INFO 12-15 07:29:25 [cache_interface.py:200] cache_config is None, using default DBCacheConfig
INFO 12-15 07:29:25 [cache_adapter.py:77] Adapting Cache Acceleration using custom BlockAdapter!
WARNING 12-15 07:29:25 [block_adapters.py:478] Skipped Forward Pattern Check: ForwardPattern.Pattern_2
WARNING 12-15 07:29:25 [block_adapters.py:478] Skipped Forward Pattern Check: ForwardPattern.Pattern_2
INFO 12-15 07:29:25 [cache_adapter.py:134] Use custom 'enable_separate_cfg' from BlockAdapter: True. Pipeline: FakeDiffusionPipeline.
INFO 12-15 07:29:25 [cache_adapter.py:307] Collected Context Config: DBCache_F1B0_W4I1M0MC8_R0.24_SCM1111111111001100000110000001100000100001_dynamic, Calibrator Config: TaylorSeer_O(1)
INFO 12-15 07:29:25 [cache_adapter.py:307] Collected Context Config: DBCache_F1B0_W2I1M0MC20_R0.24_SCM1111111111001100000110000001100000100001_dynamic, Calibrator Config: TaylorSeer_O(1)
WARNING 12-15 07:29:25 [pattern_base.py:78] Skipped Forward Pattern Check: ForwardPattern.Pattern_2
INFO 12-15 07:29:25 [pattern_base.py:70] Match Blocks: CachedBlocks_Pattern_0_1_2, for blocks, cache_context: blocks_140010297562928, context_manager: FakeDiffusionPipeline_140009561783040.
WARNING 12-15 07:29:25 [pattern_base.py:78] Skipped Forward Pattern Check: ForwardPattern.Pattern_2
INFO 12-15 07:29:25 [pattern_base.py:70] Match Blocks: CachedBlocks_Pattern_0_1_2, for blocks, cache_context: blocks_140009561774112, context_manager: FakeDiffusionPipeline_140009561783040.
100%|██████████████████████████████████████████████████████████████████████████████████████████| 40/40 [04:03<00:00,  6.09s/it]
[12-15 07:33:28] [DenoisingStage] average time per step: 6.0888 seconds
[12-15 07:33:29] [DenoisingStage] finished in 244.2733 seconds
[12-15 07:33:29] [DecodingStage] started...
[12-15 07:33:39] [DecodingStage] finished in 10.0757 seconds
[12-15 07:33:48] Saved output to outputs/A_cat_walks_on_the_grass_realistic_20251215-072921_c162f999.mp4


DenoisingStage: 333.63 s -> 244.27 s (36%+ faster, ~1.37x)
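The SGLANG_CACHE_DIT_* variables in the command above could be read into a config object roughly like this. This is a sketch: the env var names come from the command, but the helper functions, the dataclass, and its field names are assumptions for illustration, not the PR's actual code.

```python
import os
from dataclasses import dataclass


def _env_bool(name: str, default: bool) -> bool:
    # Hypothetical helper: treat "true"/"1" (case-insensitive) as True.
    return os.environ.get(name, str(default)).strip().lower() in ("true", "1")


def _env_int(name: str, default: int) -> int:
    return int(os.environ.get(name, default))


def _env_float(name: str, default: float) -> float:
    return float(os.environ.get(name, default))


@dataclass
class CacheDitEnvConfig:
    # Field names mirror the env var suffixes; the real DBCacheConfig fields may differ.
    enabled: bool
    warmup: int       # W: warmup steps before caching kicks in
    mc: int           # MC: max continuous cached steps
    rdt: float        # R: residual diff threshold (0.24 in the run above)
    fn: int           # Fn: leading blocks always computed exactly
    bn: int           # Bn: trailing blocks always computed exactly
    taylorseer: bool  # enable the TaylorSeer calibrator


def load_primary_config() -> CacheDitEnvConfig:
    p = "SGLANG_CACHE_DIT_"
    return CacheDitEnvConfig(
        enabled=_env_bool(p + "ENABLED", False),
        warmup=_env_int(p + "WARMUP", 4),
        mc=_env_int(p + "MC", 8),
        rdt=_env_float(p + "RDT", 0.24),
        fn=_env_int(p + "FN", 1),
        bn=_env_int(p + "BN", 0),
        taylorseer=_env_bool(p + "TAYLORSEER", False),
    )
```

The secondary transformer (transformer_2 in Wan2.2's dual-transformer setup) would be configured the same way from the SGLANG_CACHE_DIT_SECONDARY_* prefix.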


github-actions bot added the diffusion (SGLang Diffusion) label on Dec 15, 2025
BBuf changed the title from "Cache dit support parallel" to "[Diffusion] Cache dit support parallel" on Dec 15, 2025
@BBuf (Collaborator, Author) commented Dec 15, 2025:

cc @DefTruth

Review thread on python/sglang/multimodal_gen/runtime/pipelines_core/stages/denoising.py (outdated):

    if _original_similarity is None:
        _original_similarity = cache_manager.CachedContextManager.similarity

    def patched_similarity(self, t1, t2, *, threshold, parallelized=False, prefix="Fn"):
Collaborator: Does this similarity check apply to all cache algorithms?

Collaborator (Author): @DefTruth can you confirm it?

@DefTruth (Contributor), replying to "does this similarity check apply to all cache algorithms?":

@BBuf @mickqian Sure! The similarity function applies to all cache algorithms. The current implementation in cache-dit does not take hybrid parallelism into account, so I think this patch is a suitable modification. When hybrid parallelism is used, it would be more appropriate to perform the diff all-reduce in the specified group.
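The idea behind the patched similarity check can be sketched as follows. In the real patch the partial diff statistics would presumably be reduced with torch.distributed.all_reduce over SGLang's sequence-parallel group; here `allreduce_sum` is an injectable stand-in so the logic runs without an initialized process group, and the relative-L1 metric is an assumption for illustration (cache-dit's actual metric may differ).

```python
from typing import Callable, List, Optional


def patched_similarity(
    t1: List[float],
    t2: List[float],
    *,
    threshold: float,
    parallelized: bool = False,
    allreduce_sum: Optional[Callable[[float], float]] = None,
) -> bool:
    # Relative L1 difference computed from the locally held shard:
    # ||t1 - t2||_1 / ||t1||_1.
    diff = sum(abs(a - b) for a, b in zip(t1, t2))
    norm = sum(abs(a) for a in t1)
    if parallelized and allreduce_sum is not None:
        # Under sequence parallelism each rank holds only its shard, so the
        # partial sums must be reduced across the SP group before comparing;
        # otherwise ranks could disagree on whether to reuse the cache.
        diff = allreduce_sum(diff)
        norm = allreduce_sum(norm)
    rel = diff / max(norm, 1e-8)
    # "Similar enough": the cached residual can be reused for this step.
    return rel <= threshold
```

With the run's threshold R=0.24, every rank then reaches the same compute-vs-cache decision, which is the point of doing the all-reduce in the specified group.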

Review thread on python/sglang/multimodal_gen/runtime/utils/cache_dit_integration.py:

@mickqian (Collaborator): /tag-and-rerun-ci

@DefTruth (Contributor) commented:
@mickqian @BBuf

I think the HACK scheme is as follows:

-> patch the similarity function (to support hybrid parallelism and pass a specific group to the cache manager)
-> build a FAKE cache-dit parallelism configuration to indicate that we are running in a distributed scenario (DO NOT pass it to cache-dit via the `parallelism_config` parameter in the `enable_cache` API)
-> bind the SP group and TP group from SGLang to the cache_manager 
-> the patched similarity function uses the specific SP group and TP group

mickqian merged commit 92c29d4 into main on Dec 15, 2025 (108 of 113 checks passed)
@BBuf (Collaborator, Author) commented Dec 15, 2025, quoting the HACK scheme above:

Agree with it.

mickqian deleted the cache_dit_support_parallel branch on Dec 15, 2025, 11:15
Liwansi added a commit to iforgetmyname/sglang that referenced this pull request on Dec 15, 2025 (a merge of 89 commits from main, including "[diffusion] fix: cache dit with parallel (sgl-project#15163)")
tonyluj pushed a commit to openanolis/sglang that referenced this pull request on Dec 17, 2025 (Co-authored-by: Mick <mickjagger19@icloud.com>)
YChange01 pushed a commit to YChange01/sglang that referenced this pull request on Jan 13, 2026 (Co-authored-by: Mick <mickjagger19@icloud.com>)

Labels: diffusion (SGLang Diffusion), run-ci
Projects: None yet
3 participants