```shell
Loading required modules: 100%|████████████████████████████████████████████████████████████████████████████████| 6/6 [03:10<00:00, 31.74s/it]
[12-13 07:41:25] Pipelines instantiated
[12-13 07:41:25] Worker 0: Initialized device, model, and distributed environment.
[12-13 07:41:25] Worker 0: Scheduler loop started.
[12-13 07:41:25] Rank 0 scheduler listening on tcp://*:5638
Loading required modules: 100%|████████████████████████████████████████████████████████████████████████████████| 6/6 [03:10<00:00, 31.77s/it]
Loading required modules: 100%|████████████████████████████████████████████████████████████████████████████████| 6/6 [03:11<00:00, 31.84s/it]
Loading required modules: 100%|████████████████████████████████████████████████████████████████████████████████| 6/6 [03:12<00:00, 32.07s/it]
[12-13 07:41:27] Sampling params:
width: -1
height: -1
num_frames: 125
prompt: A young woman with long blonde hair, wearing a white t-shirt and blue jeans, walking through a sunny park with green trees in the background, carrying a brown shoulder bag, followed by a smooth tracking shot that moves alongside her as she walks.
neg_prompt: Vivid tones, overexposed, static, blurry details, subtitles, style, artwork, painting, frame, still, overall grayish, worst quality, low quality, JPEG compression artifacts, ugly, mutilated, extra fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, malformed limbs, fused fingers, motionless frame, cluttered background, three legs, many people in the background, walking backwards
seed: 1024
infer_steps: 40
num_outputs_per_prompt: 1
guidance_scale: 4.0
embedded_guidance_scale: 6.0
n_tokens: -1
flow_shift: 12.0
image_path: None
save_output: True
output_file_path: outputs/A_young_woman_with_long_blonde_hair_wearing_a_white_t-shirt_and_blue_jeans_walking_through_a_sunny_20251213-074127_4269cbf7.mp4
[12-13 07:41:27] Processing prompt 1/1: A young woman with long blonde hair, wearing a white t-shirt and blue jeans, walking through a sunny
/nas/bbuf/sglang/python/sglang/multimodal_gen/runtime/utils/distributed.py:34: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:206.)
tensor_data = torch.ByteTensor(
[12-13 07:41:27] Creating pipeline stages...
[12-13 07:41:27] Using FlashAttention (FA3 for hopper, FA4 for blackwell) backend
[12-13 07:41:27] Running pipeline stages: ['input_validation_stage', 'prompt_encoding_stage', 'conditioning_stage', 'timestep_preparation_stage', 'latent_preparation_stage', 'denoising_stage', 'decoding_stage']
[12-13 07:41:27] Profiling request: mocked_fake_id_for_offline_generate for 5 steps...
[12-13 07:41:27] Starting Profiler...
[12-13 07:41:27] [InputValidationStage] started...
[12-13 07:41:27] [InputValidationStage] finished in 0.0043 seconds
[12-13 07:41:27] [TextEncodingStage] started...
WARNING 12-13 07:41:31 [block_adapters.py:131] pipe is None, use FakeDiffusionPipeline instead.
INFO 12-13 07:41:31 [block_adapters.py:229] Auto fill blocks_name: ['blocks', 'blocks'].
INFO 12-13 07:41:31 [block_adapters.py:162] Found transformer NOT from diffusers: sglang.multimodal_gen.runtime.models.dits.wanvideo disable check_forward_pattern by default.
INFO 12-13 07:41:31 [cache_interface.py:200] cache_config is None, using default DBCacheConfig
INFO 12-13 07:41:31 [cache_adapter.py:77] Adapting Cache Acceleration using custom BlockAdapter!
WARNING 12-13 07:41:31 [block_adapters.py:478] Skipped Forward Pattern Check: ForwardPattern.Pattern_2
WARNING 12-13 07:41:31 [block_adapters.py:478] Skipped Forward Pattern Check: ForwardPattern.Pattern_2
INFO 12-13 07:41:31 [cache_adapter.py:134] Use custom 'enable_separate_cfg' from BlockAdapter: True. Pipeline: FakeDiffusionPipeline.
INFO 12-13 07:41:31 [cache_adapter.py:307] Collected Context Config: DBCache_F1B0_W4I1M0MC8_R0.24_SCM1111111111001110000011100000111000110001_dynamic, Calibrator Config: None
INFO 12-13 07:41:31 [cache_adapter.py:307] Collected Context Config: DBCache_F1B0_W2I1M0MC20_R0.24_SCM1111111111001110000011100000111000110001_dynamic, Calibrator Config: None
WARNING 12-13 07:41:31 [pattern_base.py:78] Skipped Forward Pattern Check: ForwardPattern.Pattern_2
INFO 12-13 07:41:31 [pattern_base.py:70] Match Blocks: CachedBlocks_Pattern_0_1_2, for blocks, cache_context: blocks_140418593588720, context_manager: FakeDiffusionPipeline_140417988320464.
WARNING 12-13 07:41:31 [pattern_base.py:78] Skipped Forward Pattern Check: ForwardPattern.Pattern_2
INFO 12-13 07:41:31 [pattern_base.py:70] Match Blocks: CachedBlocks_Pattern_0_1_2, for blocks, cache_context: blocks_140418588350976, context_manager: FakeDiffusionPipeline_140417988320464.
WARNING 12-13 07:41:31 [block_adapters.py:131] pipe is None, use FakeDiffusionPipeline instead.
INFO 12-13 07:41:31 [block_adapters.py:229] Auto fill blocks_name: ['blocks', 'blocks'].
INFO 12-13 07:41:31 [block_adapters.py:162] Found transformer NOT from diffusers: sglang.multimodal_gen.runtime.models.dits.wanvideo disable check_forward_pattern by default.
INFO 12-13 07:41:31 [cache_interface.py:200] cache_config is None, using default DBCacheConfig
INFO 12-13 07:41:31 [cache_adapter.py:77] Adapting Cache Acceleration using custom BlockAdapter!
WARNING 12-13 07:41:31 [block_adapters.py:478] Skipped Forward Pattern Check: ForwardPattern.Pattern_2
WARNING 12-13 07:41:31 [block_adapters.py:478] Skipped Forward Pattern Check: ForwardPattern.Pattern_2
INFO 12-13 07:41:31 [cache_adapter.py:134] Use custom 'enable_separate_cfg' from BlockAdapter: True. Pipeline: FakeDiffusionPipeline.
INFO 12-13 07:41:31 [cache_adapter.py:307] Collected Context Config: DBCache_F1B0_W4I1M0MC8_R0.24_SCM1111111111001110000011100000111000110001_dynamic, Calibrator Config: None
INFO 12-13 07:41:31 [cache_adapter.py:307] Collected Context Config: DBCache_F1B0_W2I1M0MC20_R0.24_SCM1111111111001110000011100000111000110001_dynamic, Calibrator Config: None
WARNING 12-13 07:41:31 [pattern_base.py:78] Skipped Forward Pattern Check: ForwardPattern.Pattern_2
INFO 12-13 07:41:31 [pattern_base.py:70] Match Blocks: CachedBlocks_Pattern_0_1_2, for blocks, cache_context: blocks_140336518225488, context_manager: FakeDiffusionPipeline_140334574005872.
WARNING 12-13 07:41:31 [pattern_base.py:78] Skipped Forward Pattern Check: ForwardPattern.Pattern_2
INFO 12-13 07:41:31 [pattern_base.py:70] Match Blocks: CachedBlocks_Pattern_0_1_2, for blocks, cache_context: blocks_140334574005152, context_manager: FakeDiffusionPipeline_140334574005872.
WARNING 12-13 07:41:31 [block_adapters.py:131] pipe is None, use FakeDiffusionPipeline instead.
INFO 12-13 07:41:31 [block_adapters.py:229] Auto fill blocks_name: ['blocks', 'blocks'].
INFO 12-13 07:41:31 [block_adapters.py:162] Found transformer NOT from diffusers: sglang.multimodal_gen.runtime.models.dits.wanvideo disable check_forward_pattern by default.
INFO 12-13 07:41:31 [cache_interface.py:200] cache_config is None, using default DBCacheConfig
INFO 12-13 07:41:31 [cache_adapter.py:77] Adapting Cache Acceleration using custom BlockAdapter!
WARNING 12-13 07:41:31 [block_adapters.py:478] Skipped Forward Pattern Check: ForwardPattern.Pattern_2
WARNING 12-13 07:41:31 [block_adapters.py:478] Skipped Forward Pattern Check: ForwardPattern.Pattern_2
INFO 12-13 07:41:31 [cache_adapter.py:134] Use custom 'enable_separate_cfg' from BlockAdapter: True. Pipeline: FakeDiffusionPipeline.
INFO 12-13 07:41:31 [cache_adapter.py:307] Collected Context Config: DBCache_F1B0_W4I1M0MC8_R0.24_SCM1111111111001110000011100000111000110001_dynamic, Calibrator Config: None
INFO 12-13 07:41:31 [cache_adapter.py:307] Collected Context Config: DBCache_F1B0_W2I1M0MC20_R0.24_SCM1111111111001110000011100000111000110001_dynamic, Calibrator Config: None
WARNING 12-13 07:41:31 [pattern_base.py:78] Skipped Forward Pattern Check: ForwardPattern.Pattern_2
INFO 12-13 07:41:31 [pattern_base.py:70] Match Blocks: CachedBlocks_Pattern_0_1_2, for blocks, cache_context: blocks_139832392396256, context_manager: FakeDiffusionPipeline_139831119788416.
WARNING 12-13 07:41:31 [pattern_base.py:78] Skipped Forward Pattern Check: ForwardPattern.Pattern_2
INFO 12-13 07:41:31 [pattern_base.py:70] Match Blocks: CachedBlocks_Pattern_0_1_2, for blocks, cache_context: blocks_139831119936208, context_manager: FakeDiffusionPipeline_139831119788416.
[12-13 07:41:31] [TextEncodingStage] finished in 3.5739 seconds
[12-13 07:41:31] [ConditioningStage] started...
[12-13 07:41:31] [ConditioningStage] finished in 0.0001 seconds
[12-13 07:41:31] [TimestepPreparationStage] started...
[12-13 07:41:31] [TimestepPreparationStage] finished in 0.0036 seconds
[12-13 07:41:31] [LatentPreparationStage] started...
[12-13 07:41:31] [LatentPreparationStage] finished in 0.0021 seconds
[12-13 07:41:31] [DenoisingStage] started...
[12-13 07:41:31] cache-dit is running in distributed environment (world_size=4). Using local caching strategy: each GPU caches its own activation shards. This may be less accurate than single-GPU caching but provides speedup.
[12-13 07:41:31] SCM: generated mask with 22 compute steps, 18 cache steps (preset=medium)
[12-13 07:41:31] Enabling cache-dit on wan2.2 dual transformers with BlockAdapter
[12-13 07:41:31] Primary (transformer): Fn=1, Bn=0, W=4, R=0.24, MC=8, TaylorSeer=False
[12-13 07:41:31] Secondary (transformer_2): Fn=1, Bn=0, W=2, R=0.24, MC=20, TaylorSeer=False
[12-13 07:41:31] SCM enabled: 22 compute steps, 18 cache steps, policy=dynamic
WARNING 12-13 07:41:31 [block_adapters.py:131] pipe is None, use FakeDiffusionPipeline instead.
INFO 12-13 07:41:31 [block_adapters.py:229] Auto fill blocks_name: ['blocks', 'blocks'].
INFO 12-13 07:41:31 [block_adapters.py:162] Found transformer NOT from diffusers: sglang.multimodal_gen.runtime.models.dits.wanvideo disable check_forward_pattern by default.
INFO 12-13 07:41:31 [cache_interface.py:200] cache_config is None, using default DBCacheConfig
INFO 12-13 07:41:31 [cache_adapter.py:77] Adapting Cache Acceleration using custom BlockAdapter!
WARNING 12-13 07:41:31 [block_adapters.py:478] Skipped Forward Pattern Check: ForwardPattern.Pattern_2
WARNING 12-13 07:41:31 [block_adapters.py:478] Skipped Forward Pattern Check: ForwardPattern.Pattern_2
INFO 12-13 07:41:31 [cache_adapter.py:134] Use custom 'enable_separate_cfg' from BlockAdapter: True. Pipeline: FakeDiffusionPipeline.
INFO 12-13 07:41:31 [cache_adapter.py:307] Collected Context Config: DBCache_F1B0_W4I1M0MC8_R0.24_SCM1111111111001110000011100000111000110001_dynamic, Calibrator Config: None
INFO 12-13 07:41:31 [cache_adapter.py:307] Collected Context Config: DBCache_F1B0_W2I1M0MC20_R0.24_SCM1111111111001110000011100000111000110001_dynamic, Calibrator Config: None
WARNING 12-13 07:41:31 [pattern_base.py:78] Skipped Forward Pattern Check: ForwardPattern.Pattern_2
INFO 12-13 07:41:31 [pattern_base.py:70] Match Blocks: CachedBlocks_Pattern_0_1_2, for blocks, cache_context: blocks_140168208559296, context_manager: FakeDiffusionPipeline_140164652767888.
WARNING 12-13 07:41:31 [pattern_base.py:78] Skipped Forward Pattern Check: ForwardPattern.Pattern_2
INFO 12-13 07:41:31 [pattern_base.py:70] Match Blocks: CachedBlocks_Pattern_0_1_2, for blocks, cache_context: blocks_140165252596704, context_manager: FakeDiffusionPipeline_140164652767888.
[12-13 07:41:31] cache-dit enabled on dual transformers (steps=40)
0%| | 0/40 [00:29<?, ?it/s]
[12-13 07:42:01] [DenoisingStage] Error during execution after 29571.3200 ms: The size of tensor a (135) must match the size of tensor b (134) at non-singleton dimension 4
Traceback (most recent call last):
File "/nas/bbuf/sglang/python/sglang/multimodal_gen/runtime/pipelines_core/executors/parallel_executor.py", line 90, in _execute
batch = stage(batch, server_args)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nas/bbuf/sglang/python/sglang/multimodal_gen/runtime/pipelines_core/stages/base.py", line 192, in __call__
result = self.forward(batch, server_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/nas/bbuf/sglang/python/sglang/multimodal_gen/runtime/pipelines_core/stages/denoising.py", line 1010, in forward
latents = self.scheduler.step(
^^^^^^^^^^^^^^^^^^^^
File "/nas/bbuf/sglang/python/sglang/multimodal_gen/runtime/models/schedulers/scheduling_flow_unipc_multistep.py", line 745, in step
model_output_convert = self.convert_model_output(model_output, sample=sample)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nas/bbuf/sglang/python/sglang/multimodal_gen/runtime/models/schedulers/scheduling_flow_unipc_multistep.py", line 333, in convert_model_output
x0_pred = sample - sigma_t * model_output
~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
RuntimeError: The size of tensor a (135) must match the size of tensor b (134) at non-singleton dimension 4
[12-13 07:42:01] Stopping Profiler...
[12-13 07:42:01] Saving profiler traces to: /nas/bbuf/sglang/logs/mocked_fake_id_for_offline_generate-5_steps-global-rank0.trace.json.gz
[12-13 07:42:01] Failed to generate output for prompt 1: Error executing request mocked_fake_id_for_offline_generate: The size of tensor a (135) must match the size of tensor b (134) at non-singleton dimension 4
Traceback (most recent call last):
File "/nas/bbuf/sglang/python/sglang/multimodal_gen/runtime/utils/logging_utils.py", line 495, in log_generation_timer
yield timer
File "/nas/bbuf/sglang/python/sglang/multimodal_gen/runtime/entrypoints/diffusion_generator.py", line 273, in generate
raise Exception(f"{output_batch.error}")
Exception: Error executing request mocked_fake_id_for_offline_generate: The size of tensor a (135) must match the size of tensor b (134) at non-singleton dimension 4
[12-13 07:42:01] Completed batch processing. Generated 0 outputs in 33.90 seconds.
[12-13 07:42:01] Generator was garbage collected without being shut down. Attempting to shut down the local server and client.
/usr/lib/python3.12/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 4 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
```
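The failure hits on the very first denoising step, inside `convert_model_output` (`x0_pred = sample - sigma_t * model_output`). Note that 135 is not divisible by the ulysses degree of 4: splitting a length-135 width dimension across 4 sequence-parallel ranks gives uneven shards (34, 34, 34, 33), so my guess is that the cache-dit path and the scheduler end up off by one element at dim 4 after a pad/split/gather. A minimal sketch of the failing broadcast (the 5-D `[batch, channels, frames, height, width]` layout and all sizes other than 135/134 are assumptions, not pulled from the pipeline):

```python
import torch

# Illustrative shapes only: assumed [batch, channels, frames, height, width]
# video-latent layout; sizes are made up except the mismatched widths 135/134.
sample = torch.randn(1, 16, 32, 68, 135)        # full width on this rank
model_output = torch.randn(1, 16, 32, 68, 134)  # shard off by one at dim 4
sigma_t = torch.tensor(0.5)

# 135 is not divisible by the ulysses degree 4: chunking yields uneven shards.
print([s.shape[-1] for s in sample.chunk(4, dim=4)])  # [34, 34, 34, 33]

# Reproduces the error from the log:
# RuntimeError: The size of tensor a (135) must match the size of tensor b
# (134) at non-singleton dimension 4
x0_pred = sample - sigma_t * model_output
```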
### Reproduction
```shell
SGLANG_CACHE_DIT_ENABLED=true SGLANG_CACHE_DIT_WARMUP=4 SGLANG_CACHE_DIT_MC=8 SGLANG_CACHE_DIT_RDT=0.24 SGLANG_CACHE_DIT_FN=1 SGLANG_CACHE_DIT_BN=0 SGLANG_CACHE_DIT_SECONDARY_WARMUP=2 SGLANG_CACHE_DIT_SECONDARY_MC=20 SGLANG_CACHE_DIT_SECONDARY_RDT=0.24 SGLANG_CACHE_DIT_SECONDARY_FN=1 SGLANG_CACHE_DIT_SECONDARY_BN=0 SGLANG_CACHE_DIT_SCM_PRESET=medium SGLANG_CACHE_DIT_SCM_POLICY=dynamic sglang generate --model-path /nas/bbuf/Wan2.2-T2V-A14B-Diffusers --text-encoder-cpu-offload --pin-cpu-memory --num-gpus 4 --ulysses-degree 4 --profile --prompt "A young woman with long blonde hair, wearing a white t-shirt and blue jeans, walking through a sunny park with green trees in the background, carrying a brown shoulder bag, followed by a smooth tracking shot that moves alongside her as she walks."
```
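For reference while triaging: a common way sequence-parallel implementations avoid this class of off-by-one is to pad the sharded dimension up to a multiple of the SP world size before splitting, then strip the padding after gathering. A hypothetical sketch of that idea (`shard_width` and `unshard_width` are made-up helpers, not sglang APIs):

```python
import torch
import torch.nn.functional as F

def shard_width(t: torch.Tensor, rank: int, world_size: int) -> torch.Tensor:
    # Pad dim 4 (width) to a multiple of world_size so every rank gets an
    # identically sized shard, e.g. 135 -> 136 -> 4 shards of 34.
    pad = (-t.shape[4]) % world_size
    if pad:
        t = F.pad(t, (0, pad))  # pads the last dimension on the right
    return t.chunk(world_size, dim=4)[rank]

def unshard_width(shards: list[torch.Tensor], orig_width: int) -> torch.Tensor:
    # Reassemble the equal-size shards and drop the padding again.
    return torch.cat(shards, dim=4)[..., :orig_width]
```

With equal shards, every rank's `model_output` slice matches its `sample` slice, so the dim-4 broadcast mismatch above cannot arise.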
### Environment

```shell
root@d896cfd062bd:/nas/bbuf/sglang# python3 -m sglang.check_env
Python: 3.12.3 (main, Jun 18 2025, 17:59:45) [GCC 13.3.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA H100 80GB HBM3
GPU 0,1,2,3,4,5,6,7 Compute Capability: 9.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 13.0, V13.0.48
CUDA Driver Version: 550.127.05
PyTorch: 2.9.1+cu128
sglang: 0.5.6.post2
sgl_kernel: 0.3.19
flashinfer_python: 0.5.3
flashinfer_cubin: 0.5.3
flashinfer_jit_cache: Module Not Found
triton: 3.5.1
transformers: 4.57.1
torchao: 0.9.0
numpy: 1.26.4
aiohttp: 3.12.14
fastapi: 0.124.2
hf_transfer: 0.1.9
huggingface_hub: 0.36.0
interegular: 0.3.3
modelscope: 1.33.0
orjson: 3.11.5
outlines: 0.1.11
packaging: 25.0
psutil: 7.0.0
pydantic: 2.11.7
python-multipart: 0.0.20
pyzmq: 27.0.1
uvicorn: 0.38.0
uvloop: 0.22.1
vllm: Module Not Found
xgrammar: 0.1.27
openai: 2.6.1
tiktoken: 0.12.0
anthropic: 0.75.0
litellm: Module Not Found
decord2: 2.0.0
NVIDIA Topology:
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 NIC0 NIC1 NIC2 NIC3 NIC4 NIC5 NIC6 NIC7 NIC8 NIC9 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X NV18 NV18 NV18 NV18 NV18 NV18 NV18 NODE NODE SYS PIX NODE NODE NODE SYS SYS SYS 0-47,96-143 0 N/A
GPU1 NV18 X NV18 NV18 NV18 NV18 NV18 NV18 NODE NODE SYS NODE PIX NODE NODE SYS SYS SYS 0-47,96-143 0 N/A
GPU2 NV18 NV18 X NV18 NV18 NV18 NV18 NV18 NODE NODE SYS NODE NODE PIX NODE SYS SYS SYS 0-47,96-143 0 N/A
GPU3 NV18 NV18 NV18 X NV18 NV18 NV18 NV18 NODE NODE SYS NODE NODE NODE PIX SYS SYS SYS 0-47,96-143 0 N/A
GPU4 NV18 NV18 NV18 NV18 X NV18 NV18 NV18 SYS SYS NODE SYS SYS SYS SYS PIX NODE NODE 48-95,144-191 1 N/A
GPU5 NV18 NV18 NV18 NV18 NV18 X NV18 NV18 SYS SYS NODE SYS SYS SYS SYS NODE PIX NODE 48-95,144-191 1 N/A
GPU6 NV18 NV18 NV18 NV18 NV18 NV18 X NV18 SYS SYS NODE SYS SYS SYS SYS NODE NODE PIX 48-95,144-191 1 N/A
GPU7 NV18 NV18 NV18 NV18 NV18 NV18 NV18 X SYS SYS PIX SYS SYS SYS SYS NODE NODE NODE 48-95,144-191 1 N/A
NIC0 NODE NODE NODE NODE SYS SYS SYS SYS X PXB SYS NODE NODE NODE NODE SYS SYS SYS
NIC1 NODE NODE NODE NODE SYS SYS SYS SYS PXB X SYS NODE NODE NODE NODE SYS SYS SYS
NIC2 SYS SYS SYS SYS NODE NODE NODE PIX SYS SYS X SYS SYS SYS SYS NODE NODE NODE
NIC3 PIX NODE NODE NODE SYS SYS SYS SYS NODE NODE SYS X NODE NODE NODE SYS SYS SYS
NIC4 NODE PIX NODE NODE SYS SYS SYS SYS NODE NODE SYS NODE X NODE NODE SYS SYS SYS
NIC5 NODE NODE PIX NODE SYS SYS SYS SYS NODE NODE SYS NODE NODE X NODE SYS SYS SYS
NIC6 NODE NODE NODE PIX SYS SYS SYS SYS NODE NODE SYS NODE NODE NODE X SYS SYS SYS
NIC7 SYS SYS SYS SYS PIX NODE NODE NODE SYS SYS NODE SYS SYS SYS SYS X NODE NODE
NIC8 SYS SYS SYS SYS NODE PIX NODE NODE SYS SYS NODE SYS SYS SYS SYS NODE X NODE
NIC9 SYS SYS SYS SYS NODE NODE PIX NODE SYS SYS NODE SYS SYS SYS SYS NODE NODE X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
NIC Legend:
NIC0: mlx5_25_0
NIC1: mlx5_200_0
NIC2: mlx5_400_0
NIC3: mlx5_400_1
NIC4: mlx5_400_2
NIC5: mlx5_400_3
NIC6: mlx5_400_4
NIC7: mlx5_400_5
NIC8: mlx5_400_6
NIC9: mlx5_400_7
ulimit soft: 1048576
```