
[Bug] 'disaggregation_mode' is missing in warmup function when compiling deep_gemm #8617

@lbh2001

Description

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

I tried to compile deep_gemm with this command:

python -m sglang.compile_deep_gemm --model-path /ssd1/models/huggingface.co/deepseek-ai/DeepSeek-R1 --enable-deepep-moe --trust-remote-code --tp-size 8

Then the following error occurred:

[2025-07-31 20:37:23 TP0] Load weight end. type=DeepseekV3ForCausalLM, dtype=torch.bfloat16, avail mem=56.21 GB, mem usage=81.74 GB.
[2025-07-31 20:37:24 TP0] KV Cache is allocated. #tokens: 542023, KV size: 35.47 GB
[2025-07-31 20:37:24 TP7] KV Cache is allocated. #tokens: 542023, KV size: 35.47 GB
[2025-07-31 20:37:24 TP5] KV Cache is allocated. #tokens: 542023, KV size: 35.47 GB
[2025-07-31 20:37:24 TP2] KV Cache is allocated. #tokens: 542023, KV size: 35.47 GB
[2025-07-31 20:37:24 TP4] KV Cache is allocated. #tokens: 542023, KV size: 35.47 GB
[2025-07-31 20:37:24 TP0] Memory pool end. avail mem=19.43 GB
[2025-07-31 20:37:24 TP3] KV Cache is allocated. #tokens: 542023, KV size: 35.47 GB
[2025-07-31 20:37:24 TP1] KV Cache is allocated. #tokens: 542023, KV size: 35.47 GB
[2025-07-31 20:37:25 TP6] KV Cache is allocated. #tokens: 542023, KV size: 35.47 GB
[2025-07-31 20:37:25 TP0] max_total_num_tokens=542023, chunked_prefill_size=8192, max_prefill_tokens=16384, max_running_requests=2048, context_len=163840, available_gpu_mem=19.34 GB
[2025-07-31 20:37:25] INFO:     Started server process [1497]
[2025-07-31 20:37:25] INFO:     Waiting for application startup.
[2025-07-31 20:37:25] Running warmup compile-deep-gemm
[2025-07-31 20:37:25] ERROR:    Traceback (most recent call last):
  File "/root/paddlejob/venv/lib/python3.12/site-packages/starlette/routing.py", line 694, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/src/python/sglang/srt/entrypoints/http_server.py", line 148, in lifespan
    await execute_warmups(
  File "/tmp/src/python/sglang/srt/warmup.py", line 34, in execute_warmups
    await _warmup_registry[warmup_name](disaggregation_mode, tokenizer_manager)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: warm_up_compile() takes 1 positional argument but 2 were given

[2025-07-31 20:37:25] ERROR:    Application startup failed. Exiting.

Currently, in compile_deep_gemm.py, the warmup function is defined as follows; it accepts only one argument:

@warmup("compile-deep-gemm")
async def warm_up_compile(tokenizer_manager: TokenizerManager):
    ...

But when it is called in execute_warmups, two arguments are passed. Perhaps disaggregation_mode is missing from the signature?

async def execute_warmups(
    disaggregation_mode: str,
    warmup_names: List[str],
    tokenizer_manager: TokenizerManager,
):
    for warmup_name in warmup_names:
        if warmup_name not in _warmup_registry:
            logger.warning(f"Could not find custom warmup {warmup_name}")
            continue
        logger.info(f"Running warmup {warmup_name}")
        await _warmup_registry[warmup_name](disaggregation_mode, tokenizer_manager)
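
A minimal sketch of the suggested fix, assuming the registry/decorator structure shown above: adding disaggregation_mode to warm_up_compile's signature makes it match the two-argument call in execute_warmups. The warmup decorator and registry here are simplified stand-ins for illustration, not the actual sglang implementation.

```python
import asyncio
from typing import Callable, Dict, List

# Simplified stand-in for sglang.srt.warmup's registry and decorator.
_warmup_registry: Dict[str, Callable] = {}

def warmup(name: str):
    def decorator(fn):
        _warmup_registry[name] = fn
        return fn
    return decorator

# Fixed signature: accept disaggregation_mode (even if unused) so the
# two-argument call in execute_warmups no longer raises TypeError.
@warmup("compile-deep-gemm")
async def warm_up_compile(disaggregation_mode: str, tokenizer_manager: object):
    return f"warmup ran (mode={disaggregation_mode})"

async def execute_warmups(
    disaggregation_mode: str,
    warmup_names: List[str],
    tokenizer_manager: object,
):
    results = []
    for warmup_name in warmup_names:
        if warmup_name not in _warmup_registry:
            continue
        # This is the call site from the traceback: two positional args.
        results.append(
            await _warmup_registry[warmup_name](disaggregation_mode, tokenizer_manager)
        )
    return results

result = asyncio.run(execute_warmups("null", ["compile-deep-gemm"], None))
```

With the extra parameter in place, the call site and the registered function agree, and startup no longer fails with `TypeError: warm_up_compile() takes 1 positional argument but 2 were given`.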

Reproduction

python -m sglang.compile_deep_gemm --model-path /ssd1/models/huggingface.co/deepseek-ai/DeepSeek-R1 --enable-deepep-moe --trust-remote-code --tp-size 8

Environment

Python: 3.12.3 (main, May 26 2025, 18:50:19) [GCC 13.3.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA H800
GPU 0,1,2,3,4,5,6,7 Compute Capability: 9.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.6, V12.6.85
CUDA Driver Version: 550.127.08
PyTorch: 2.7.1+cu126
sglang: 0.4.9.post6
sgl_kernel: 0.2.7
flashinfer_python: 0.2.7
triton: 3.3.1
transformers: 4.53.0
torchao: 0.9.0
numpy: 2.3.2
aiohttp: 3.12.15
fastapi: 0.116.1
hf_transfer: 0.1.9
huggingface_hub: 0.34.3
interegular: 0.3.3
modelscope: 1.28.1
orjson: 3.11.1
outlines: 0.1.11
packaging: 25.0
psutil: 7.0.0
pydantic: 2.11.7
python-multipart: 0.0.20
pyzmq: 27.0.0
uvicorn: 0.35.0
uvloop: 0.21.0
vllm: Module Not Found
xgrammar: 0.1.19
openai: Module Not Found
tiktoken: 0.9.0
anthropic: Module Not Found
litellm: Module Not Found
decord: Module Not Found
NVIDIA Topology:
	GPU0	GPU1	GPU2	GPU3	GPU4	GPU5	GPU6	GPU7	NIC0	NIC1	NIC2	NIC3	NIC4	NIC5	NIC6	NIC7	NIC8	CPU Affinity	NUMA Affinity	GPU NUMA ID
GPU0	 X 	NV18	NV18	NV18	NV18	NV18	NV18	NV18	PIX	NODE	NODE	NODE	SYS	SYS	SYS	SYS	NODE	0-47,96-143	0		N/A
GPU1	NV18	 X 	NV18	NV18	NV18	NV18	NV18	NV18	NODE	PIX	NODE	NODE	SYS	SYS	SYS	SYS	NODE	0-47,96-143	0		N/A
GPU2	NV18	NV18	 X 	NV18	NV18	NV18	NV18	NV18	NODE	NODE	PIX	NODE	SYS	SYS	SYS	SYS	NODE	0-47,96-143	0		N/A
GPU3	NV18	NV18	NV18	 X 	NV18	NV18	NV18	NV18	NODE	NODE	NODE	PIX	SYS	SYS	SYS	SYS	NODE	0-47,96-143	0		N/A
GPU4	NV18	NV18	NV18	NV18	 X 	NV18	NV18	NV18	SYS	SYS	SYS	SYS	PIX	NODE	NODE	NODE	SYS	48-95,144-191	1		N/A
GPU5	NV18	NV18	NV18	NV18	NV18	 X 	NV18	NV18	SYS	SYS	SYS	SYS	NODE	PIX	NODE	NODE	SYS	48-95,144-191	1		N/A
GPU6	NV18	NV18	NV18	NV18	NV18	NV18	 X 	NV18	SYS	SYS	SYS	SYS	NODE	NODE	PIX	NODE	SYS	48-95,144-191	1		N/A
GPU7	NV18	NV18	NV18	NV18	NV18	NV18	NV18	 X 	SYS	SYS	SYS	SYS	NODE	NODE	NODE	PIX	SYS	48-95,144-191	1		N/A
NIC0	PIX	NODE	NODE	NODE	SYS	SYS	SYS	SYS	 X 	NODE	NODE	NODE	SYS	SYS	SYS	SYS	NODE
NIC1	NODE	PIX	NODE	NODE	SYS	SYS	SYS	SYS	NODE	 X 	NODE	NODE	SYS	SYS	SYS	SYS	NODE
NIC2	NODE	NODE	PIX	NODE	SYS	SYS	SYS	SYS	NODE	NODE	 X 	NODE	SYS	SYS	SYS	SYS	NODE
NIC3	NODE	NODE	NODE	PIX	SYS	SYS	SYS	SYS	NODE	NODE	NODE	 X 	SYS	SYS	SYS	SYS	NODE
NIC4	SYS	SYS	SYS	SYS	PIX	NODE	NODE	NODE	SYS	SYS	SYS	SYS	 X 	NODE	NODE	NODE	SYS
NIC5	SYS	SYS	SYS	SYS	NODE	PIX	NODE	NODE	SYS	SYS	SYS	SYS	NODE	 X 	NODE	NODE	SYS
NIC6	SYS	SYS	SYS	SYS	NODE	NODE	PIX	NODE	SYS	SYS	SYS	SYS	NODE	NODE	 X 	NODE	SYS
NIC7	SYS	SYS	SYS	SYS	NODE	NODE	NODE	PIX	SYS	SYS	SYS	SYS	NODE	NODE	NODE	 X 	SYS
NIC8	NODE	NODE	NODE	NODE	SYS	SYS	SYS	SYS	NODE	NODE	NODE	NODE	SYS	SYS	SYS	SYS	 X

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_0
  NIC1: mlx5_3
  NIC2: mlx5_4
  NIC3: mlx5_5
  NIC4: mlx5_6
  NIC5: mlx5_7
  NIC6: mlx5_8
  NIC7: mlx5_9
  NIC8: mlx5_bond_0


ulimit soft: 1048576
