SGLang Launch error with Qwen2.5-14B draft model, need help #231

@y-d-y

My target model is Qwen2.5-14B-Instruct.

I trained a draft model with the default config. train_eagle3_online.py generated an eagle3-config.json with the following content:

{
  "architectures": [
    "LlamaForCausalLMEagle3"
  ],
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 13824,
  "max_position_embeddings": 131072,
  "model_type": "llama",
  "num_attention_heads": 40,
  "num_key_value_heads": 8,
  "num_hidden_layers": 1,
  "pad_token_id": 0,
  "rms_norm_eps": 1e-06,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.28.1",
  "use_cache": true,
  "vocab_size": 152064,
  "draft_vocab_size": 32000
}
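
For a quick sanity check, the draft config above can be compared with the target model's config.json. The snippet below is only a sketch: `load_config`, `config_mismatches`, and the list of keys are hypothetical helpers, not part of SGLang or the training script, and the "target" values in the demo are placeholders rather than the real Qwen2.5-14B-Instruct values.

```python
import json

# Keys taken from the draft config above. Which of these SGLang actually
# requires to match between target and draft is an assumption.
SHARED_KEYS = ("hidden_size", "vocab_size", "num_attention_heads", "num_key_value_heads")


def load_config(path):
    """Read a Hugging Face style config.json into a dict."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def config_mismatches(target_cfg, draft_cfg, keys=SHARED_KEYS):
    """Return {key: (target_value, draft_value)} for every key that differs."""
    return {
        k: (target_cfg.get(k), draft_cfg.get(k))
        for k in keys
        if target_cfg.get(k) != draft_cfg.get(k)
    }


# Values copied from the eagle3-config.json shown above.
draft_cfg = {
    "hidden_size": 5120,
    "vocab_size": 152064,
    "num_attention_heads": 40,
    "num_key_value_heads": 8,
    "draft_vocab_size": 32000,
}

# Placeholder target config (NOT the real model's values), changed in one
# field just to show what a mismatch report looks like.
target_cfg = dict(draft_cfg, hidden_size=4096)

mismatches = config_mismatches(target_cfg, draft_cfg)
print(mismatches)  # {'hidden_size': (4096, 5120)}
```

With real files, the two dicts would come from `load_config(...)` on each model directory's config file. An empty result only means the listed keys agree; it does not prove the draft model is compatible.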

After training, I used SGLang to load the draft model with this command:

python -m sglang.launch_server --model-path /home/Models/Qwen2.5-14B-Instruct --host 0.0.0.0 --port 30000 --tp-size 2 --served-model-name qwen2 --context-length 2048 --speculative-algorithm EAGLE3 --speculative-num-steps 5 --speculative-eagle-topk 4 --speculative-num-draft-tokens 8 --mem-fraction 0.6 --cuda-graph-max-bs 2 --dtype float16 --speculative-draft-model-path /home/Models/epoch_2

The error stack is:

Capturing batches (bs=2 avail_mem=14.40 GB): 0%| | 0/2 [00:00<?, ?it/s][2025-09-13 18:14:09 TP1] Registering 0 cuda graph addresses
Capturing batches (bs=2 avail_mem=14.40 GB): 0%| | 0/2 [00:01<?, ?it/s]
[2025-09-13 18:14:09 TP0] Registering 0 cuda graph addresses
[2025-09-13 18:14:09 TP1] Scheduler hit an exception: Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/managers/scheduler.py", line 2587, in run_scheduler_process
    scheduler = Scheduler(
                ^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/managers/scheduler.py", line 329, in __init__
    self.tp_worker = TpWorkerClass(
                     ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/managers/tp_worker.py", line 93, in __init__
    self.model_runner = ModelRunner(
                        ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 250, in __init__
    self.initialize(min_per_gpu_memory)
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 386, in initialize
    self.init_device_graphs()
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 1761, in init_device_graphs
    self.graph_runner = graph_runners[self.device](self)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 389, in __init__
    self.capture()
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 497, in capture
    ) = self.capture_one_batch_size(bs, forward)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 668, in capture_one_batch_size
    run_once()
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 657, in run_once
    logits_output_or_pp_proxy_tensors = forward(
                                        ^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/models/qwen2.py", line 489, in forward
    hidden_states, aux_hidden_states = hidden_states
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: too many values to unpack (expected 2)
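
For reference, here is a minimal plain-Python sketch of how this kind of message can arise. The failing line unpacks a value into exactly two names, so it only works if the value is a pair. If the value is instead a single sequence with more than two rows (a torch tensor also iterates over its first dimension), Python raises this ValueError. Whether that is really what the model returns inside SGLang is an assumption; `unpack_pair` and the nested lists are illustrative stand-ins, not SGLang code.

```python
def unpack_pair(model_output):
    """Mimic the failing line: expects exactly (hidden_states, aux_hidden_states)."""
    hidden_states, aux_hidden_states = model_output
    return hidden_states, aux_hidden_states


# Case 1: a real pair (two separate objects) unpacks without error.
pair = (
    [[0.0] * 4 for _ in range(16)],   # stand-in for hidden_states
    [[0.0] * 12 for _ in range(16)],  # stand-in for aux_hidden_states
)
hidden, aux = unpack_pair(pair)

# Case 2: one sequence with 16 rows, a stand-in for a single tensor of
# shape [16, hidden]. Unpacking iterates over the 16 rows, not over 2 items.
rows = [[0.0] * 4 for _ in range(16)]
try:
    unpack_pair(rows)
    message = None
except ValueError as exc:
    message = str(exc)

print(message)  # begins with "too many values to unpack"
```

If this is the mechanism, the question becomes why the target model's forward pass would return a single value where a (hidden_states, aux_hidden_states) pair is expected.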

I'm new to this and would appreciate some help!
