My target model is Qwen2.5-14B.
I trained a draft model with the default config; train_eagle3_online.py generated an eagle3-config.json with the following content:
{
"architectures": [
"LlamaForCausalLMEagle3"
],
"bos_token_id": 151643,
"eos_token_id": 151645,
"hidden_act": "silu",
"hidden_size": 5120,
"initializer_range": 0.02,
"intermediate_size": 13824,
"max_position_embeddings": 131072,
"model_type": "llama",
"num_attention_heads": 40,
"num_key_value_heads": 8,
"num_hidden_layers": 1,
"pad_token_id": 0,
"rms_norm_eps": 1e-06,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.28.1",
"use_cache": true,
"vocab_size": 152064,
"draft_vocab_size": 32000
}
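As a sanity check on my own setup (not something the training script does), I compared the draft config's dimensions against the target model's. The values below are inlined from the configs above; in practice one would json.load() each model directory's config.json instead:

```python
# Draft head must share the target's hidden dimensions to consume its
# hidden states; a mismatch usually fails earlier than CUDA graph capture.
target = {"hidden_size": 5120, "num_attention_heads": 40,
          "num_key_value_heads": 8, "vocab_size": 152064}
draft = {"hidden_size": 5120, "num_attention_heads": 40,
         "num_key_value_heads": 8, "vocab_size": 152064}

mismatches = {k: (target[k], draft[k]) for k in target if target[k] != draft[k]}
assert not mismatches, f"draft/target dimension mismatch: {mismatches}"
print("draft config matches target dimensions")
```

The dimensions line up with Qwen2.5-14B, so the config itself does not look like the problem.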
After training, I used sglang to load the draft model with this command:
python -m sglang.launch_server --model-path /home/Models/Qwen2.5-14B-Instruct --host 0.0.0.0 --port 30000 --tp-size 2 --served-model-name qwen2 --context-length 2048 --speculative-algorithm EAGLE3 --speculative-num-steps 5 --speculative-eagle-topk 4 --speculative-num-draft-tokens 8 --mem-fraction 0.6 --cuda-graph-max-bs 2 --dtype float16 --speculative-draft-model-path /home/Models/epoch_2
The error stack is:
Capturing batches (bs=2 avail_mem=14.40 GB): 0%| | 0/2 [00:00<?, ?it/s][2025-09-13 18:14:09 TP1] Registering 0 cuda graph addresses
Capturing batches (bs=2 avail_mem=14.40 GB): 0%| | 0/2 [00:01<?, ?it/s]
[2025-09-13 18:14:09 TP0] Registering 0 cuda graph addresses
[2025-09-13 18:14:09 TP1] Scheduler hit an exception: Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/sglang/srt/managers/scheduler.py", line 2587, in run_scheduler_process
scheduler = Scheduler(
^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/sglang/srt/managers/scheduler.py", line 329, in __init__
self.tp_worker = TpWorkerClass(
^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/sglang/srt/managers/tp_worker.py", line 93, in __init__
self.model_runner = ModelRunner(
^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 250, in __init__
self.initialize(min_per_gpu_memory)
File "/usr/local/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 386, in initialize
self.init_device_graphs()
File "/usr/local/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 1761, in init_device_graphs
self.graph_runner = graph_runners[self.device](self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 389, in __init__
self.capture()
File "/usr/local/lib/python3.12/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 497, in capture
) = self.capture_one_batch_size(bs, forward)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 668, in capture_one_batch_size
run_once()
File "/usr/local/lib/python3.12/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 657, in run_once
logits_output_or_pp_proxy_tensors = forward(
^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/sglang/srt/models/qwen2.py", line 489, in forward
hidden_states, aux_hidden_states = hidden_states
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: too many values to unpack (expected 2)
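For context on the final error: the unpack at qwen2.py line 489 expects the forward pass to hand back exactly a 2-tuple of (hidden_states, aux_hidden_states), and it received something with more elements. A toy illustration of the same Python error, unrelated to sglang internals:

```python
# Unpacking into two names requires an iterable of exactly two elements.
hidden_states = ("h", "aux", "extra")  # three elements, one too many
try:
    h, aux = hidden_states
    err = None
except ValueError as exc:
    err = str(exc)
print(err)  # too many values to unpack (expected 2)
```

So the question seems to be why the target model's forward returns extra values during EAGLE3 CUDA graph capture in my setup.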
I'm new to this and need some help!