[2026-04-18 22:34:20 TP0] model.eh_proj.weight_scale not found in params_dict.
[2026-04-18 22:34:20 TP7] Scheduler hit an exception: Traceback (most recent call last):
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 3771, in run_scheduler_process
scheduler = Scheduler(
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 425, in __init__
self.init_model_worker()
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 683, in init_model_worker
self.maybe_init_draft_worker()
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 667, in maybe_init_draft_worker
self.draft_worker = DraftWorkerClass(**draft_worker_kwargs)
File "/sgl-workspace/sglang/python/sglang/srt/speculative/eagle_worker_v2.py", line 658, in __init__
self._draft_worker = EagleDraftWorker(
File "/sgl-workspace/sglang/python/sglang/srt/speculative/eagle_worker_v2.py", line 138, in __init__
self.draft_worker = TpModelWorker(
File "/sgl-workspace/sglang/python/sglang/srt/managers/tp_worker.py", line 260, in __init__
self._init_model_runner()
File "/sgl-workspace/sglang/python/sglang/srt/managers/tp_worker.py", line 343, in _init_model_runner
self._model_runner = ModelRunner(
File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 480, in __init__
self.initialize(pre_model_load_memory)
File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 570, in initialize
self.load_model()
File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 1264, in load_model
self.model = self.loader.load_model(
File "/sgl-workspace/sglang/python/sglang/srt/model_loader/loader.py", line 699, in load_model
self.load_weights_and_postprocess(
File "/sgl-workspace/sglang/python/sglang/srt/model_loader/loader.py", line 708, in load_weights_and_postprocess
model.load_weights(weights)
File "/sgl-workspace/sglang/python/sglang/srt/models/deepseek_nextn.py", line 284, in load_weights
super().load_weights(weights, is_nextn=True)
File "/sgl-workspace/sglang/python/sglang/srt/models/deepseek_v2.py", line 2323, in load_weights
self.do_load_weights(weights, is_nextn)
File "/sgl-workspace/sglang/python/sglang/srt/models/deepseek_common/deepseek_weight_loader.py", line 361, in do_load_weights
future.result()
python3 -m sglang.launch_server \
--model-path $MODEL \
--host=0.0.0.0 \
--port $PORT \
--trust-remote-code \
--tp $TP \
--chunked-prefill-size 131072 \
--disable-radix-cache \
--mem-fraction-static 0.85 \
--model-loader-extra-config '{"enable_multithread_load": true}' \
--watchdog-timeout 1200 \
--reasoning-parser glm45 \
--tool-call-parser glm47 \
--speculative-algorithm EAGLE \
--speculative-num-steps 3 \
--speculative-eagle-topk 1 \
--speculative-num-draft-tokens 4 \
$EVAL_CONTEXT_ARGS > $SERVER_LOG 2>&1 &
Describe the bug
hi @hubertlu-tw @HaiShaw @chunfangamd
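GLM-5 MXFP4 with MTP (launched with --speculative-algorithm EAGLE) is broken: the scheduler hits an exception while loading the draft model weights in deepseek_nextn.py, and the log shows "model.eh_proj.weight_scale not found in params_dict" just before the traceback.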
Reproduction
https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24615280642/job/71976386638?pr=1091
https://github.com/SemiAnalysisAI/InferenceX/pull/1091/changes#diff-802d3a7be2d0d2932c889be8616a6c220b90cf93b440be7b06cc645414d889bf
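One way to narrow this down (a sketch, not a confirmed diagnosis) is to list the tensor names containing `eh_proj` in the checkpoint's safetensors index. That shows whether the checkpoint ships a `weight_scale` tensor for which the draft model has no matching parameter. This assumes the checkpoint has a standard Hugging Face `model.safetensors.index.json` with a `weight_map`; the helper name `find_weight_names` and the command-line usage are hypothetical and not part of SGLang.

```python
import json
import sys


def find_weight_names(index_path, needle):
    """Return the sorted tensor names in a safetensors index that contain `needle`.

    Assumes the usual sharded-checkpoint layout: a JSON file whose
    "weight_map" maps tensor name -> shard file name.
    """
    with open(index_path, "r", encoding="utf-8") as f:
        index = json.load(f)
    weight_map = index.get("weight_map", {})
    return sorted(name for name in weight_map if needle in name)


if __name__ == "__main__":
    # Hypothetical usage: python check_index.py /path/to/model.safetensors.index.json
    for name in find_weight_names(sys.argv[1], "eh_proj"):
        print(name)
```

If a name ending in `eh_proj.weight_scale` is printed, the tensor exists in the checkpoint and the mismatch is likely on the loader side; if nothing is printed, the index layout differs from what this sketch assumes.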
Environment
Docker image: lmsysorg/sglang-rocm:v0.5.10rc0-rocm700-mi35x-20260417
Model: amd/GLM-5-MXFP4