
fix 'LazyValue' object has no attribute 'keys' in qwen3 moe + deepep + eplb #21820

Closed
Evgueni-Petrov-aka-espetrov wants to merge 14 commits into sgl-project:main from Evgueni-Petrov-aka-espetrov:main

Conversation

@Evgueni-Petrov-aka-espetrov
Contributor

Motivation

Found this typo while tuning SGLang performance for Qwen3 Coder 480B in a disaggregated prefill-decode setup.

Modifications

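Add the missing `.value` access so that `routed_experts_weights_of_layer` is the dict itself rather than its `LazyValue` wrapper.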

Accuracy Tests

n/a

Speed Tests and Profiling

n/a

Checklist

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

Temporary tensors get stuck in VRAM for too long.
Hit this OOM with Qwen3 Coder 480B Instruct FP8 on 2 MI355X GPUs.
  File "/sgl-workspace/sglang/python/sglang/srt/eplb/eplb_manager.py", line 110, in _compute_update_layer_ids_chunks
    list(self._model_runner.model.routed_experts_weights_of_layer.keys())
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'LazyValue' object has no attribute 'keys'
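The traceback comes from calling `.keys()` on the wrapper instead of on the dict it produces. The sketch below reproduces that failure mode under stated assumptions: `LazyValue` here is a minimal hypothetical stand-in (not SGLang's actual class), and `layer_ids_chunks` is a hypothetical helper that only mimics the `.keys()` call in `_compute_update_layer_ids_chunks`; its chunking logic is an assumption.

```python
from typing import Any, Callable, List


class LazyValue:
    """Hypothetical stand-in: calls `creator` on first `.value` read, then caches."""

    def __init__(self, creator: Callable[[], Any]):
        self._creator = creator
        self._value = None
        self._computed = False

    @property
    def value(self) -> Any:
        if not self._computed:
            self._value = self._creator()
            self._computed = True
        return self._value


def layer_ids_chunks(weights_of_layer, chunk_size: int) -> List[List[int]]:
    """Hypothetical helper: split layer IDs into fixed-size chunks."""
    layer_ids = list(weights_of_layer.keys())  # fails if handed the wrapper
    return [layer_ids[i : i + chunk_size] for i in range(0, len(layer_ids), chunk_size)]


# Placeholder strings stand in for per-layer expert weight tensors.
lazy = LazyValue(lambda: {0: ["w0"], 1: ["w1"], 2: ["w2"]})

try:
    layer_ids_chunks(lazy, chunk_size=2)
except AttributeError as e:
    print(e)  # 'LazyValue' object has no attribute 'keys'

# Unwrapping with .value (what this PR adds) yields a real dict.
print(layer_ids_chunks(lazy.value, chunk_size=2))  # [[0, 1], [2]]
```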

@gemini-code-assist (bot) left a comment


Code Review

This pull request fixes a crash in the Qwen3 MoE model by accessing the .value property of a LazyValue wrapper during initialization. The review feedback points out that while this change resolves the immediate issue, it renders the LazyValue wrapper redundant because the evaluation is no longer deferred; a refactor is suggested to remove the wrapper entirely for better maintainability.

[Inline diff context: the closing `)` of the `LazyValue(...)` expression is changed to `).value`]

Priority: medium

While adding .value fixes the reported crash by ensuring self.routed_experts_weights_of_layer is a dictionary, it makes the LazyValue wrapper and the associated lambda (starting at line 1187) redundant. Since get_moe_weights() is a lightweight operation that merely returns references to existing tensors, the lazy evaluation provides no benefit here. For better maintainability, consider refactoring this block to remove LazyValue and assign the dictionary comprehension directly.
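A minimal sketch of the refactor the review suggests, building the dict eagerly with no wrapper. Only the attribute name `routed_experts_weights_of_layer` and the method name `get_moe_weights()` come from this thread; `FakeMoEBlock`, `FakeDecoderLayer` and `FakeQwen3MoeModel` are hypothetical stand-ins, plain lists replace the expert weight tensors, and the real model may restrict the comprehension to MoE layers only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakeMoEBlock:
    # Hypothetical placeholder storage for two expert weight tensors.
    w13: List[float] = field(default_factory=lambda: [0.0] * 4)
    w2: List[float] = field(default_factory=lambda: [0.0] * 4)

    def get_moe_weights(self) -> List[List[float]]:
        # Returns references to the existing storage; nothing is copied.
        return [self.w13, self.w2]


@dataclass
class FakeDecoderLayer:
    mlp: FakeMoEBlock


class FakeQwen3MoeModel:
    def __init__(self, num_layers: int):
        self.layers = [FakeDecoderLayer(FakeMoEBlock()) for _ in range(num_layers)]
        # Built eagerly as a plain dict (no LazyValue), so EPLB code can call
        # .keys() on it directly.
        self.routed_experts_weights_of_layer: Dict[int, List[List[float]]] = {
            layer_id: layer.mlp.get_moe_weights()
            for layer_id, layer in enumerate(self.layers)
        }


model = FakeQwen3MoeModel(num_layers=3)
print(list(model.routed_experts_weights_of_layer.keys()))  # [0, 1, 2]
print(model.routed_experts_weights_of_layer[0][0] is model.layers[0].mlp.w13)  # True
```

The dict is created at construction time; per the review, that is cheap as long as `get_moe_weights()` only returns references to tensors that already exist.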

vroomfondel added a commit to vroomfondel/dgxarley that referenced this pull request Apr 7, 2026
**Upstream status** as of 2026-04-06:
- Qwen3.5: fixed via [PR #19767](sgl-project/sglang#19767) (merged 2026-03-09, included in v0.5.10)
- Qwen3: [PR #21461](sgl-project/sglang#21461) — closed without merge 2026-03-30 (CI failure), superseded by #21822
- Qwen3: [PR #21822](sgl-project/sglang#21822) — new fix opened 2026-03-26, addresses `AttributeError: 'LazyValue' object has no attribute 'keys'` in `eplb_manager.py` for Qwen3 MoE. Code review 2026-04-04 by `Fridge003` and `Evgueni-Petrov-aka-espetrov`. Alternative `LazyValue.__getattr__` approach proposed (avoids modifying the model class). **Approved** by `Fridge003` on 2026-04-06, CI rerun triggered — awaiting merge. (Duplicate [PR #21820](sgl-project/sglang#21820) was closed same day in favour of #21822.) Not in v0.5.10
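The alternative `LazyValue.__getattr__` approach is only named here, not spelled out. One plausible shape, sketched with a hypothetical `LazyValue` rather than SGLang's real class, forwards unknown attribute lookups to the lazily computed value, so `eplb_manager.py` could keep calling `.keys()` while evaluation stays deferred.

```python
from typing import Any, Callable


class LazyValue:
    """Hypothetical LazyValue that forwards unknown attributes to its value."""

    def __init__(self, creator: Callable[[], Any]):
        self._creator = creator
        self._value = None
        self._computed = False

    @property
    def value(self) -> Any:
        if not self._computed:
            self._value = self._creator()
            self._computed = True
            self._creator = None  # drop the closure once evaluated
        return self._value

    def __getattr__(self, name: str) -> Any:
        # Only reached when normal lookup fails (e.g. `keys`, `items`).
        # Refuse private names to avoid infinite recursion on half-built
        # instances (e.g. during copy/pickle, before __init__ has run).
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self.value, name)


build_count = 0


def build_weights():
    global build_count
    build_count += 1
    return {0: "w0", 1: "w1"}


lazy = LazyValue(build_weights)
print(build_count)         # 0: nothing evaluated yet
print(list(lazy.keys()))   # [0, 1]
print(list(lazy.items()))  # [(0, 'w0'), (1, 'w1')]
print(build_count)         # 1: evaluated once, then cached
```

`__getattr__` only covers ordinary attribute access. Implicit special-method lookups such as `len(lazy)`, `lazy[0]` or `for k in lazy` go through the type and bypass it, so a wrapper like this is still not a drop-in dict.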

When `--enable-eplb` is active with EP, the `EPLBManager` crashes after its first rebalance interval (default: 1000 forward passes):
- SGLang PR #17137 — non-Marlin WNA16MoE port (does not fix EP bug)
- SGLang #14158 — update_weights_from_tensor for WNA16MoE (unrelated)
- SGLang [PR #13715](sgl-project/sglang#13715) — fix EPLB + FP4 weight tensor filtering (merged, different issue)
- SGLang [PR #20963](sgl-project/sglang#20963) — Nvidia modelopt refactoring (1/N). Under active review: reviewer `Edwardf0t1` asked for end-to-end verification 2026-03-31, author `wenscarl` responded 2026-04-01 and posted 3 further inline review responses 2026-04-06. Not stalled but awaiting approval. Migrates the NVFP4 code as-is — expected vehicle for EP-awareness fixes (#20869, #21630). Watch this PR for resolution of the NVFP4 input_scale and CutlassMoEParams bugs
- SGLang [PR #21822](sgl-project/sglang#21822) — new EPLB/Qwen3 fix (opened 2026-03-26). Addresses `LazyValue.keys()` AttributeError. Code review 2026-04-04 by `Fridge003` and `Evgueni-Petrov-aka-espetrov`. Alternative `LazyValue.__getattr__` approach proposed. **Approved** by `Fridge003` on 2026-04-06, CI rerun triggered — awaiting merge
