feat: Add SGLang /engine weight update endpoints #6094

Merged
Aphoh merged 2 commits into main from warnold/sgl-update-weights-endpoints on Feb 10, 2026

@Aphoh Aphoh commented Feb 9, 2026

Contributor

Overview:

Wire update_weights and update_weight_version routes to tokenizer manager

Details:

Adds the basic /engine/ routes needed for sglang weight update support.

Summary by CodeRabbit

  • New Features
    • Added endpoints that update model weights at runtime from disk, in-memory tensors, a distributed source, or inter-process communication (IPC), without restarting the server.
    • Added an endpoint that sets the weight version, with an option to abort in-flight requests before the version changes.

Wire update_weights and update_weight_version routes to tokenizer manager
@Aphoh Aphoh requested a review from ishandhanani February 9, 2026 23:54
@Aphoh Aphoh requested a review from a team as a code owner February 9, 2026 23:54
@Aphoh Aphoh requested a review from a team February 9, 2026 23:54
@github-actions github-actions Bot added the backend::sglang label (Relates to the sglang backend) Feb 9, 2026
@Aphoh Aphoh changed the title from "[feat] Add SGLang /engine weight update endpoints" to "feat: Add SGLang /engine weight update endpoints" Feb 9, 2026
@github-actions github-actions Bot added the feat label Feb 9, 2026

coderabbitai Bot commented Feb 10, 2026

Contributor

Walkthrough

Five new asynchronous handler methods are added to BaseWorkerHandler to enable model weight updates without server restart: update_weights_from_disk, update_weights_from_tensor, update_weights_from_distributed, update_weights_from_ipc, and update_weight_version. These methods are registered as engine routes and return structured success/message responses.
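To make the shape of these handlers concrete, here is a minimal, self-contained sketch of one of them and its registration. It is not the code from handler_base.py: `FakeUpdateWeightFromDiskReqInput`, `FakeTokenizerManager`, `SketchHandler`, and the plain-dict route table are hypothetical stand-ins so the example runs without SGLang or Dynamo installed.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, Optional, Tuple


@dataclass
class FakeUpdateWeightFromDiskReqInput:
    """Hypothetical stand-in for SGLang's request input object."""

    model_path: str
    load_format: Optional[str] = None


class FakeTokenizerManager:
    """Hypothetical stand-in that pretends every reload succeeds."""

    async def update_weights_from_disk(self, req, request=None) -> Tuple[bool, str, int]:
        # Mirrors the (success, message, num_paused_requests) tuple used in the PR.
        return True, f"loaded {req.model_path}", 0


class SketchHandler:
    def __init__(self, tokenizer_manager) -> None:
        self.tokenizer_manager = tokenizer_manager

    async def update_weights_from_disk(self, body: dict) -> dict:
        # Build the request object from the JSON body, call the manager,
        # and return a structured response.
        req = FakeUpdateWeightFromDiskReqInput(**body)
        success, message, num_paused = await self.tokenizer_manager.update_weights_from_disk(
            req, None
        )
        return {
            "success": success,
            "message": message,
            "num_paused_requests": num_paused,
        }

    def register_engine_routes(
        self, routes: Dict[str, Callable[[dict], Awaitable[dict]]]
    ) -> None:
        # Assumption: a plain dict stands in for the real route registry.
        routes["update_weights_from_disk"] = self.update_weights_from_disk


routes: Dict[str, Callable[[dict], Awaitable[dict]]] = {}
SketchHandler(FakeTokenizerManager()).register_engine_routes(routes)
result = asyncio.run(routes["update_weights_from_disk"]({"model_path": "/models/new"}))
print(result)
```

The other four handlers would presumably follow the same pattern with their own request input types.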

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Weight Update Handler Methods: components/src/dynamo/sglang/request_handlers/handler_base.py | Added 5 async methods to BaseWorkerHandler for managing weight updates from various sources (disk, tensor, distributed, IPC) and weight version management. Each method creates corresponding request input objects, invokes tokenizer_manager operations, and returns structured responses with success flags and messages. Updated register_engine_routes to register all 5 new endpoints. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Five pathways for weights to flow and dance,
From disk, from tensors, from distributed expanse,
No server restart—just smooth updates bloom,
This rabbit's delight, a lightweight groomed!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The description is incomplete and missing critical sections from the template: no 'Where should the reviewer start?' section identifying specific files, and no 'Related Issues' section with action keywords. | Add 'Where should the reviewer start?' section pointing to handler_base.py and add 'Related Issues:' section with any relevant GitHub issue references. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding SGLang /engine weight update endpoints, which directly aligns with the addition of five new weight management methods and their route registrations. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@components/src/dynamo/sglang/request_handlers/handler_base.py`:
- Around line 299-312: Wrap the call to
self.engine.tokenizer_manager.abort_request(abort_all=True) inside a try/except
in update_weight_version, so failures in abort_request produce a structured
error response instead of an uncaught exception; only set
self.engine.tokenizer_manager.server_args.weight_version = req.new_version after
a successful abort (or when abort_all is False), and on exception return
{"success": False, "message": "Failed to abort in-flight requests", "error":
str(e)} (also log the error) to mirror other handler methods' error handling.
- Around line 253-312: The file failed Black formatting; run the formatter on
the module containing the handler methods (e.g., update_weights_from_disk,
update_weights_from_tensor, update_weights_from_distributed,
update_weights_from_ipc, update_weight_version) to apply Black's style, verify
the diff is only formatting changes, then stage and commit the reformatted file
(or run the repository pre-commit hook) so CI passes.
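The first item above (guarding the abort in update_weight_version) could look roughly like the sketch below. It is an illustration under stated assumptions, not the merged code: `FakeUpdateWeightVersionReqInput`, its `abort_all_requests` field, and `FakeTokenizerManager` are hypothetical stand-ins, and the handler is written as a free function taking the manager as an argument so it runs on its own.

```python
import asyncio
import logging
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class FakeUpdateWeightVersionReqInput:
    """Hypothetical request type; the field names are assumptions."""

    new_version: str
    abort_all_requests: bool = True


class FakeTokenizerManager:
    """Hypothetical stand-in whose abort can be made to fail."""

    def __init__(self, fail_abort: bool = False) -> None:
        self.server_args = SimpleNamespace(weight_version="v1")
        self.fail_abort = fail_abort

    def abort_request(self, abort_all: bool = False) -> None:
        if self.fail_abort:
            raise RuntimeError("scheduler unreachable")


async def update_weight_version(tokenizer_manager, body: dict) -> dict:
    # Request construction is left unguarded here to keep the sketch focused.
    req = FakeUpdateWeightVersionReqInput(**body)
    if req.abort_all_requests:
        try:
            tokenizer_manager.abort_request(abort_all=True)
        except Exception as e:
            logging.error("Failed to abort in-flight requests: %s", e)
            return {
                "success": False,
                "message": "Failed to abort in-flight requests",
                "error": str(e),
            }
    # Only reached after a successful abort, or when no abort was requested.
    tokenizer_manager.server_args.weight_version = req.new_version
    return {"success": True, "message": f"Weight version updated to {req.new_version}"}


tm_ok = FakeTokenizerManager()
ok = asyncio.run(update_weight_version(tm_ok, {"new_version": "v2"}))

tm_bad = FakeTokenizerManager(fail_abort=True)
bad = asyncio.run(update_weight_version(tm_bad, {"new_version": "v2"}))
```

In the failing case the version stays at its previous value, so a caller never sees a new version label while requests started under the old weights are still running.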
🧹 Nitpick comments (1)
components/src/dynamo/sglang/request_handlers/handler_base.py (1)

253-297: Missing error handling — inconsistent with existing handler pattern.

The existing release_memory_occupation and resume_memory_occupation methods wrap their logic in try/except and return structured error dicts ({"status": "error", "message": ...}). All five new methods propagate exceptions unhandled. If UpdateWeightFromDiskReqInput(**body) receives unexpected keys or a required field is missing, the caller gets an opaque Rust-level error instead of a structured JSON response.

Consider wrapping each method body in a try/except for consistency, e.g.:

Proposed pattern (applied to one method as example)
     async def update_weights_from_disk(self, body: dict) -> dict:
         """Update model weights from disk without restarting the server."""
         from sglang.srt.managers.io_struct import UpdateWeightFromDiskReqInput
 
-        req = UpdateWeightFromDiskReqInput(**body)
-        success, message, num_paused_requests = (
-            await self.engine.tokenizer_manager.update_weights_from_disk(req, None)
-        )
-        return {
-            "success": success,
-            "message": message,
-            "num_paused_requests": num_paused_requests,
-        }
+        try:
+            req = UpdateWeightFromDiskReqInput(**body)
+            success, message, num_paused_requests = (
+                await self.engine.tokenizer_manager.update_weights_from_disk(req, None)
+            )
+            return {
+                "success": success,
+                "message": message,
+                "num_paused_requests": num_paused_requests,
+            }
+        except Exception as e:
+            logging.error(f"Failed to update weights from disk: {e}")
+            return {"success": False, "message": str(e)}
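
Since the same try/except would be repeated in all five methods, one alternative is a small decorator. This is a different technique from the inline pattern proposed above, sketched here under assumptions: `structured_errors` and `FakeDiskReq` are hypothetical names, and the handler is a free function rather than a BaseWorkerHandler method.

```python
import asyncio
import functools
import logging
from dataclasses import dataclass


def structured_errors(action: str):
    """Turn any exception raised by an async handler into a structured dict."""

    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            try:
                return await fn(*args, **kwargs)
            except Exception as e:
                logging.error("Failed to %s: %s", action, e)
                return {"success": False, "message": str(e)}

        return wrapper

    return decorator


@dataclass
class FakeDiskReq:
    """Hypothetical stand-in for the real request input type."""

    model_path: str


@structured_errors("update weights from disk")
async def update_weights_from_disk(body: dict) -> dict:
    # A dataclass constructor raises TypeError on unknown or missing keys,
    # which is the failure mode described in the review comment.
    req = FakeDiskReq(**body)
    return {"success": True, "message": f"loaded {req.model_path}"}


good = asyncio.run(update_weights_from_disk({"model_path": "/models/new"}))
bad = asyncio.run(update_weights_from_disk({"path": "/models/new"}))
```

Here `bad` comes back as a `{"success": False, ...}` response naming the unexpected key instead of an exception propagating to the caller. The trade-off is that every failure gets the same response shape, so a handler that needs a custom message (such as the abort case) would still want its own try/except.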

@Aphoh Aphoh enabled auto-merge (squash) February 10, 2026 00:29
@Aphoh Aphoh merged commit 56a1b6e into main Feb 10, 2026
50 of 51 checks passed
@Aphoh Aphoh deleted the warnold/sgl-update-weights-endpoints branch February 10, 2026 01:03
soodoshll pushed a commit to soodoshll/dynamo that referenced this pull request Feb 12, 2026
biswapanda added a commit that referenced this pull request May 8, 2026
Cleanup-only — no behavior change. Strips review-tracker noise that
accumulated on top of PR-added text during iteration:

  - "Closes hhzhang16 HH-19/HH-21/HH-22/HH-23/HH-25/HH-26/HH-27"
  - "CR-8 / CR-9 / CR-10 closure" prefixes on serde-error / doc-attach fixes
  - Branch-name references: bis/parity-tokenize-tcp, bis/prime-rl-merged
  - Internal PR numbers: #6094, #7699, #8197, #9141
  - Phase numbers from internal design docs (rl-support.md Phase 1/4/5)
  - "prime-rl" mentions in narrative copy and mermaid diagrams →
    generic "RL trainer / RL orchestrator / external client"

Technical content (semantics, invariants, why-this-exists rationale)
preserved everywhere; only the internal-process scaffolding is removed.

Scope verification: every removed line is one this branch ADDED
(diff main..HEAD shows the removed text on a `+` line). No edits land
on pre-existing main-branch comments. Specifically reverted the
nvext.rs cleanup attempt — its target lines (GAIE Stage 1/2,
SGLang-specific) live on main, not in this PR's diff.

Files touched:
  components/src/dynamo/vllm/handlers.py            +9  -10
  components/src/dynamo/vllm/worker_factory.py      +6  -4
  docs/dynamo-RL-api.md                             +19 -32
  lib/llm/src/http/service/openai.rs                +32 -34
  lib/llm/src/protocols/openai/chat_completions/delta.rs  +4 -4
  lib/llm/src/protocols/openai/completions/delta.rs       +3 -3
  lib/llm/src/protocols/openai/validate.rs                +20 -20

cargo check -p dynamo-llm: clean (1 pre-existing benign warning).
yao531441 pushed a commit to yao531441/dynamo that referenced this pull request May 13, 2026

Labels

backend::sglang (Relates to the sglang backend), feat, size/M

2 participants