feat: Add SGLang /engine weight update endpoints#6094
Conversation
Wire update_weights and update_weight_version routes to tokenizer manager
WalkthroughFive new asynchronous handler methods are added to BaseWorkerHandler to enable model weight updates without server restart: update_weights_from_disk, update_weights_from_tensor, update_weights_from_distributed, update_weights_from_ipc, and update_weight_version. These methods are registered as engine routes and return structured success/message responses. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@components/src/dynamo/sglang/request_handlers/handler_base.py`:
- Around line 299-312: Wrap the call to
self.engine.tokenizer_manager.abort_request(abort_all=True) inside a try/except
in update_weight_version, so failures in abort_request produce a structured
error response instead of an uncaught exception; only set
self.engine.tokenizer_manager.server_args.weight_version = req.new_version after
a successful abort (or when abort_all is False), and on exception return
{"success": False, "message": "Failed to abort in-flight requests", "error":
str(e)} (also log the error) to mirror other handler methods' error handling.
- Around line 253-312: The file failed Black formatting; run the formatter on
the module containing the handler methods (e.g., update_weights_from_disk,
update_weights_from_tensor, update_weights_from_distributed,
update_weights_from_ipc, update_weight_version) to apply Black's style, verify
the diff is only formatting changes, then stage and commit the reformatted file
(or run the repository pre-commit hook) so CI passes.
🧹 Nitpick comments (1)
components/src/dynamo/sglang/request_handlers/handler_base.py (1)
253-297: Missing error handling — inconsistent with existing handler pattern.The existing
release_memory_occupationandresume_memory_occupationmethods wrap their logic intry/exceptand return structured error dicts ({"status": "error", "message": ...}). All five new methods propagate exceptions unhandled. IfUpdateWeightFromDiskReqInput(**body)receives unexpected keys or a required field is missing, the caller gets an opaque Rust-level error instead of a structured JSON response.Consider wrapping each method body in a try/except for consistency, e.g.:
Proposed pattern (applied to one method as example)
async def update_weights_from_disk(self, body: dict) -> dict: """Update model weights from disk without restarting the server.""" from sglang.srt.managers.io_struct import UpdateWeightFromDiskReqInput - req = UpdateWeightFromDiskReqInput(**body) - success, message, num_paused_requests = ( - await self.engine.tokenizer_manager.update_weights_from_disk(req, None) - ) - return { - "success": success, - "message": message, - "num_paused_requests": num_paused_requests, - } + try: + req = UpdateWeightFromDiskReqInput(**body) + success, message, num_paused_requests = ( + await self.engine.tokenizer_manager.update_weights_from_disk(req, None) + ) + return { + "success": success, + "message": message, + "num_paused_requests": num_paused_requests, + } + except Exception as e: + logging.error(f"Failed to update weights from disk: {e}") + return {"success": False, "message": str(e)}
Cleanup-only — no behavior change. Strips review-tracker noise that accumulated on top of PR-added text during iteration: - "Closes hhzhang16 HH-19/HH-21/HH-22/HH-23/HH-25/HH-26/HH-27" - "CR-8 / CR-9 / CR-10 closure" prefixes on serde-error / doc-attach fixes - Branch-name references: bis/parity-tokenize-tcp, bis/prime-rl-merged - Internal PR numbers: #6094, #7699, #8197, #9141 - Phase numbers from internal design docs (rl-support.md Phase 1/4/5) - "prime-rl" mentions in narrative copy and mermaid diagrams → generic "RL trainer / RL orchestrator / external client" Technical content (semantics, invariants, why-this-exists rationale) preserved everywhere; only the internal-process scaffolding is removed. Scope verification: every removed line is one this branch ADDED (diff main..HEAD shows the removed text on a `+` line). No edits land on pre-existing main-branch comments. Specifically reverted the nvext.rs cleanup attempt — its target lines (GAIE Stage 1/2, SGLang-specific) live on main, not in this PR's diff. Files touched: components/src/dynamo/vllm/handlers.py +9 -10 components/src/dynamo/vllm/worker_factory.py +6 -4 docs/dynamo-RL-api.md +19 -32 lib/llm/src/http/service/openai.rs +32 -34 lib/llm/src/protocols/openai/chat_completions/delta.rs +4 -4 lib/llm/src/protocols/openai/completions/delta.rs +3 -3 lib/llm/src/protocols/openai/validate.rs +20 -20 cargo check -p dynamo-llm: clean (1 pre-existing benign warning).
Overview:
Wire update_weights and update_weight_version routes to tokenizer manager
Details:
Adds the basic /engine/ routes needed for sglang weight update support.
Summary by CodeRabbit