Fix num running requests (load) wrong cleared for ongoing requests#15116
Fix num running requests (load) wrong cleared for ongoing requests#15116fzyzcjy merged 38 commits intosgl-project:mainfrom
Conversation
This reverts commit 7e14127.
|
/tag-and-rerun-ci |
Summary of ChangesHello @fzyzcjy, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a bug where the count of running requests (worker load) was not being correctly managed, particularly for ongoing requests. It introduces a robust RAII-based mechanism, Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request provides an excellent fix for a critical bug in tracking the number of running requests, especially for ongoing streaming requests. The introduction of the WorkerLoadGuardV2 RAII guard is a significant improvement, making load management much more robust and less error-prone by ensuring increment_load and decrement_load are always correctly paired, even across error paths and different response types. This change has led to cleaner, more maintainable code in router.rs and has correctly allowed for the removal of the periodic load reset workaround in worker_registry.rs. Additionally, the refactoring of circuit breaker metrics into the CircuitBreaker struct itself is a great design choice that improves encapsulation. Overall, these changes substantially enhance the correctness and maintainability of the codebase.
I am having trouble creating individual review comments. Click here to see my feedback.
sgl-model-gateway/src/core/circuit_breaker.rs (316-320)
Creating the publish_gauge_metrics helper function is a great refactoring. It centralizes the logic for publishing gauge metrics, reduces code duplication in record_outcome, transition_to, and reset, and makes it easier to manage which metrics are published and when. The TODO comment about potentially publishing on every change is noted, but the current approach of publishing on key events seems like a reasonable and efficient balance.
sgl-model-gateway/src/core/worker.rs (1037-1053)
The introduction of WorkerLoadGuardV2 is an excellent application of the RAII pattern. Using a guard whose Drop implementation decrements the load counter is a much more robust and idiomatic Rust approach than manually tracking and decrementing the load. This will prevent entire classes of bugs related to incorrect load counting, especially in complex scenarios involving streaming and error handling. The TODO comment about migrating fully to V2 is also noted and appreciated.
sgl-model-gateway/src/core/worker_registry.rs (462-469)
Removing the periodic load reset logic is a fantastic improvement. This kind of reset is often a workaround for incorrect state management. Its removal, along with the associated counter variables, demonstrates confidence that the root cause of the load counting issue has been properly fixed with the new WorkerLoadGuardV2, leading to a more correct and reliable system.
sgl-model-gateway/src/routers/http/router.rs (237-238)
Replacing the manual load_incremented boolean flag with the WorkerLoadGuardV2 RAII guard greatly improves the code's correctness and readability. This ensures that the load is always decremented when the guard goes out of scope, simplifying error handling and logic flow.
…n_eagle3_npu * 'main' of https://github.com/sgl-project/sglang: (89 commits) [model-gateway] Remove legacy RouterMetrics and Rename SmgMetrics to Metrics and smg_labels to metrics_labels (sgl-project#15160) [diffusion] fix: fix video model sp when resolution is not specified (sgl-project#15047) [diffusion] fix: fix pytorch non-writable array warning (sgl-project#15017) [diffusion] fix: cache dit with parallel (sgl-project#15163) chore: change npu pr-test a2 runner (sgl-project#15152) [Feature] Fuse mrope all in 1 kernel (sgl-project#14906) Fix num running requests (load) wrong cleared for ongoing requests (sgl-project#15116) Fused two elementwise kernels for k_nope and k_pe concat (sgl-project#14862) fix: adding date and fixing release name issue (sgl-project#15174) [CPU] Add Gemma3RMSNorm kernel in sgl-kernel and add ut (sgl-project#9324) feature: PR wheel (sgl-project#15170) [diffusion] model: support mutli-image input and qwen-image-edit-2509 (sgl-project#15005) fix CompressedTensorsW8A8Int8 min_capability (sgl-project#13914) Tiny improve summary text in `bench_one_batch_server.py` (sgl-project#15158) [model-gateway] add mcp and discovery metrics (sgl-project#15156) fix: move ci-bot (sgl-project#15154) Fix import warnings (sgl-project#15144) ci: adding errors to Github summary (sgl-project#14778) [model-gateway] Add streaming metrics for harmony gRPC router (sgl-project#15147) [model-gateway] upgrade axum and axum server (sgl-project#15146) ... # Conflicts: # python/sglang/srt/server_args.py
Motivation
Modifications
Accuracy Tests
Benchmarking and Profiling
Checklist