Fix num running requests (load) wrong cleared for ongoing requests by fzyzcjy · Pull Request #15116 · sgl-project/sglang

fzyzcjy · 2025-12-14T13:54:45Z

fzyzcjy · 2025-12-14T13:54:54Z

/tag-and-rerun-ci

gemini-code-assist · 2025-12-14T13:55:13Z

Summary of Changes

Hello @fzyzcjy, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a bug where the count of running requests (worker load) was not being correctly managed, particularly for ongoing requests. It introduces a robust RAII-based mechanism, WorkerLoadGuardV2, to automatically track worker load throughout the lifecycle of a request, ensuring that the load count is always accurate. Concurrently, it centralizes all circuit breaker metric reporting within the CircuitBreaker component, enhancing observability and maintainability. These changes lead to more reliable load balancing and circuit breaking behavior within the system.

Highlights

Refactored Worker Load Management: Introduced WorkerLoadGuardV2, an RAII (Resource Acquisition Is Initialization) guard, to automatically increment and decrement worker load, ensuring accurate tracking of ongoing requests throughout their lifecycle.
Centralized Circuit Breaker Metrics: All RouterMetrics calls related to circuit breaker state, outcomes, and transitions have been moved into the CircuitBreaker struct itself, improving consistency, reducing redundancy, and enabling per-worker metric labeling.
Removed Periodic Load Reset: The previous periodic reset mechanism for worker loads in the WorkerRegistry has been eliminated, as the new RAII guard handles load tracking on a granular, per-request basis, making the system more reactive and accurate.
Improved Streaming Load Handling: The logic for decrementing load for streaming requests has been simplified by leveraging the WorkerLoadGuardV2 and setting it to None upon stream completion, ensuring load is correctly released even for long-lived connections.
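The streaming point above can be sketched as follows. This is a minimal illustration and not the PR's code: a synchronous `Iterator` stands in for the async response stream, and `LoadGuard`, `GuardedStream`, and `stream_load_trace` are hypothetical names. The only idea taken from the summary is that the guard lives in an `Option` and is set to `None` when the stream completes.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical guard: +1 on creation, -1 on drop.
struct LoadGuard(Arc<AtomicUsize>);

impl LoadGuard {
    fn new(load: Arc<AtomicUsize>) -> Self {
        load.fetch_add(1, Ordering::Relaxed);
        Self(load)
    }
}

impl Drop for LoadGuard {
    fn drop(&mut self) {
        self.0.fetch_sub(1, Ordering::Relaxed);
    }
}

/// Wraps a chunk source and keeps the guard alive until the source is exhausted.
pub struct GuardedStream<I> {
    inner: I,
    guard: Option<LoadGuard>,
}

impl<I: Iterator> Iterator for GuardedStream<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        let item = self.inner.next();
        if item.is_none() {
            // Stream finished: dropping the guard releases the load immediately,
            // even if the wrapper itself is kept around longer.
            self.guard = None;
        }
        item
    }
}

/// Records the load seen after each chunk and once more after completion.
pub fn stream_load_trace() -> Vec<usize> {
    let load = Arc::new(AtomicUsize::new(0));
    let mut stream = GuardedStream {
        inner: vec!["chunk-a", "chunk-b"].into_iter(),
        guard: Some(LoadGuard::new(Arc::clone(&load))),
    };
    let mut trace = Vec::new();
    while stream.next().is_some() {
        trace.push(load.load(Ordering::Relaxed));
    }
    trace.push(load.load(Ordering::Relaxed));
    trace
}
```

If the wrapper is dropped before the stream ends (for example, when a client disconnects), the `Option<LoadGuard>` field is dropped with it, so the load is released on that path as well.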

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request provides an excellent fix for a critical bug in tracking the number of running requests, especially for ongoing streaming requests. The introduction of the WorkerLoadGuardV2 RAII guard is a significant improvement, making load management much more robust and less error-prone by ensuring increment_load and decrement_load are always correctly paired, even across error paths and different response types. This change has led to cleaner, more maintainable code in router.rs and has correctly allowed for the removal of the periodic load reset workaround in worker_registry.rs. Additionally, the refactoring of circuit breaker metrics into the CircuitBreaker struct itself is a great design choice that improves encapsulation. Overall, these changes substantially enhance the correctness and maintainability of the codebase.

I am having trouble creating individual review comments. My feedback is below.

sgl-model-gateway/src/core/circuit_breaker.rs (316-320)

Creating the publish_gauge_metrics helper function is a great refactoring. It centralizes the logic for publishing gauge metrics, reduces code duplication in record_outcome, transition_to, and reset, and makes it easier to manage which metrics are published and when. The TODO comment about potentially publishing on every change is noted, but the current approach of publishing on key events seems like a reasonable and efficient balance.
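A rough sketch of the pattern described here, under stated assumptions: only the method names `publish_gauge_metrics`, `record_outcome`, `transition_to`, and `reset` come from the review; the fields, the two-state machine (no half-open state), the metric names, and the in-memory `GaugeSink` are invented for illustration and are not the gateway's actual metrics API.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum State {
    Closed,
    Open,
}

/// Hypothetical in-memory sink standing in for a real metrics backend.
#[derive(Default)]
pub struct GaugeSink {
    pub samples: Vec<(&'static str, f64)>,
}

pub struct CircuitBreaker {
    state: State,
    consecutive_failures: u32,
    failure_threshold: u32,
    pub sink: GaugeSink,
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32) -> Self {
        Self {
            state: State::Closed,
            consecutive_failures: 0,
            failure_threshold,
            sink: GaugeSink::default(),
        }
    }

    /// Single place that decides which gauges are published.
    fn publish_gauge_metrics(&mut self) {
        let state = if self.state == State::Open { 1.0 } else { 0.0 };
        self.sink.samples.push(("cb_state", state));
        self.sink
            .samples
            .push(("cb_consecutive_failures", self.consecutive_failures as f64));
    }

    fn transition_to(&mut self, next: State) {
        self.state = next;
        self.publish_gauge_metrics();
    }

    pub fn record_outcome(&mut self, success: bool) {
        if success {
            self.consecutive_failures = 0;
        } else {
            self.consecutive_failures += 1;
        }
        if self.state == State::Closed && self.consecutive_failures >= self.failure_threshold {
            self.transition_to(State::Open);
        } else {
            self.publish_gauge_metrics();
        }
    }

    pub fn reset(&mut self) {
        self.consecutive_failures = 0;
        self.transition_to(State::Closed);
    }
}

/// Returns the two gauges published by the call that trips the breaker.
pub fn gauges_after_tripping() -> Vec<(&'static str, f64)> {
    let mut cb = CircuitBreaker::new(2);
    cb.record_outcome(false);
    cb.record_outcome(false);
    let n = cb.sink.samples.len();
    cb.sink.samples[n - 2..].to_vec()
}
```

Because every mutating method funnels through the one helper, adding or renaming a gauge touches a single function rather than each call site.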

sgl-model-gateway/src/core/worker.rs (1037-1053)

The introduction of WorkerLoadGuardV2 is an excellent application of the RAII pattern. Using a guard whose Drop implementation decrements the load counter is a much more robust and idiomatic Rust approach than manually tracking and decrementing the load. This will prevent entire classes of bugs related to incorrect load counting, especially in complex scenarios involving streaming and error handling. The TODO comment about migrating fully to V2 is also noted and appreciated.
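For readers unfamiliar with the pattern, a guard of this kind might look roughly like the sketch below. It is not the `WorkerLoadGuardV2` implementation: the `Worker` struct, its single `load` field, and the helper `observe_guarded_scope` are assumptions made to keep the example self-contained.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a worker; only the load counter matters here.
#[derive(Default)]
pub struct Worker {
    load: AtomicUsize,
}

impl Worker {
    pub fn load(&self) -> usize {
        self.load.load(Ordering::Relaxed)
    }
}

/// Illustrative RAII guard: increments on creation, decrements on drop.
pub struct LoadGuard {
    worker: Arc<Worker>,
}

impl LoadGuard {
    pub fn new(worker: Arc<Worker>) -> Self {
        worker.load.fetch_add(1, Ordering::Relaxed);
        Self { worker }
    }
}

impl Drop for LoadGuard {
    fn drop(&mut self) {
        self.worker.load.fetch_sub(1, Ordering::Relaxed);
    }
}

/// Returns the load observed (inside, after) a guarded scope.
pub fn observe_guarded_scope() -> (usize, usize) {
    let worker = Arc::new(Worker::default());
    let during = {
        let _guard = LoadGuard::new(Arc::clone(&worker));
        worker.load()
    }; // `_guard` is dropped here, which decrements the counter
    (during, worker.load())
}
```

The increment and decrement cannot drift apart because the decrement is tied to the guard's lifetime rather than to any particular return path.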

sgl-model-gateway/src/core/worker_registry.rs (462-469)

Removing the periodic load reset logic is a fantastic improvement. This kind of reset is often a workaround for incorrect state management. Its removal, along with the associated counter variables, demonstrates confidence that the root cause of the load counting issue has been properly fixed with the new WorkerLoadGuardV2, leading to a more correct and reliable system.
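To make the failure mode in the PR title concrete, here is a small sketch of what a timer-driven reset does to a counter while requests are still running. `LoadCounter` and `reset_while_in_flight` are hypothetical, and the saturating decrement is an assumption; the removed registry code may have behaved differently in detail.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

#[derive(Default)]
pub struct LoadCounter(AtomicUsize);

impl LoadCounter {
    pub fn increment(&self) {
        self.0.fetch_add(1, Ordering::Relaxed);
    }

    pub fn decrement(&self) {
        // Saturating decrement (an assumption) so a prior reset cannot underflow.
        let _ = self
            .0
            .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |v| {
                Some(v.saturating_sub(1))
            });
    }

    /// The kind of periodic reset the PR removes.
    pub fn reset(&self) {
        self.0.store(0, Ordering::Relaxed);
    }

    pub fn get(&self) -> usize {
        self.0.load(Ordering::Relaxed)
    }
}

/// Returns (reported load, requests actually in flight).
pub fn reset_while_in_flight() -> (usize, usize) {
    let counter = LoadCounter::default();
    let mut in_flight = 0usize;

    // Three long-running requests start.
    for _ in 0..3 {
        counter.increment();
        in_flight += 1;
    }

    // The timer fires while all three are still running.
    counter.reset();

    // A fourth request arrives, then one of the original three finishes.
    counter.increment();
    in_flight += 1;
    counter.decrement();
    in_flight -= 1;

    (counter.get(), in_flight)
}
```

In this scenario the counter reports 0 while three requests are still in flight, so a load-aware policy would treat a busy worker as idle.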

sgl-model-gateway/src/routers/http/router.rs (237-238)

Replacing the manual load_incremented boolean flag with the WorkerLoadGuardV2 RAII guard greatly improves the code's correctness and readability. This ensures that the load is always decremented when the guard goes out of scope, simplifying error handling and logic flow.
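The class of bug that a manual flag invites can be shown with a contrived example. Nothing below is taken from `router.rs`; `call_backend`, `handle_with_flag`, `handle_with_guard`, and the `Cell`-based counter are hypothetical and chosen only to show how an early return interacts with each approach.

```rust
use std::cell::Cell;

/// Hypothetical backend call that may fail.
fn call_backend(ok: bool) -> Result<(), &'static str> {
    if ok {
        Ok(())
    } else {
        Err("backend error")
    }
}

/// Manual bookkeeping: the `?` early return skips the decrement below.
fn handle_with_flag(load: &Cell<usize>, ok: bool) -> Result<(), &'static str> {
    load.set(load.get() + 1);
    let load_incremented = true;
    call_backend(ok)?;
    if load_incremented {
        load.set(load.get() - 1);
    }
    Ok(())
}

struct Guard<'a>(&'a Cell<usize>);

impl<'a> Guard<'a> {
    fn new(load: &'a Cell<usize>) -> Self {
        load.set(load.get() + 1);
        Guard(load)
    }
}

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() - 1);
    }
}

/// Guarded version: every exit path, including `?`, runs `Drop`.
fn handle_with_guard(load: &Cell<usize>, ok: bool) -> Result<(), &'static str> {
    // Bind to a named variable; `let _ = Guard::new(..)` would drop it at once.
    let _guard = Guard::new(load);
    call_backend(ok)?;
    Ok(())
}

/// Returns the leftover load after one failed request: (flag version, guard version).
pub fn load_after_failed_request() -> (usize, usize) {
    let (flag_load, guard_load) = (Cell::new(0), Cell::new(0));
    let _ = handle_with_flag(&flag_load, false);
    let _ = handle_with_guard(&guard_load, false);
    (flag_load.get(), guard_load.get())
}
```

The flag version leaves a stale count of 1 after the failed request, while the guard version returns to 0 without any explicit cleanup code.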

…n_eagle3_npu

* 'main' of https://github.com/sgl-project/sglang: (89 commits)
  [model-gateway] Remove legacy RouterMetrics and Rename SmgMetrics to Metrics and smg_labels to metrics_labels (sgl-project#15160)
  [diffusion] fix: fix video model sp when resolution is not specified (sgl-project#15047)
  [diffusion] fix: fix pytorch non-writable array warning (sgl-project#15017)
  [diffusion] fix: cache dit with parallel (sgl-project#15163)
  chore: change npu pr-test a2 runner (sgl-project#15152)
  [Feature] Fuse mrope all in 1 kernel (sgl-project#14906)
  Fix num running requests (load) wrong cleared for ongoing requests (sgl-project#15116)
  Fused two elementwise kernels for k_nope and k_pe concat (sgl-project#14862)
  fix: adding date and fixing release name issue (sgl-project#15174)
  [CPU] Add Gemma3RMSNorm kernel in sgl-kernel and add ut (sgl-project#9324)
  feature: PR wheel (sgl-project#15170)
  [diffusion] model: support mutli-image input and qwen-image-edit-2509 (sgl-project#15005)
  fix CompressedTensorsW8A8Int8 min_capability (sgl-project#13914)
  Tiny improve summary text in `bench_one_batch_server.py` (sgl-project#15158)
  [model-gateway] add mcp and discovery metrics (sgl-project#15156)
  fix: move ci-bot (sgl-project#15154)
  Fix import warnings (sgl-project#15144)
  ci: adding errors to Github summary (sgl-project#14778)
  [model-gateway] Add streaming metrics for harmony gRPC router (sgl-project#15147)
  [model-gateway] upgrade axum and axum server (sgl-project#15146)
  ...

# Conflicts:
# python/sglang/srt/server_args.py

fzyzcjy added 30 commits December 14, 2025 13:36

dd6cb58 more
328731b more
d6f6b4a more
48ea95c Merge branch 'main-upstream' into feat/cb_refactor
f604d41 fmt
d04e9a7 fmt
7ba0dd7 more
8293ccc more
310927b more
f4bbf06 more
3b73143 more
48240cf more
2fdc184 more
07289e6 fmt
93eee62 more
267427b more
eb60450 more
0658662 more
4c745e1 more
9b9fb5c more
6d0ec85 more
863e139 more
f93ba03 more
c6d4482 more
2a21cc0 more
25e169a more
8c793c7 more
7e14127 bump ci
6083bc3 Revert "bump ci"
        This reverts commit 7e14127.
bb74879 more

fzyzcjy and others added 6 commits December 14, 2025 19:52

bd56e76 more
9ac9bfd Merge branch 'main' into feat/load_guard
74f4a10 fix init cb state
ec0f615 try fix load_guard too early drop
75c2453 Merge branch 'feat/load_guard' of https://github.com/fzyzcjy/sglang into feat/load_guard
2179deb more

github-actions Bot added model-gateway run-ci labels Dec 14, 2025

gemini-code-assist Bot reviewed Dec 14, 2025

fzyzcjy marked this pull request as ready for review December 14, 2025 14:39

fzyzcjy requested review from CatherineSue, key4ng and slin1237 as code owners December 14, 2025 14:39

fzyzcjy added 2 commits December 14, 2025 22:39

fa7b335 Merge branch 'main' into feat/rm_reset_load
35685b4 Merge branch 'main' into feat/rm_reset_load

slin1237 approved these changes Dec 15, 2025

fzyzcjy merged commit 89ad390 into sgl-project:main Dec 15, 2025
59 checks passed

tonyluj pushed a commit to openanolis/sglang that referenced this pull request Dec 17, 2025

6557326 Fix num running requests (load) wrong cleared for ongoing requests (sgl-project#15116)

YChange01 pushed a commit to YChange01/sglang that referenced this pull request Jan 13, 2026

2602edf Fix num running requests (load) wrong cleared for ongoing requests (sgl-project#15116)

fzyzcjy merged 38 commits into sgl-project:main from fzyzcjy:feat/rm_reset_load
