
[Bug Fix] Fix non-streaming request abort failure when --enable-metrics is enabled#20625

Merged
hnyls2002 merged 9 commits into sgl-project:main from fanghao566:fix/non-streaming-abort-with-metrics
Mar 23, 2026

Conversation

@fanghao566
Contributor

Motivation

When --enable-metrics is enabled, non-streaming requests to /v1/chat/completions are
not aborted when the client disconnects (e.g., curl cancelled with Ctrl+C). Without
--enable-metrics, abort works correctly.

Root Cause

add_prometheus_track_response_middleware() uses @app.middleware("http"), which is
backed by Starlette's BaseHTTPMiddleware. Its call_next() replaces the ASGI receive
callable with an internal channel, breaking request.is_disconnected() in downstream
route handlers. The disconnect detection in tokenizer_manager.py _wait_one_response()
relies on request.is_disconnected(), which always returns False when
BaseHTTPMiddleware is in the middleware stack.

This is a known Starlette issue.
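To see why the wrapped `receive` hides disconnects, here is a toy model in pure asyncio (no Starlette; all names are illustrative, not Starlette's actual internals): the wrapper hands the downstream handler a substitute `receive` that only replays buffered request-body messages and never forwards `http.disconnect`.

```python
import asyncio

# Toy model of the failure mode. `make_replaced_receive` plays the role of
# the internal channel a BaseHTTPMiddleware-style call_next() substitutes
# for the real ASGI `receive`; it never yields http.disconnect.

async def client_receive():
    # What the real ASGI server delivers after the client hits Ctrl+C.
    return {"type": "http.disconnect"}

def make_replaced_receive():
    # Stand-in for the middleware's internal channel: it replays the
    # (already-read) request body and nothing else.
    messages = [{"type": "http.request", "body": b"", "more_body": False}]

    async def replaced_receive():
        if messages:
            return messages.pop(0)
        return {"type": "http.request", "body": b"", "more_body": False}

    return replaced_receive

async def is_disconnected(receive):
    # Simplified sketch of Request.is_disconnected(): peek at one message.
    msg = await receive()
    return msg["type"] == "http.disconnect"

async def main():
    through_wrapper = await is_disconnected(make_replaced_receive())
    direct = await is_disconnected(client_receive)
    return through_wrapper, direct

wrapped, direct = asyncio.run(main())
print(wrapped, direct)  # False True
```

With the substituted channel the handler can never observe the disconnect, while the same check against the original `receive` sees it immediately.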

Steps to Reproduce

  1. Launch server with --enable-metrics
  2. Send a non-streaming request: curl -X POST http://localhost:30000/v1/chat/completions -d '...'
  3. Cancel curl with Ctrl+C while the request is processing
  4. Observe: server does NOT abort the request (no "Abort" in logs)
  5. Repeat without --enable-metrics: server correctly aborts the request

Fixes #20623

Solution

Patch @app.middleware("http") to use a pure ASGI middleware adapter (_PureASGIDispatch)
that provides a fixed call_next. The fixed call_next passes ASGI receive through
untouched instead of replacing it, so request.is_disconnected() keeps working.

The upstream dispatch function (track_http_status_code) is not modified — only the
call_next it receives is replaced. Any future changes to the dispatch function body
(e.g., new counters, routing keys in v0.5.9+) work automatically without updating the patch.

Changes

File Change
python/sglang/srt/utils/http_middleware_patch.py New file — Pure ASGI call_next adapter + patch_app_http_middleware()
python/sglang/srt/utils/common.py +3 lines — comment + import + call before @app.middleware("http")
test/registered/scheduler/test_abort_with_metrics.py New file — Unit tests for abort + metrics with --enable-metrics

Test Plan

  • test_abort_non_streaming_with_metrics — Launch server with --enable-metrics,
    send 16 concurrent non-streaming requests in a subprocess, terminate the subprocess
    to simulate client disconnect, verify server logs contain "Abort"
  • test_metrics_still_work — Launch server with --enable-metrics, send a normal
    request, verify /metrics endpoint returns sglang:http_requests_total and
    sglang:http_responses_total counters
  • Existing test_abort.py tests still pass

Run tests:

python test/registered/scheduler/test_abort_with_metrics.py -v

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a significant issue preventing non-streaming requests from being properly aborted by the server when Prometheus metrics were active. The solution involves a targeted patch to the HTTP middleware, which re-establishes the correct propagation of client disconnect signals, thereby restoring the expected abort behavior without negatively impacting the existing metrics collection system.

Highlights

  • Bug Fix for Request Abort: Addressed a critical bug where non-streaming requests to /v1/chat/completions were not aborted upon client disconnection when the --enable-metrics flag was enabled.
  • Middleware Patch Implementation: Implemented a pure ASGI middleware adapter (_PureASGIDispatch) to patch Starlette's @app.middleware("http"). This ensures the ASGI receive callable is passed through untouched, restoring request.is_disconnected() functionality.
  • Comprehensive Test Coverage: Introduced new unit tests (test_abort_with_metrics.py) to specifically validate that non-streaming requests are correctly aborted with metrics enabled and to confirm that Prometheus metrics collection remains functional after the patch.


Changelog
  • python/sglang/srt/utils/common.py
    • Added an import for patch_app_http_middleware and called it within add_prometheus_track_response_middleware to apply the fix.
  • python/sglang/srt/utils/http_middleware_patch.py
    • Introduced a new module containing _PureASGIDispatch and patch_app_http_middleware to fix the middleware behavior.
  • test/registered/scheduler/test_abort_with_metrics.py
    • Added new tests to verify non-streaming abort with metrics and confirm metrics functionality.
Activity
  • Identified and reproduced a bug where non-streaming requests were not aborted when --enable-metrics was enabled, linking to issue [Bug] Non-streaming requests are not aborted on client disconnect when --enable-metrics is set #20623.
  • Developed a solution involving a pure ASGI middleware adapter to fix the request.is_disconnected() issue caused by BaseHTTPMiddleware.
  • Implemented comprehensive unit tests, including test_abort_non_streaming_with_metrics and test_metrics_still_work, to validate the fix and ensure no regressions.
  • Confirmed that existing test_abort.py tests continue to pass.


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request provides a well-designed fix for an issue where non-streaming requests were not being aborted correctly when metrics are enabled. The approach of patching Starlette's HTTP middleware to prevent it from replacing the ASGI receive callable is clever and effectively resolves the problem. The new tests are thorough, covering both the abort functionality and ensuring that metrics collection remains operational. My review includes a couple of minor suggestions for improving code quality in the new files.

Comment thread python/sglang/srt/utils/http_middleware_patch.py Outdated
Comment thread test/registered/scheduler/test_abort_with_metrics.py Outdated
@fanghao566 fanghao566 force-pushed the fix/non-streaming-abort-with-metrics branch from bb8fdb4 to ce4f55e on March 15, 2026 at 12:02
…cs is enabled

@app.middleware("http") uses Starlette's BaseHTTPMiddleware whose call_next()
replaces the ASGI `receive` callable. This breaks request.is_disconnected()
in downstream handlers, preventing non-streaming request abort on client
disconnect when --enable-metrics is set.
Add a pure ASGI middleware adapter (http_middleware_patch.py) that provides a
fixed call_next passing `receive` through untouched. The upstream dispatch
function body is not modified — only the call_next it receives is replaced.
Fixes sgl-project#20623
@fanghao566 fanghao566 force-pushed the fix/non-streaming-abort-with-metrics branch from 6534e2a to 938ef42 on March 15, 2026 at 12:58
@sufeng-buaa
Collaborator

LGTM. I have verified it locally, and it’s OK.

@sufeng-buaa
Collaborator

/tag-and-rerun-ci

@fanghao566
Contributor Author

Hi, the CI failures in the latest run are all unrelated to this PR's changes (HTTP middleware patch for non-streaming abort). Here's a breakdown:

  • stage-b-test-small-1-gpu (1): pip install OSError, cuda_bindings-12.9.6.dist-info/METADATA not found on runner
  • stage-b-test-large-1-gpu (8): flaky registered/lora/test_lora_update.py
  • per-commit-1-npu-a2 (1): Ascend NPU test test_ascend_autoround_moe.py failure
  • multimodal-gen-test-8-npu-a3: Ascend NPU test_diffusion_generation performance assertion, max retry exceeded
  • wait-for-stage-b: cascading from stage-b-test-small-1-gpu (1)
  • pr-test-finish: cascading from wait-for-stage-b

None of the failed tests are related to the files changed in this PR. Could a maintainer please re-run the failed checks? Thanks!

@sufeng-buaa
Collaborator

> Hi, the CI failures in the latest run are all unrelated to this PR's changes […]

stage-b-test-small-1-gpu (1) was caused by other commits on the main branch. Try waiting for half a day and then rebasing onto the latest code to see if that helps. Other failed test cases can be retried multiple times.

@fanghao566
Contributor Author

@sufeng-buaa Hi, I've rebased and re-run CI a few times now, but the failures keep coming from unrelated tests. The main branch CI itself seems unstable recently. Could you help re-run the failed checks or merge this PR directly if the failures are confirmed unrelated? Thanks!

@sufeng-buaa
Collaborator

sufeng-buaa commented Mar 22, 2026

> @sufeng-buaa Hi, I've rebased and re-run CI a few times now, but the failures keep coming from unrelated tests […]

I don’t have permission to merge directly unless all CI checks have fully passed. You can ask @hnyls2002 to see whether it can be merged directly. I can help rerun the failed cases a few more times.

@hnyls2002
Collaborator

@sufeng-buaa @fanghao566 We were doing CI maintenance yesterday. I believe all CUDA CI is good at this moment.

See #21065


register_cuda_ci(est_time=180, suite="stage-b-test-small-1-gpu")
register_amd_ci(est_time=300, suite="stage-b-test-small-1-gpu-amd")
Collaborator


Common features like this do not need to be registered to AMD as well.

Collaborator

@hnyls2002 hnyls2002 left a comment


Test: prefer mock over launching a real server

The test in test_abort_with_metrics.py launches a full server with model loading (est ~180s) to verify a middleware-level ASGI receive passthrough fix. This can be tested much faster and more precisely with unittest.mock:

  • Construct a fake ASGI app that checks request.is_disconnected() inside the handler
  • Wrap it with _PureASGIDispatch
  • Simulate disconnect by making receive return {"type": "http.disconnect"}
  • Assert is_disconnected() returns True

This directly tests the fix (ASGI receive passthrough) without needing inference, model loading, or GPU time. The existing test_abort.py already covers the full e2e abort flow.

Also:

  • Remove register_amd_ci — this is a backend-independent test (middleware behavior), no need to duplicate across backends
  • Use setUpClass / tearDownClass to share the server if an integration test is still desired

Ref: .claude/skills/write-sglang-test/SKILL.md — rules 4 and 7 (Model & Backend Selection table)
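A mock-style test along these lines could look roughly as follows (pure asyncio; `PassthroughMiddleware` is a trivial stand-in for `_PureASGIDispatch`, and all names here are illustrative rather than the test file's actual contents):

```python
import asyncio
import unittest

# Illustrative unit-test sketch in the spirit of the suggestion above.
# The property under test: `receive` reaches the handler untouched, so a
# simulated http.disconnect is visible downstream of the middleware.

class PassthroughMiddleware:
    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        await self.app(scope, receive, send)  # forward `receive` untouched

class TestReceivePassthrough(unittest.TestCase):
    def test_disconnect_visible_downstream(self):
        observed = []

        async def handler(scope, receive, send):
            observed.append(await receive())

        async def fake_receive():
            return {"type": "http.disconnect"}  # simulated client abort

        async def fake_send(message):
            pass

        mw = PassthroughMiddleware(handler)
        asyncio.run(mw({"type": "http"}, fake_receive, fake_send))
        self.assertEqual(observed, [{"type": "http.disconnect"}])

result = unittest.TestResult()
TestReceivePassthrough("test_disconnect_visible_downstream").run(result)
print(result.wasSuccessful())  # True
```

This runs in milliseconds with no GPU, model, or server process, which is the point of the review suggestion.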

Replace the integration test that launches a real inference server
(~180s) with a lightweight unit test that directly exercises
_PureASGIDispatch using fake ASGI receive/send callables.

Changes:
- Test _PureASGIDispatch receive passthrough via mock ASGI pipeline
- Remove register_amd_ci (backend-independent middleware test)
- Reduce est_time from 180s to 10s
@fanghao566
Contributor Author

> Test: prefer mock over launching a real server […]

@hnyls2002 Hi, I've updated the test per your feedback:

  • Replaced the full server integration test with a mock ASGI unit test
  • Removed register_amd_ci
  • Reduced est_time from 180s to 10s

Could you please re-review? Thanks!

@hnyls2002
Collaborator

Only the middleware patch was updated. Basic GPU tests and newly added CPU tests passed.

@hnyls2002 hnyls2002 merged commit 2b47bd3 into sgl-project:main Mar 23, 2026
36 of 76 checks passed
0-693 pushed a commit to 0-693/sglang that referenced this pull request Mar 25, 2026
dutsc pushed a commit to dutsc/sglang that referenced this pull request Mar 30, 2026
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026


Successfully merging this pull request may close these issues.

[Bug] Non-streaming requests are not aborted on client disconnect when --enable-metrics is set

3 participants