Skip to content

Refactor streaming session abort handling#22790

Merged
hnyls2002 merged 7 commits intomainfrom
lsyin/session-abort-rollback
Apr 15, 2026
Merged

Refactor streaming session abort handling#22790
hnyls2002 merged 7 commits intomainfrom
lsyin/session-abort-rollback

Conversation

@hnyls2002
Copy link
Copy Markdown
Collaborator

@hnyls2002 hnyls2002 commented Apr 14, 2026

Streaming session abort refactoring

Streaming sessions had no proper abort handling — abort could corrupt session state, leak KV memory, or crash the server. This PR fixes 5 independent problems and adds comprehensive test coverage.

Design

Abort = wipe KV, keep tokens. When any streaming request is aborted, the session's KV slot is completely released (tokens freed, pool slot returned, tree lock dropped). But the previous turn's token IDs stay in req_nodes, so the next request can re-prefill from the last successful turn and continue the conversation normally.

This replaces the previous behavior where abort tried to preserve partial state — which was fragile because kv_committed_len and kv_allocated_len could be inflated by prepare_for_extend mid-decode, and saving that state back to the slot corrupted future turns. The new design sidesteps all partial-state reasoning: abort wipes everything, the next request rebuilds from the last known good point.

Two high-level shifts:

  1. Abort cleanup is centralized. All abort paths (first-request, nth-request, pre-abort) route through release_session. No more ad-hoc cleanup branches.
  2. req_nodes is the single source of truth for previous token IDs. It holds the last successful turn's token IDs and is updated only on successful finish, never on abort. The next turn's input is always rebuilt from req_nodes, independent of whatever partial KV state the aborted request left behind. Token IDs and KV state are fully decoupled.

Problems & Fixes

P1: create_req popitem destroys append-only semantics; _close misses running request

create_req used req_nodes.popitem() to get the last successful req. If the new request was aborted, req_nodes was left empty — the next request lost all conversation history.

Fix: peek (don't pop) req_nodes. Add finish_req() to update req_nodes only on successful completion. On abort, req_nodes stays pointing at the last successful request so the next turn can still inherit from it.

With this change, req_nodes tracks the last successful request, not the currently running one. _close needs a different signal to detect in-flight requests — introduce Session._inflight as the single source of truth (also used for concurrency control in P2). _close checks _inflight and defers cleanup via the existing close_on_finish mechanism when a request is running.

P2: No concurrency control on streaming sessions

Nothing prevented two concurrent requests on the same streaming session. Both could call restore_to_req and share the same req_pool_idx, corrupting each other's KV memory.

Fix: add Session._inflight flag. Set on create_req, cleared by finish_req/abort_req. Concurrent requests are immediately pre-aborted.

P3: Mid-processing abort saves corrupted state

When /abort_request aborted a decoding request, prepare_for_extend had already inflated kv_committed_len. The existing code still called save_from_req, writing this corrupted state back to the slot. The next turn inherited KV that didn't match its input tokens.

Fix: mid-processing abort = nuke (wipe fully). Call release_session to wipe all KV. The next request re-prefills from scratch (token IDs stay in req_nodes via the finish_req pattern from P1).

P4: First-request abort cleanup incomplete

When the first request on a session was aborted mid-processing, the code created a new slot and called save_from_req with the aborted request's garbage state (inflated lengths, partial decode output). This slot persisted and corrupted subsequent requests.

Fix: create an ephemeral SessionSlot from the request's state (including last_node and cache_protected_len for proper dec_lock_ref), then immediately nuke it via release_session. No slot remains.

P5: Pre-aborted requests enter streaming abort path

Scheduler-level pre-aborts (e.g., input too long) kept req.session set. In cache_finished_req, the streaming abort handler would nuke the session slot even though the request never touched it — and even though another request might be actively using it (from P2 before the fix).

Fix: in match_prefix, when to_finish is already set (pre-abort), detach the request from the session (abort_req() + session = None) before delegating to inner cache. The request is handled as a regular (non-streaming) request. The session slot is completely untouched.

Test coverage

All tests run under SGLANG_ENABLE_STRICT_MEM_CHECK_DURING_BUSY=2 (strict memory accounting).

Unit tests (test_streaming_session_unit.py):

  • test_preabort_detaches_session_and_preserves_slot — match_prefix detaches session, slot untouched
  • test_first_mid_abort_nukes_ephemeral_slot — ephemeral slot created and nuked, no residual
  • test_nth_mid_abort_nukes_session_slot — existing slot nuked via release_session

E2E tests (test_streaming_session.py, inherited by all subclasses):

  • test_preabort_recovery — unsupported offset triggers pre-abort; slot preserved; prompt_tokens verified
  • test_first_mid_abort_recovery — first request aborted mid-decode; prompt_tokens = fresh input only
  • test_nth_mid_abort_recovery — nth request aborted mid-decode; prompt_tokens = rollback to last successful turn

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements a rollback mechanism for aborted session requests to maintain session integrity when errors occur during request processing. Changes include tracking the previous request state in the Session class and invoking a rollback in the scheduler's error handling paths. A review comment suggests refactoring the duplicated rollback logic in handle_generate_request into a helper method to improve code maintainability.

Comment thread python/sglang/srt/managers/scheduler.py Outdated
Comment on lines +1907 to +1908
if req.session is not None:
req.session.rollback_aborted_req(req.rid)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This logic for rolling back a session request is duplicated in four places within handle_generate_request. To improve code clarity and maintainability, consider extracting this logic into a helper method.

For example, you could add a method to the Scheduler class:

def _rollback_session_req(self, req: Req):
    if req.session is not None:
        req.session.rollback_aborted_req(req.rid)

Then you could replace this block and the other three identical blocks with a call to self._rollback_session_req(req).

                self._rollback_session_req(req)

@hnyls2002 hnyls2002 force-pushed the lsyin/session-abort-rollback branch from 65b9616 to e38ffbe Compare April 14, 2026 11:52
@github-actions github-actions Bot added documentation Improvements or additions to documentation npu labels Apr 14, 2026
@hnyls2002 hnyls2002 force-pushed the lsyin/session-abort-rollback branch from e38ffbe to ee7f6a2 Compare April 14, 2026 20:38
@hnyls2002 hnyls2002 force-pushed the lsyin/session-abort-rollback branch from 8d36186 to ada7d06 Compare April 14, 2026 20:47
@hnyls2002 hnyls2002 force-pushed the lsyin/session-abort-rollback branch from 0651c59 to 665c53e Compare April 15, 2026 05:20
@hnyls2002
Copy link
Copy Markdown
Collaborator Author

hnyls2002 commented Apr 15, 2026

/rerun-test test_streaming_session.py test_streaming_session_unit.py

(x3)

@github-actions
Copy link
Copy Markdown
Contributor

1-gpu-h100 (1 test): View workflow run

cd test/ && python3 registered/sessions/test_streaming_session.py

ubuntu-latest (1 test): View workflow run

cd test/ && python3 registered/unit/mem_cache/test_streaming_session_unit.py

@hnyls2002 hnyls2002 changed the title Rollback streaming session req_nodes on abort Refactor streaming session abort handling Apr 15, 2026
@github-actions
Copy link
Copy Markdown
Contributor

1-gpu-h100 (1 test): View workflow run

cd test/ && python3 registered/sessions/test_streaming_session.py

ubuntu-latest (1 test): View workflow run

cd test/ && python3 registered/unit/mem_cache/test_streaming_session_unit.py

@github-actions
Copy link
Copy Markdown
Contributor

1-gpu-h100 (1 test): View workflow run

cd test/ && python3 registered/sessions/test_streaming_session.py

ubuntu-latest (1 test): View workflow run

cd test/ && python3 registered/unit/mem_cache/test_streaming_session_unit.py

@hnyls2002
Copy link
Copy Markdown
Collaborator Author

/rerun-test test_streaming_session.py test_streaming_session_unit.py test_session_control.py test_session_latency.py

@github-actions
Copy link
Copy Markdown
Contributor

1-gpu-h100 (3 tests): View workflow run

cd test/ && python3 registered/sessions/test_streaming_session.py
cd test/ && python3 registered/sessions/test_session_control.py
cd test/ && python3 registered/sessions/test_session_latency.py

ubuntu-latest (1 test): View workflow run

cd test/ && python3 registered/unit/mem_cache/test_streaming_session_unit.py

@hnyls2002 hnyls2002 merged commit aa78564 into main Apr 15, 2026
28 of 43 checks passed
@hnyls2002 hnyls2002 deleted the lsyin/session-abort-rollback branch April 15, 2026 07:13
hnyls2002 added a commit that referenced this pull request Apr 15, 2026
@hnyls2002
Copy link
Copy Markdown
Collaborator Author

/rerun-test test_session_control.py test_session_latency.py test_streaming_session.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation high priority npu run-ci

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant