Fix streaming session busy-check double-counting via active_pool_idxs#22753
Merged
Collaborator (Author): /rerun-test test_streaming_session.py
Contributor: ✅
hnyls2002 added a commit that referenced this pull request on Apr 15, 2026
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request on Apr 22, 2026
Background
This is part of the streaming session memory accounting fix series (#22651). All streaming session tests pass on H200 with the strict busy check (`SGLANG_ENABLE_STRICT_MEM_CHECK_DURING_BUSY=2`), including the retract, abort, speculative (eagle v1/v2), and SWA variants, with zero leak assertions across all runs.
Problem
During a streaming session's borrow period (`restore_to_req` -> forward -> `save_from_req`), both `session_held_tokens()` and `_get_total_uncached_sizes()` claim the same KV pages:
- `session_held_tokens()` sees `slot.is_holding_kv == True` and counts `allocated - protected`
- `_get_total_uncached_sizes()` iterates batch reqs and counts `allocated - protected` again
This makes `total_accounted > total`, triggering a false-positive assertion under `SGLANG_ENABLE_STRICT_MEM_CHECK_DURING_BUSY=2`.
PR #22213 introduced `SessionSlot.is_active` as a boolean flag (`restore_to_req` sets True, `save_from_req` sets False) to filter out active slots. However, retract / abort / speculative-v2-overlap paths break the lifecycle pairing -- the flag can get stuck in the wrong state.
Fix
Replace the cached flag with a per-check recomputation from the scheduler's batch state: `session_held_tokens(active_pool_idxs)` excludes slots whose `req_pool_idx` is in this set. Single source of truth -- no lifecycle pairing required. New scheduler paths (retract, abort, spec) don't need to know about this check.
After this change `is_active` has zero readers, so the field and its two write sites are deleted.
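To make the accounting concrete, here is a minimal toy model of the double count and of the exclusion-by-set fix. It is only a sketch: the `Slot` and `Req` dataclasses, the free function signatures, and the token numbers are hypothetical stand-ins chosen for illustration, not the actual SGLang scheduler code. Only the names `session_held_tokens`, `active_pool_idxs`, `req_pool_idx`, and `is_holding_kv` come from the description above.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """Hypothetical stand-in for a session slot (not the real SessionSlot)."""
    req_pool_idx: int
    is_holding_kv: bool
    allocated: int
    protected: int


@dataclass
class Req:
    """Hypothetical stand-in for a request in the running batch."""
    req_pool_idx: int
    allocated: int
    protected: int


def session_held_tokens(slots, active_pool_idxs=frozenset()):
    # Sum tokens held by sessions, skipping any slot whose req_pool_idx
    # is currently borrowed by a request in the batch.
    return sum(
        s.allocated - s.protected
        for s in slots
        if s.is_holding_kv and s.req_pool_idx not in active_pool_idxs
    )


def total_uncached_sizes(batch_reqs):
    # Sum tokens claimed by requests in the running batch.
    return sum(r.allocated - r.protected for r in batch_reqs)


# Slot 0 is mid-borrow (its pages are also visible through a batch req);
# slot 1 is idle and only held by its session.
slots = [Slot(0, True, 96, 32), Slot(1, True, 40, 8)]
batch = [Req(0, 96, 32)]

# Without the exclusion set, slot 0's 64 tokens are counted twice.
naive = session_held_tokens(slots) + total_uncached_sizes(batch)

# Recompute the active set from the batch on every check, then exclude it.
active_pool_idxs = {r.req_pool_idx for r in batch}
fixed = session_held_tokens(slots, active_pool_idxs) + total_uncached_sizes(batch)

print(naive, fixed)  # 160 96
```

In this toy setup the real usage is 96 tokens (64 for slot 0 plus 32 for slot 1). The naive sum reports 160, while the sum with the recomputed set reports 96. Because the set is derived from the batch at check time, there is no per-slot flag that a retract or abort path could leave stale.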