Skip to content

Fix streaming session busy-check double-counting via active_pool_idxs#22753

Merged
hnyls2002 merged 3 commits intomainfrom
lsyin/session-double-count-fix
Apr 14, 2026
Merged

Fix streaming session busy-check double-counting via active_pool_idxs#22753
hnyls2002 merged 3 commits intomainfrom
lsyin/session-double-count-fix

Conversation

@hnyls2002
Copy link
Copy Markdown
Collaborator

@hnyls2002 hnyls2002 commented Apr 14, 2026

Background

This is part of the streaming session memory accounting fix series (#22651). All streaming session tests pass on H200 with strict busy check (SGLANG_ENABLE_STRICT_MEM_CHECK_DURING_BUSY=2), including retract, abort, speculative (eagle v1/v2), and SWA variants. Zero leak assertions across all runs.

Problem

During a streaming session's borrow period (restore_to_req -> forward -> save_from_req), both session_held_tokens() and _get_total_uncached_sizes() claim the same KV pages:

  • session_held_tokens() sees slot.is_holding_kv == True and counts allocated - protected
  • _get_total_uncached_sizes() iterates batch reqs and counts allocated - protected again

This makes total_accounted > total, triggering a false-positive assertion under SGLANG_ENABLE_STRICT_MEM_CHECK_DURING_BUSY=2.

PR #22213 introduced SessionSlot.is_active as a boolean flag (restore_to_req sets True, save_from_req sets False) to filter out active slots. However, retract / abort / speculative-v2-overlap paths break the lifecycle pairing -- the flag can get stuck in the wrong state.

Fix

Replace the cached flag with a per-check recomputation from the scheduler's batch state:

def _active_pool_idxs(self) -> set:
    """Pool idxs currently owned by batch reqs."""
    idxs = set()
    for batch in [self.last_batch, self.running_batch]:
        if batch is None or batch.is_empty():
            continue
        for req in batch.reqs:
            if req.req_pool_idx is not None:
                idxs.add(req.req_pool_idx)
    return idxs

session_held_tokens(active_pool_idxs) excludes slots whose req_pool_idx is in this set. Single source of truth -- no lifecycle pairing required. New scheduler paths (retract, abort, spec) don't need to know about this check.

After this change is_active has zero readers, so the field and its two write sites are deleted.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@hnyls2002
Copy link
Copy Markdown
Collaborator Author

/rerun-test test_streaming_session.py

@github-actions
Copy link
Copy Markdown
Contributor

1-gpu-h100 (1 test): View workflow run

cd test/ && python3 registered/sessions/test_streaming_session.py

@hnyls2002 hnyls2002 merged commit 0cb7295 into main Apr 14, 2026
57 of 65 checks passed
@hnyls2002 hnyls2002 deleted the lsyin/session-double-count-fix branch April 14, 2026 20:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant