streaming session: spec v2 bonus accounting + comprehensive test matrix #22651

Merged
hnyls2002 merged 94 commits into main from lsyin/enable-streaming-retract-tests
Apr 16, 2026

@hnyls2002 hnyls2002 commented Apr 13, 2026

Summary

Summary / capstone PR for the streaming-session correctness work. Two roles:

  1. Index: the PR chain table below catalogues every streaming-session fix landed over the past week, with one-line descriptions and pointers to the PRs that carry the full analyses (#22862, "Streaming session: fix retract tail leak via _free_tail", for the tail free; #22897, "streaming session: trim spec v2 overshoot in cache_finished_req", for the overshoot trim plus a complete appendix on spec v2 invariants).
  2. Lands the last code fix and the test matrix: adds spec v2 bonus-slot accounting (the only correctness piece not already in main), removes the spec v2 + streaming-session incompatibility gate, and adds end-to-end test coverage across all meaningful configuration combinations.

Read this PR if you want the overview; follow the links to #22862 / #22897 for the root analyses.

Dependent PR chain

| Area | PRs | One-line |
|---|---|---|
| Initial implementation | #21875 | First-cut streaming session by @ishandhanani, plus the original test suite |
| Test infra | #22668 | TestStreamingSessionAbortLeakRepro inherits stdout/stderr instead of tempfile (so server logs surface in CI failure output) |
| Fix double-counting | #22213, #22753 | KV / mem accounting double-counted slot-held tokens |
| Remove dead code | #22735 | Drop unused branches from the initial implementation |
| Renaming | #22755 | Rename to consistent kv_committed_len / kv_allocated_len / slot.kv_* vocabulary |
| Fix abort handling | #22790 | First-request and mid-decode abort paths corrupted the slot |
| Fix tail free on retract | #22862 | _free_tail partial-page free corrupted committed pages on retract retry. #22862's body has the full analysis of why match_prefix is the right location for the free, why page alignment is required, and how both spec over-allocation and retract alloc-commit gap funnel through the same boundary. |
| Fix overshoot trim + SWA cap | #22897, #22900 | spec v2 may overshoot max_new_tokens and SWA cursor leaks; trim the slot at finish. #22897's body includes a full appendix with the root analysis (definitions, case analysis, how the fixes compose) that also covers the bonus-accounting fix below. |
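
As a rough illustration of why a partial-page free can touch committed pages, here is a minimal sketch. It assumes that "page alignment" means rounding the committed length up to the next page boundary before freeing anything. The helper name `freeable_tail` and its parameters are hypothetical and are not taken from the SGLang code base; #22862 remains the authoritative analysis.

```python
def freeable_tail(committed_len: int, allocated_len: int, page_size: int) -> range:
    """Token positions that can be freed without touching a committed page.

    Hypothetical helper (not SGLang's _free_tail): the committed length is
    rounded up to a page boundary so that a page still holding committed
    tokens is never released.
    """
    if page_size < 1:
        raise ValueError("page_size must be >= 1")
    # Ceil committed_len to the next multiple of page_size.
    first_free = -(-committed_len // page_size) * page_size
    # If the allocation ends inside the last committed page, the range is empty.
    return range(min(first_free, allocated_len), allocated_len)


# 300 committed tokens occupy pages [0, 256) and [256, 512); only 512.. is free.
print(freeable_tail(300, 600, 256))
# With page_size=1 every token is its own page, so the tail starts at 300.
print(freeable_tail(300, 305, 1))
```

Under this assumption, freeing from token 300 with a page size of 256 would release part of a page that still holds committed tokens, which is the kind of corruption the table row describes.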

Bonus accounting fix (this PR)

Spec v2 (overlap + EAGLE) bumps kv_committed_len post-facto in _resolve_spec_overlap_token_ids:

committed += accept_lens   # accept_lens = num_drafts + 1 bonus

Normal decode does the +1 upfront in prepare_for_decode (it claims the pending-bonus slot before the forward runs). Because spec v2 does it post-facto, there is a one-token race: on the finishing round, cache_finished_req fires before the next iteration's resolve catches up, so save_from_req captures committed = origin + finished_len - 1. The next turn inherits one token short, and the EagleV2 inheritance test fails by 1.

Fix mirrors the normal-decode pattern:

  • EagleDraftInput.prepare_for_decode: r.kv_committed_len += 1 (pre-claim bonus slot)
  • _resolve_spec_overlap_token_ids: req.kv_committed_len += accept_lens[i] - 1 (subtract the upfront claim)

Net per round is unchanged (+= accept_lens); the timing moves from post-facto to upfront, so cache_finished_req always sees the post-extra-round committed value.
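
The timing argument can be checked with a toy model. The sketch below is not the scheduler: it assumes each round is a "prepare" step followed by a "resolve" step, and that the snapshot (standing in for cache_finished_req) is taken after the prepare of one round but before its resolve. The function `simulate` and its arguments are hypothetical.

```python
def simulate(accept_lens: list[int], upfront: bool, snapshot_round: int) -> tuple[int, int]:
    """Return (committed at snapshot, committed after all rounds).

    accept_lens[i] is the accepted draft count plus one bonus token for round i.
    upfront=False models the old post-facto bump; upfront=True models the
    pre-claim in prepare plus the `accept_lens - 1` bump in resolve.
    """
    committed = 0
    snapshot = -1
    for i, accepted in enumerate(accept_lens):
        # "prepare" step: the new scheme claims the bonus slot here.
        if upfront:
            committed += 1
        # Snapshot lands between prepare and resolve of this round.
        if i == snapshot_round:
            snapshot = committed
        # "resolve" step: the net change per round is accept_lens[i] either way.
        committed += accepted - 1 if upfront else accepted
    return snapshot, committed


print(simulate([3, 1, 2], upfront=False, snapshot_round=2))  # (4, 6)
print(simulate([3, 1, 2], upfront=True, snapshot_round=2))   # (5, 6)
```

Both schemes end at the same total, but a snapshot taken mid-round is one token lower under the post-facto scheme, which matches the one-token shortfall described above.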

Also removes the spec v2 + streaming-session incompatibility gate in tokenizer_communicator_mixin.py. The accounting bugs that motivated the gate are fixed by the chain above (#22790, #22862, #22897, #22900, and this PR); spec v2 + streaming + retract now passes under strict mem check.

Test coverage

What this PR adds

12 test classes across two files, pruned by strict-superset coverage: each retract variant already exercises the non-retract path, and page=256 tests subsume page=1 (page-aligned free / allocator paths only activate at page>1).

| Group | Classes | Config |
|---|---|---|
| Llama small | TestStreamingSession, RetractMixedChunk, RetractLargePage, AbortLeakRepro | --chunked-prefill-size 512 (or tight for abort stress) |
| Eagle V1 (overlap disabled) | Eagle, EagleRetractLargePage | EAGLE3 + --disable-overlap-schedule |
| Eagle V2 (overlap on) | EagleV2, EagleV2RetractLargePage | EAGLE3 + SGLANG_ENABLE_SPEC_V2=1 |
| SWA (gpt-oss-20b) | SWA, SWARetractLargePage, SWARetractMixedChunk, SWAAbortLeakRepro | sliding-window attention |

All use SGLANG_ENABLE_STRICT_MEM_CHECK_DURING_BUSY=2 so any KV accounting drift fails the test.
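
The strict-superset pruning mentioned above can be sketched as a set comparison. The feature labels and configuration names below are hypothetical and chosen for illustration only; they are not the names of the real test classes, and the actual selection in this PR was made by hand (for example, the base TestStreamingSession class is still listed in the table).

```python
def prune_subsumed(configs: dict[str, frozenset[str]]) -> list[str]:
    """Keep only configurations whose feature set is not a strict subset of another's."""
    kept = []
    for name, features in configs.items():
        # `a < b` on frozensets is the strict-subset test.
        if not any(features < other for other in configs.values()):
            kept.append(name)
    return kept


# Hypothetical feature sets for four candidate configurations.
candidates = {
    "base": frozenset({"streaming"}),
    "retract": frozenset({"streaming", "retract"}),
    "retract_large_page": frozenset({"streaming", "retract", "page>1"}),
    "retract_mixed_chunk": frozenset({"streaming", "retract", "mixed_chunk"}),
}

print(prune_subsumed(candidates))  # ['retract_large_page', 'retract_mixed_chunk']
```

Here "base" and "retract" are dropped because another candidate exercises everything they do plus more, while the two remaining candidates each cover a feature the other lacks.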

Per-class shared methods

Every subclass of TestStreamingSession inherits and runs:

  • test_kv_cache_inheritance — verify cached_tokens carries forward correctly across turns (skipped on page > 1 classes; constant offset doesn't fit when cached_tokens rounds to page boundaries)
  • test_leak_logprob_concurrent — concurrent multi-session with logprob, watch for KV leak
  • test_stress_concurrent_sessions — high-concurrency streaming + non-streaming
  • test_preabort_recovery — pre-abort (rejected by create_req) doesn't corrupt session
  • test_first_mid_abort_recovery — abort the very first request mid-decode, recovery starts fresh
  • test_nth_mid_abort_recovery — abort an nth request mid-decode, recovery rolls back to last successful

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request enables the TestStreamingSessionRetract and TestStreamingSessionRetractMixedChunk test classes by removing the unittest skip decorators that were previously applied due to a token leak. I have no feedback to provide.

@sgl-project sgl-project deleted 8 comments from github-actions Bot Apr 13, 2026

hnyls2002 commented Apr 13, 2026

/rerun-test registered/sessions/test_streaming_session.py registered/sessions/test_session_control.py

(3 tries)

@github-actions

1-gpu-h100 (2 tests): View workflow run

cd test/ && python3 registered/sessions/test_streaming_session.py
cd test/ && python3 registered/sessions/test_session_control.py

hnyls2002 commented Apr 15, 2026

/rerun-test test_session_control.py test_session_latency.py test_streaming_session.py test_streaming_session_swa.py

(x2)

@github-actions

1-gpu-h100 (4 tests): View workflow run

cd test/ && python3 registered/sessions/test_session_control.py
cd test/ && python3 registered/sessions/test_session_latency.py
cd test/ && python3 registered/sessions/test_streaming_session.py
cd test/ && python3 registered/sessions/test_streaming_session_swa.py

@hnyls2002 hnyls2002 changed the title from "enable streaming session retract tests" to "streaming session: spec v2 bonus accounting + comprehensive test matrix" Apr 15, 2026
@hnyls2002 hnyls2002 merged commit a4cf2ea into main Apr 16, 2026
85 of 110 checks passed
@hnyls2002 hnyls2002 deleted the lsyin/enable-streaming-retract-tests branch April 16, 2026 00:12
jmamou pushed a commit to jmamou/sglang that referenced this pull request Apr 20, 2026
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026
zhangying098 pushed a commit to zhangying098/sglang that referenced this pull request Apr 23, 2026
kyx1999 pushed a commit to KMSorSMS/sglang that referenced this pull request Apr 27, 2026