Fix token leak with logprob_start_len=0 in streaming sessions #20557

Merged
hnyls2002 merged 3 commits into sgl-project:main from YazhiGao:fix-session-logprob-leak
Mar 19, 2026

@YazhiGao
Contributor

Motivation

When logprob_start_len=0 is used with streaming sessions, init_next_round_input truncates the prefix match key to length 0. For streaming sessions, SessionAwareCache.match_prefix then computes prefix_len = min(kv_committed_len, 0) = 0, bypassing the session slot's committed KV and orphaning allocated tokens. This causes a token memory leak that crashes the server.
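The arithmetic behind the leak can be illustrated with a minimal sketch. This is not the actual `SessionAwareCache` code: the function name `buggy_prefix_len` and the token counts are hypothetical, and only the `min(kv_committed_len, len(key))` relationship comes from the description above.

```python
def buggy_prefix_len(kv_committed_len: int, key_len: int) -> int:
    """Hypothetical stand-in for the pre-fix prefix computation.

    The matched prefix is capped by the length of the lookup key,
    so a key truncated to length 0 always yields a prefix of 0.
    """
    return min(kv_committed_len, key_len)


# Assumed numbers for illustration: the session slot already holds
# 128 committed KV tokens from earlier turns.
kv_committed_len = 128

# logprob_start_len=0 truncates the prefix match key to length 0.
truncated_key_len = 0

prefix_len = buggy_prefix_len(kv_committed_len, truncated_key_len)

# Committed tokens that the prefix match no longer accounts for.
orphaned_tokens = kv_committed_len - prefix_len

print(prefix_len, orphaned_tokens)
```

Under these assumptions every committed token in the slot ends up outside the matched prefix, which is the orphaning the description refers to.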

Modifications

In SessionAwareCache.match_prefix, use fill_ids length (the full token history for the session) instead of the truncated key to determine the prefix match length. Also clamp logprob_start_len up to the prefix length so the scheduler doesn't expect logprobs for tokens already in the session's committed KV.
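A rough sketch of the two changes, under stated assumptions: `SketchReq` and `fixed_prefix_len` are hypothetical names, not the real SGLang classes or signatures, and the sketch leaves a negative `logprob_start_len` (the `-1` default) untouched because the description does not say how the sentinel is handled at this point.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SketchReq:
    """Hypothetical request object; only the two fields named in the PR."""
    fill_ids: List[int]        # full token history for the session
    logprob_start_len: int     # first position that needs a logprob


def fixed_prefix_len(req: SketchReq, kv_committed_len: int) -> int:
    """Sketch of the fix: size the prefix from fill_ids, not the truncated key."""
    prefix_len = min(kv_committed_len, len(req.fill_ids))

    # Clamp logprob_start_len up to the prefix length so no logprobs are
    # expected for tokens already in the session's committed KV.
    # Assumption: negative sentinel values are left as they are.
    if req.logprob_start_len >= 0:
        req.logprob_start_len = max(req.logprob_start_len, prefix_len)

    return prefix_len


# Assumed numbers: 128 committed tokens, 140 tokens of total history.
req = SketchReq(fill_ids=list(range(140)), logprob_start_len=0)
prefix_len = fixed_prefix_len(req, kv_committed_len=128)
print(prefix_len, req.logprob_start_len)
```

With the key length out of the picture, the committed KV is matched in full and the logprob window starts where the new tokens begin.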

Also adds test_session_logprob_leak.py covering sessions with:

  • No logprobs (health check only)
  • Output logprobs only (return_logprob=True, default logprob_start_len=-1)
  • Input logprobs (logprob_start_len=0)

Each test includes bitwise-exact logprob comparison against a non-session baseline.
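One way a bitwise-exact comparison could be written is sketched below. This is an illustration rather than the helper used in `test_session_logprob_leak.py`: the function names are hypothetical and the logprobs are assumed to arrive as Python floats.

```python
import struct
from typing import Sequence


def float_bits(x: float) -> bytes:
    """Return the IEEE 754 double-precision bytes of x."""
    return struct.pack("<d", x)


def bitwise_equal(a: Sequence[float], b: Sequence[float]) -> bool:
    """True only if both sequences have equal length and identical bits."""
    return len(a) == len(b) and all(
        float_bits(x) == float_bits(y) for x, y in zip(a, b)
    )


# Hypothetical logprob lists from a session run and a non-session baseline.
session_logprobs = [-0.5, -1.25, -3.0]
baseline_logprobs = [-0.5, -1.25, -3.0]
print(bitwise_equal(session_logprobs, baseline_logprobs))
```

Comparing raw bytes is stricter than `==` with a tolerance: it rejects values that differ only in the last bit, and it also distinguishes `0.0` from `-0.0`.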

Checklist

  • Format code
  • Add unit test

When logprob_start_len=0, init_next_round_input truncates the prefix
match key to length 0. For streaming sessions this bypasses the slot's
committed KV, orphaning allocated tokens.

Fix: in SessionAwareCache.match_prefix, use fill_ids length instead of
the truncated key to determine the prefix, and clamp logprob_start_len
to the prefix length so the scheduler doesn't expect logprobs for
tokens already in the session's committed KV.

Also adds test_session_logprob_leak.py covering sessions with various
logprob configurations and bitwise-exact logprob comparison against
non-session baseline.
@hnyls2002 hnyls2002 merged commit 63c38ab into sgl-project:main Mar 19, 2026
63 of 72 checks passed
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026