Fix token leak with logprob_start_len=0 in streaming sessions by YazhiGao · Pull Request #20557 · sgl-project/sglang

YazhiGao · 2026-03-13T23:50:16Z

Motivation

When logprob_start_len=0 is used with streaming sessions, init_next_round_input truncates the prefix match key to length 0. For streaming sessions, SessionAwareCache.match_prefix then computes prefix_len = min(kv_committed_len, 0) = 0, bypassing the session slot's committed KV and orphaning allocated tokens. This causes a token memory leak that crashes the server.
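The arithmetic behind the leak can be reproduced with a toy model. This is only a sketch of the `min(kv_committed_len, 0)` step described above, not sglang's code: `ToySessionSlot`, `toy_prefix_len_from_key`, and the value 128 are hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToySessionSlot:
    # Hypothetical stand-in for a session slot; only the field named in the
    # description above is modeled.
    kv_committed_len: int


def toy_prefix_len_from_key(slot: ToySessionSlot, key_len: int) -> int:
    # Pre-fix behavior as described: the prefix is bounded by the key length.
    return min(slot.kv_committed_len, key_len)


slot = ToySessionSlot(kv_committed_len=128)

# With logprob_start_len=0 the match key is truncated to length 0.
truncated_key_len = 0
prefix_len = toy_prefix_len_from_key(slot, truncated_key_len)

# Committed tokens that the computed prefix no longer covers.
uncovered = slot.kv_committed_len - prefix_len
print(prefix_len, uncovered)
```

In this toy model all 128 committed tokens fall outside the computed prefix, which corresponds to the orphaned tokens described above.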

Modifications

In SessionAwareCache.match_prefix, use fill_ids length (the full token history for the session) instead of the truncated key to determine the prefix match length. Also clamp logprob_start_len up to the prefix length so the scheduler doesn't expect logprobs for tokens already in the session's committed KV.
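A minimal sketch of the two changes, assuming `logprob_start_len` has already been resolved to a non-negative index. `toy_match_prefix_fixed` is a hypothetical helper written for illustration; the actual `SessionAwareCache.match_prefix` signature and return value are not shown in this PR description.

```python
from typing import List, Tuple


def toy_match_prefix_fixed(
    kv_committed_len: int,
    fill_ids: List[int],
    logprob_start_len: int,
) -> Tuple[int, int]:
    # Bound the prefix by the full token history, not by the truncated key.
    prefix_len = min(kv_committed_len, len(fill_ids))
    # Clamp logprob_start_len up to the prefix length, so no logprobs are
    # expected for tokens already covered by the committed KV.
    clamped_start = max(logprob_start_len, prefix_len)
    return prefix_len, clamped_start


# 128 committed tokens, 200 tokens of history, input logprobs requested from 0.
print(toy_match_prefix_fixed(128, list(range(200)), 0))
```

With these inputs the prefix stays at 128 instead of collapsing to 0, and the logprob start moves up to 128.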

Also adds test_session_logprob_leak.py covering sessions with:

No logprobs (health check only)
Output logprobs only (return_logprob=True, default logprob_start_len=-1)
Input logprobs (logprob_start_len=0)

Each test includes bitwise-exact logprob comparison against a non-session baseline.
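One way to make a logprob comparison bitwise-exact rather than tolerance-based is to compare the IEEE 754 byte encodings of the values. The helpers below are a hypothetical sketch of that idea and are not taken from test_session_logprob_leak.py.

```python
import struct
from typing import Sequence


def float_bits(x: float) -> bytes:
    # 8-byte little-endian IEEE 754 encoding of a Python float.
    return struct.pack("<d", x)


def bitwise_equal(a: Sequence[float], b: Sequence[float]) -> bool:
    # True only if both sequences have the same length and every pair of
    # values has an identical bit pattern.
    return len(a) == len(b) and all(
        float_bits(x) == float_bits(y) for x, y in zip(a, b)
    )


session_logprobs = [-0.5, -1.25, -3.0]
baseline_logprobs = [-0.5, -1.25, -3.0]
print(bitwise_equal(session_logprobs, baseline_logprobs))
```

Unlike `==`, this check distinguishes `0.0` from `-0.0`, and it never accepts values that merely agree within a tolerance.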

Checklist

Format code
Add unit test

When logprob_start_len=0, init_next_round_input truncates the prefix match key to length 0. For streaming sessions this bypasses the slot's committed KV, orphaning allocated tokens. Fix: in SessionAwareCache.match_prefix, use fill_ids length instead of the truncated key to determine the prefix, and clamp logprob_start_len to the prefix length so the scheduler doesn't expect logprobs for tokens already in the session's committed KV. Also adds test_session_logprob_leak.py covering sessions with various logprob configurations and bitwise-exact logprob comparison against non-session baseline.



YazhiGao requested review from Ying1123, hanming-lu, hnyls2002, hzh0425, ispobock, merrymercy, xiezhq-hermann and yizhang2077 as code owners March 13, 2026 23:50

hnyls2002 added 2 commits March 19, 2026 14:22

fix lint

753eec8

do not compare

c22aa14

hnyls2002 merged commit 63c38ab into sgl-project:main Mar 19, 2026
63 of 72 checks passed

Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026

Fix token leak with logprob_start_len=0 in streaming sessions (sgl-pr…

a199204

…oject#20557)

0-693 pushed a commit to 0-693/sglang that referenced this pull request Mar 25, 2026

Fix token leak with logprob_start_len=0 in streaming sessions (sgl-pr…

336c63d

…oject#20557)

dutsc pushed a commit to dutsc/sglang that referenced this pull request Mar 30, 2026

Fix token leak with logprob_start_len=0 in streaming sessions (sgl-pr…

24b16f3

…oject#20557)

JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026

Fix token leak with logprob_start_len=0 in streaming sessions (sgl-pr…

7378ec6

…oject#20557)

yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026

Fix token leak with logprob_start_len=0 in streaming sessions (sgl-pr…

6af5616

…oject#20557)

hnyls2002 merged 3 commits into sgl-project:main from YazhiGao:fix-session-logprob-leak
