Fix streaming logprobs corruption caused by shared mutable list reference #21030
merrymercy merged 2 commits into main from
Conversation
…ence

When multiple streaming chunks queue up before the consumer drains them (streaming backlog), all chunks' meta_info["output_token_logprobs"] point to the same list object in tokenizer_manager.py. Later chunks extend the list, causing earlier chunks to see logprobs that belong to later chunks. This makes the first chunk "steal" all logprobs and leaves subsequent chunks with empty logprobs, triggering IndexError in the test.

Root fix: record output_token_logprobs_length as an immutable int snapshot in meta_info at chunk creation time. Downstream consumers use this length to slice the shared list correctly, so each chunk sees only its own logprobs regardless of later mutations.

This reverts the workaround from PR #17687, which only handled the finish_reason case but missed the mid-stream backlog scenario.

Made-with: Cursor
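The commit message above can be illustrated with a minimal standalone sketch (not the actual sglang code; chunk values and helper names are made up) showing why a shared mutable list corrupts queued chunks, and how an int length snapshot fixes the consumer's slice:

```python
# Sketch of the aliasing bug: all chunks hold a reference to the SAME list.
shared_logprobs = []  # mutated in place as new tokens arrive

def make_chunk():
    """Build a chunk's meta_info; the list reference is shared, not copied."""
    return {
        "output_token_logprobs": shared_logprobs,
        # Immutable int recorded at chunk-creation time (the fix).
        "output_token_logprobs_length": len(shared_logprobs),
    }

# Chunk 1 is created after one token, chunk 2 after two more, but
# neither is consumed until later (streaming backlog).
shared_logprobs.append(-0.1)
chunk1 = make_chunk()
shared_logprobs.extend([-0.2, -0.3])
chunk2 = make_chunk()

# Naive consumer slicing with [n_prev:]: chunk 1 "steals" everything,
# because the shared list has already been extended by chunk 2's tokens.
assert chunk1["output_token_logprobs"][0:] == [-0.1, -0.2, -0.3]

# Fixed consumer slicing with [n_prev:length] using the snapshot:
n_prev = 0
end = chunk1["output_token_logprobs_length"]
assert chunk1["output_token_logprobs"][n_prev:end] == [-0.1]
n_prev = end
end = chunk2["output_token_logprobs_length"]
assert chunk2["output_token_logprobs"][n_prev:end] == [-0.2, -0.3]
```

Because only an int is stored per chunk, the fix avoids copying the (potentially long) logprobs list on every streamed token.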
Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!
Add the same n_prev_token < total_output_logprobs guard that the chat streaming path already has, so that an empty output logprobs slice does not produce a LogProbs object with tokens=[] (which would crash on tokens[0]). The guard also allows through when input_token_logprobs is present (echo first-chunk case). Made-with: Cursor
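The guard described in this commit can be sketched roughly as follows. This is a hedged illustration, not the exact sglang code: LogProbs, the entry format (logprob, token) pairs, and the function name are stand-ins.

```python
from dataclasses import dataclass, field

@dataclass
class LogProbs:
    # Stand-in for the response logprobs object; real fields differ.
    tokens: list = field(default_factory=list)
    token_logprobs: list = field(default_factory=list)

def build_stream_logprobs(meta_info, n_prev_token):
    """Return (LogProbs or None, new n_prev_token) for one streamed chunk."""
    total_output_logprobs = meta_info.get("output_token_logprobs_length", 0)
    input_logprobs = meta_info.get("input_token_logprobs")
    # Guard: only build a LogProbs object when there is something to emit --
    # either new output logprobs (n_prev_token < total_output_logprobs) or
    # echoed prompt logprobs on the first chunk (input_logprobs present).
    # Otherwise downstream code indexing tokens[0] would crash on tokens=[].
    if input_logprobs is None and not (n_prev_token < total_output_logprobs):
        return None, n_prev_token
    out_slice = meta_info["output_token_logprobs"][
        n_prev_token:total_output_logprobs
    ]
    lp = LogProbs(
        tokens=[tok for _, tok in out_slice],
        token_logprobs=[p for p, _ in out_slice],
    )
    return lp, total_output_logprobs

# Empty mid-stream slice: no LogProbs object, no tokens[0] crash.
meta = {"output_token_logprobs": [], "output_token_logprobs_length": 0}
assert build_stream_logprobs(meta, 0) == (None, 0)
```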
/tag-and-rerun-ci
meta_info["input_token_logprobs"] = state.input_token_logprobs
meta_info["output_token_logprobs"] = state.output_token_logprobs
meta_info["output_token_logprobs_length"] = len(state.output_token_logprobs)
Not mutating the state, and instead adding an immutable fingerprint of it, generally looks clearer and cleaner than leaving a hidden mutation.
But if immutability risks too much allocation overhead for the logprobs, then this is okay.
Copying the whole list every step would be very slow, so I decided to use a length.
output_token_logprobs=output_logprobs_slice,
output_token_logprobs=content["meta_info"][
    "output_token_logprobs"
][n_prev_token:total_output_logprobs],
You could put a tuple with the bounds in meta_info directly, and then you wouldn't need to keep the additional n_prev_token state here.
This is a reasonable idea.
However, for this PR, I would like to keep the change minimal.
I need more time to think through the tuple-bounds idea together with the incremental-streaming case. I will follow up in another PR.
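The reviewer's tuple-bounds alternative, not implemented in this PR, might look roughly like this minimal sketch (the key name output_token_logprobs_bounds and the helpers are hypothetical):

```python
# Hypothetical: store (start, end) bounds per chunk in meta_info, so the
# consumer needs no separate n_prev_token counter across chunks.
shared = []  # the shared logprobs list, mutated as tokens arrive

def make_chunk(prev_end):
    """Record this chunk's half-open slice bounds into the shared list."""
    end = len(shared)
    return {
        "output_token_logprobs": shared,
        "output_token_logprobs_bounds": (prev_end, end),
    }

def own_slice(chunk):
    """Consumer reads only its own slice, with no external state."""
    start, end = chunk["output_token_logprobs_bounds"]
    return chunk["output_token_logprobs"][start:end]

shared.append(-0.1)
c1 = make_chunk(0)
shared.extend([-0.2, -0.3])
c2 = make_chunk(c1["output_token_logprobs_bounds"][1])

assert own_slice(c1) == [-0.1]
assert own_slice(c2) == [-0.2, -0.3]
```

The trade-off versus the length snapshot is that the producer, rather than the consumer, must then track where the previous chunk ended.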
Summary
- Fixes test_completion_stream failures. Log: https://github.com/sgl-project/sglang/actions/runs/23338124262/job/67887146436?pr=20999
- When multiple streaming chunks queue up before the consumer drains them, all chunks' meta_info["output_token_logprobs"] point to the same mutable list in tokenizer_manager.py. Later chunks extend the list, so earlier chunks see logprobs belonging to later chunks. The first chunk "steals" all logprobs; subsequent chunks get empty tokens=[], causing IndexError.
- Root fix: record output_token_logprobs_length as an immutable int snapshot in meta_info at chunk creation time. Downstream consumers slice with [n_prev:length] instead of [n_prev:], so each chunk sees only its own logprobs regardless of later mutations.
- Reverts the workaround from PR #17687, which only handled the finish_reason end-of-stream case but missed the mid-stream backlog scenario. Restores the original strict test assertions.

Test plan
- test_completion_stream in test_openai_server.py should pass reliably on Blackwell (previously flaky)
- test_chat_completion_stream should continue to pass

Made with Cursor