
Fix streaming logprobs corruption caused by shared mutable list reference#21030

Merged
merrymercy merged 2 commits into main from fix/streaming-logprobs-shared-ref on Mar 21, 2026

Conversation

@merrymercy
Contributor

@merrymercy merrymercy commented Mar 20, 2026

Summary

======================================================================
ERROR: test_completion_stream (__main__.TestOpenAIServer)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/actions-runner/_work/sglang/sglang/python/sglang/srt/utils/common.py", line 2661, in retry
    return fn()
  File "/home/runner/actions-runner/_work/sglang/sglang/python/sglang/test/test_utils.py", line 2084, in <lambda>
    lambda: super(CustomTestCase, self)._callTestMethod(method),
  File "/usr/lib/python3.10/unittest/case.py", line 549, in _callTestMethod
    method()
  File "/home/runner/actions-runner/_work/sglang/sglang/test/registered/openai_server/basic/test_openai_server.py", line 334, in test_completion_stream
    self.run_completion_stream(
  File "/home/runner/actions-runner/_work/sglang/sglang/test/registered/openai_server/basic/test_openai_server.py", line 168, in run_completion_stream
    response.choices[0].logprobs.tokens[0], str
IndexError: list index out of range
  • When multiple streaming chunks queue up before the consumer drains them ("streaming backlog"), all chunks' meta_info["output_token_logprobs"] point to the same mutable list in tokenizer_manager.py. Later chunks extend the list, so earlier chunks see logprobs belonging to later chunks. The first chunk "steals" all logprobs; subsequent chunks get empty tokens=[], causing IndexError.
  • Records output_token_logprobs_length as an immutable int snapshot in meta_info at chunk creation time. Downstream consumers slice with [n_prev:length] instead of [n_prev:], so each chunk sees only its own logprobs regardless of later mutations.
  • Reverts the workaround from PR #17687 ("Fix flaky streaming logprobs test by handling detokenizer text buffering"), which only handled the finish_reason end-of-stream case but missed the mid-stream backlog scenario. Restores the original strict test assertions.
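The bug and the fix above can be sketched in a few lines. This is an illustrative reproduction only; the dict keys follow the PR description, but the surrounding loop is hypothetical and not taken from tokenizer_manager.py.

```python
# Buggy pattern: every chunk's meta_info holds a reference to the SAME list,
# so a backlog of undrained chunks all end up seeing the full list.
state_logprobs = []
chunks = []
for step in range(3):
    state_logprobs.append((-0.1 * step, step, f"tok{step}"))
    chunks.append({"output_token_logprobs": state_logprobs})  # shared reference!

# Consumer drains the backlog after all three steps ran: every chunk,
# including the first, already sees all three logprobs.
assert all(len(c["output_token_logprobs"]) == 3 for c in chunks)

# Fixed pattern: snapshot the length (an immutable int) at chunk creation
# time, and slice with [n_prev:length] instead of [n_prev:] when consuming.
state_logprobs = []
chunks = []
for step in range(3):
    state_logprobs.append((-0.1 * step, step, f"tok{step}"))
    chunks.append({
        "output_token_logprobs": state_logprobs,
        "output_token_logprobs_length": len(state_logprobs),  # int snapshot
    })

n_prev = 0
per_chunk = []
for c in chunks:
    length = c["output_token_logprobs_length"]
    per_chunk.append(c["output_token_logprobs"][n_prev:length])
    n_prev = length

# Each chunk now sees exactly its own logprobs, regardless of later mutations.
assert [len(s) for s in per_chunk] == [1, 1, 1]
```

The key point is that the int snapshot is immutable, so later `extend` calls on the shared list cannot retroactively change what an earlier chunk's slice covers.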

Test plan

  • test_completion_stream in test_openai_server.py should pass reliably on Blackwell (previously flaky)
  • test_chat_completion_stream should continue to pass
  • Non-streaming logprobs endpoints unaffected (they use the final cumulative list)

Made with Cursor

Fix streaming logprobs corruption caused by shared mutable list reference

When multiple streaming chunks queue up before the consumer drains them
(streaming backlog), all chunks' meta_info["output_token_logprobs"] point
to the same list object in tokenizer_manager.py. Later chunks extend the
list, causing earlier chunks to see logprobs that belong to later chunks.
This makes the first chunk "steal" all logprobs and leaves subsequent
chunks with empty logprobs, triggering IndexError in the test.

Root fix: record output_token_logprobs_length as an immutable int snapshot
in meta_info at chunk creation time. Downstream consumers use this length
to slice the shared list correctly, so each chunk sees only its own
logprobs regardless of later mutations.

This reverts the workaround from PR #17687 which only handled the
finish_reason case but missed the mid-stream backlog scenario.

Made-with: Cursor

Add the same n_prev_token < total_output_logprobs guard that the chat
streaming path already has, so that an empty output logprobs slice
does not produce a LogProbs object with tokens=[] (which would crash
on tokens[0]). The guard also allows through when input_token_logprobs
is present (echo first-chunk case).

Made-with: Cursor
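The guard described in this commit message can be sketched as follows. The names (`n_prev_token`, `total_output_logprobs`, `input_token_logprobs`) come from the commit text, but the helper function itself is hypothetical, not the actual sglang code.

```python
# Hedged sketch of the guard: only build a LogProbs object when this chunk
# actually owns new output logprobs, or carries input logprobs (the echo
# first-chunk case). This avoids constructing LogProbs with tokens=[],
# which would crash on tokens[0].
def should_emit_logprobs(n_prev_token, total_output_logprobs, input_token_logprobs):
    return n_prev_token < total_output_logprobs or input_token_logprobs is not None

assert should_emit_logprobs(0, 2, None) is True          # chunk owns new logprobs
assert should_emit_logprobs(2, 2, None) is False         # empty slice: skip
assert should_emit_logprobs(2, 2, [(-0.5, 1, "a")]) is True  # echo first chunk
```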
@merrymercy
Contributor Author

/tag-and-rerun-ci


meta_info["input_token_logprobs"] = state.input_token_logprobs
meta_info["output_token_logprobs"] = state.output_token_logprobs
meta_info["output_token_logprobs_length"] = len(state.output_token_logprobs)
Contributor

@vladnosiv vladnosiv Mar 20, 2026

Not mutating the state at all would generally look clearer and cleaner than leaving a hidden mutation plus an immutable fingerprint of the state.
But if full immutability would add too much allocation overhead for the logprobs, then this is okay.

Contributor Author


Copying the whole list every step would be very slow, so I decided to use a length snapshot instead.

output_token_logprobs=output_logprobs_slice,
output_token_logprobs=content["meta_info"][
"output_token_logprobs"
][n_prev_token:total_output_logprobs],
Contributor


You can put a tuple with the bounds in meta_info at once, and then you won't need to keep an additional state n_prev_tokens here
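The reviewer's suggestion could look like the sketch below: the producer stores an explicit `(start, end)` bounds tuple in `meta_info`, so the consumer needs no `n_prev_token` state at all. The key name `output_token_logprobs_bounds` is hypothetical, not part of the merged PR.

```python
# Sketch of the tuple-bounds alternative: each chunk records the half-open
# range of the shared list that belongs to it.
state_logprobs = []
chunks = []
for step in range(3):
    start = len(state_logprobs)
    state_logprobs.append((-0.1 * step, step, f"tok{step}"))
    chunks.append({
        "output_token_logprobs": state_logprobs,          # still shared
        "output_token_logprobs_bounds": (start, len(state_logprobs)),
    })

# Consumer: stateless per-chunk slicing, no n_prev_token bookkeeping.
slices = [
    c["output_token_logprobs"][slice(*c["output_token_logprobs_bounds"])]
    for c in chunks
]
assert [len(s) for s in slices] == [1, 1, 1]
```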

Contributor Author


This is a reasonable idea.
However, for this PR, I would like to keep the change minimal.
I need more time to think through the tuple-bounds idea together with the incremental-streaming case; I will follow up in another PR.

@merrymercy merrymercy merged commit dba6fb3 into main Mar 21, 2026
158 of 186 checks passed
@merrymercy merrymercy deleted the fix/streaming-logprobs-shared-ref branch March 21, 2026 07:18
0-693 pushed a commit to 0-693/sglang that referenced this pull request Mar 25, 2026
dutsc pushed a commit to dutsc/sglang that referenced this pull request Mar 30, 2026
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026