Fix streaming logprobs corruption caused by shared mutable list reference #21030
merrymercy merged 2 commits into main from
Conversation
…ence

When multiple streaming chunks queue up before the consumer drains them (streaming backlog), all chunks' meta_info["output_token_logprobs"] point to the same list object in tokenizer_manager.py. Later chunks extend the list, causing earlier chunks to see logprobs that belong to later chunks. This makes the first chunk "steal" all logprobs and leaves subsequent chunks with empty logprobs, triggering IndexError in the test.

Root fix: record output_token_logprobs_length as an immutable int snapshot in meta_info at chunk creation time. Downstream consumers use this length to slice the shared list correctly, so each chunk sees only its own logprobs regardless of later mutations.

This reverts the workaround from PR #17687, which only handled the finish_reason case but missed the mid-stream backlog scenario.

Made-with: Cursor
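The commit message above can be illustrated with a minimal standalone sketch (not the actual sglang code; chunk values and helper names are made up) showing why a shared mutable list corrupts queued chunks, and how an int length snapshot fixes the consumer's slice:

```python
# Sketch of the aliasing bug: all chunks hold a reference to the SAME list.
shared_logprobs = []  # mutated in place as new tokens arrive

def make_chunk():
    """Build a chunk's meta_info; the list reference is shared, not copied."""
    return {
        "output_token_logprobs": shared_logprobs,
        # Immutable int recorded at chunk-creation time (the fix).
        "output_token_logprobs_length": len(shared_logprobs),
    }

# Chunk 1 is created after one token, chunk 2 after two more, but
# neither is consumed until later (streaming backlog).
shared_logprobs.append(-0.1)
chunk1 = make_chunk()
shared_logprobs.extend([-0.2, -0.3])
chunk2 = make_chunk()

# Naive consumer slicing with [n_prev:]: chunk 1 "steals" everything,
# because the shared list has already been extended by chunk 2's tokens.
assert chunk1["output_token_logprobs"][0:] == [-0.1, -0.2, -0.3]

# Fixed consumer slicing with [n_prev:length] using the snapshot:
n_prev = 0
end = chunk1["output_token_logprobs_length"]
assert chunk1["output_token_logprobs"][n_prev:end] == [-0.1]
n_prev = end
end = chunk2["output_token_logprobs_length"]
assert chunk2["output_token_logprobs"][n_prev:end] == [-0.2, -0.3]
```

Because only an int is stored per chunk, the fix avoids copying the (potentially long) logprobs list on every streamed token.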
Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!
Add the same n_prev_token < total_output_logprobs guard that the chat streaming path already has, so that an empty output logprobs slice does not produce a LogProbs object with tokens=[] (which would crash on tokens[0]). The guard also allows through when input_token_logprobs is present (echo first-chunk case). Made-with: Cursor
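The guard described in this commit can be sketched roughly as follows. This is a hedged illustration, not the exact sglang code: LogProbs, the entry format (logprob, token) pairs, and the function name are stand-ins.

```python
from dataclasses import dataclass, field

@dataclass
class LogProbs:
    # Stand-in for the response logprobs object; real fields differ.
    tokens: list = field(default_factory=list)
    token_logprobs: list = field(default_factory=list)

def build_stream_logprobs(meta_info, n_prev_token):
    """Return (LogProbs or None, new n_prev_token) for one streamed chunk."""
    total_output_logprobs = meta_info.get("output_token_logprobs_length", 0)
    input_logprobs = meta_info.get("input_token_logprobs")
    # Guard: only build a LogProbs object when there is something to emit --
    # either new output logprobs (n_prev_token < total_output_logprobs) or
    # echoed prompt logprobs on the first chunk (input_logprobs present).
    # Otherwise downstream code indexing tokens[0] would crash on tokens=[].
    if input_logprobs is None and not (n_prev_token < total_output_logprobs):
        return None, n_prev_token
    out_slice = meta_info["output_token_logprobs"][
        n_prev_token:total_output_logprobs
    ]
    lp = LogProbs(
        tokens=[tok for _, tok in out_slice],
        token_logprobs=[p for p, _ in out_slice],
    )
    return lp, total_output_logprobs

# Empty mid-stream slice: no LogProbs object, no tokens[0] crash.
meta = {"output_token_logprobs": [], "output_token_logprobs_length": 0}
assert build_stream_logprobs(meta, 0) == (None, 0)
```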
/tag-and-rerun-ci
meta_info["input_token_logprobs"] = state.input_token_logprobs
meta_info["output_token_logprobs"] = state.output_token_logprobs
meta_info["output_token_logprobs_length"] = len(state.output_token_logprobs)
Not mutating the state, and instead adding an immutable fingerprint of it, generally looks clearer and cleaner than leaving a hidden mutation.
But if immutability risks too much allocation overhead for the logprobs, then this is okay.
Copying the whole list every step would be very slow, so I decided to use a length.
output_token_logprobs=output_logprobs_slice,
output_token_logprobs=content["meta_info"][
    "output_token_logprobs"
][n_prev_token:total_output_logprobs],
You could put a tuple with the bounds in meta_info directly, and then you wouldn't need to keep the additional n_prev_token state here.
This is a reasonable idea.
However, for this PR, I would like to keep the change minimal.
I need more time to think through the tuple-bounds idea together with the incremental-streaming case. I will follow up in another PR.
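The reviewer's tuple-bounds alternative, not implemented in this PR, might look roughly like this minimal sketch (the key name output_token_logprobs_bounds and the helpers are hypothetical):

```python
# Hypothetical: store (start, end) bounds per chunk in meta_info, so the
# consumer needs no separate n_prev_token counter across chunks.
shared = []  # the shared logprobs list, mutated as tokens arrive

def make_chunk(prev_end):
    """Record this chunk's half-open slice bounds into the shared list."""
    end = len(shared)
    return {
        "output_token_logprobs": shared,
        "output_token_logprobs_bounds": (prev_end, end),
    }

def own_slice(chunk):
    """Consumer reads only its own slice, with no external state."""
    start, end = chunk["output_token_logprobs_bounds"]
    return chunk["output_token_logprobs"][start:end]

shared.append(-0.1)
c1 = make_chunk(0)
shared.extend([-0.2, -0.3])
c2 = make_chunk(c1["output_token_logprobs_bounds"][1])

assert own_slice(c1) == [-0.1]
assert own_slice(c2) == [-0.2, -0.3]
```

The trade-off versus the length snapshot is that the producer, rather than the consumer, must then track where the previous chunk ended.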
Summary
- Fixes test_completion_stream failures. Log: https://github.com/sgl-project/sglang/actions/runs/23338124262/job/67887146436?pr=20999
- When multiple streaming chunks queue up before the consumer drains them, all chunks' meta_info["output_token_logprobs"] point to the same mutable list in tokenizer_manager.py. Later chunks extend the list, so earlier chunks see logprobs belonging to later chunks. The first chunk "steals" all logprobs; subsequent chunks get empty tokens=[], causing IndexError.
- Root fix: record output_token_logprobs_length as an immutable int snapshot in meta_info at chunk creation time. Downstream consumers slice with [n_prev:length] instead of [n_prev:], so each chunk sees only its own logprobs regardless of later mutations.
- Reverts the workaround from PR #17687, which only handled the finish_reason end-of-stream case but missed the mid-stream backlog scenario. Restores the original strict test assertions.

Test plan
- test_completion_stream in test_openai_server.py should pass reliably on Blackwell (previously flaky)
- test_chat_completion_stream should continue to pass

Made with Cursor