Fix flaky streaming logprobs test by handling detokenizer text buffering #17687

Kangyan-Zhou merged 4 commits into sgl-project:main from
Conversation
Summary of Changes

Hello @Kangyan-Zhou, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces targeted debug logging to diagnose a persistent and flaky issue where streaming responses from the OAI server sometimes lack associated log probabilities. By adding detailed warning messages at critical points in both the completion serving and scheduler output processing, the changes aim to provide clearer insight into when and why these logprob discrepancies occur, facilitating a quicker resolution of the underlying problem.
/rerun-stage stage-b-test-small-1-gpu
Code Review
This pull request adds debug logging to help investigate a flaky issue with streaming logprobs in the OpenAI server. The changes in serving_completions.py and scheduler_output_processor_mixin.py introduce warnings that trigger when logprobs are empty but the generated text is not, which should be very helpful for debugging.
I've made a couple of suggestions to refactor small code duplications introduced by the new logging logic. These are minor points aimed at improving code clarity. Overall, the changes look good and are well targeted for their debugging purpose.
```python
# Debug logging for flaky streaming logprobs issue
# See: https://github.com/sgl-project/sglang/actions/runs/21319310492/job/61366797740
delta_for_log = text[len(stream_buffer) :]
```
The calculation `text[len(stream_buffer):]` is also performed later at line 277 to define `delta`. To avoid this duplication, you could consider calculating this value once and reusing it. For example, you could calculate `delta` before the `if request.logprobs is not None:` block and use it in both places.
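As a rough illustration of the suggested refactor, a minimal sketch is below; the helper name, `want_logprobs` flag, and warning text are illustrative placeholders rather than the actual `serving_completions.py` code.

```python
import logging

logger = logging.getLogger(__name__)


def compute_stream_delta(
    text: str, stream_buffer: str, want_logprobs: bool, output_logprobs: list
) -> str:
    """Hypothetical helper: compute the new text exactly once and reuse it
    for both the debug warning and the streamed chunk."""
    delta = text[len(stream_buffer):]  # single computation of the delta
    if want_logprobs and delta and not output_logprobs:
        # Same condition the debug logging targets: the text advanced,
        # but no logprobs arrived for this chunk.
        logger.warning("Streaming chunk has text %r but empty logprobs", delta)
    return delta
```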
```python
# Slice off only the logprobs and decode ids that have not been sent yet,
# using the per-request offsets recorded for this stream.
logprob_slice = req.output_token_logprobs_val[
    send_output_token_logprobs_offset:
]
decode_ids_slice = decode_ids[req.send_decode_id_offset :]
```
/rerun-stage stage-b-test-small-1-gpu

/rerun-stage stage-b-test-small-1-gpu

✅ Triggered
The detokenizer holds back text at word boundaries during streaming to avoid showing incomplete words. On the final chunk, this buffered text is flushed. However, by then all logprobs have already been sent, causing the final chunk to have text but empty logprobs.

Fix: Return None for logprobs when finish_reason is set and all logprobs have been sent. This is correct since no new tokens were generated; the text is just buffered text being flushed.

Changes:
- serving_completions.py: Return None for logprobs on final flush chunk
- serving_chat.py: Apply same fix to chat completions streaming
- test_openai_server.py: Update test to handle logprobs=None on final chunk
- scheduler_output_processor_mixin.py: Remove debug logging

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
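In sketch form, the check this commit describes might look like the following; the function name, `output_logprobs_slice`, and the returned dict shape are illustrative assumptions, not the actual sglang code.

```python
from typing import Optional


def logprobs_for_final_chunk(
    output_logprobs_slice: list, finish_reason: Optional[str]
) -> Optional[dict]:
    """Sketch of the described fix: when the request is finishing and every
    logprob has already been streamed, the chunk only carries text that the
    detokenizer buffered earlier, so it reports logprobs=None."""
    if finish_reason is not None and not output_logprobs_slice:
        return None  # no new tokens were generated for this flush chunk
    # Otherwise, convert the new logprobs into the response format as usual
    # (assuming each entry's first element is the logprob value).
    return {"token_logprobs": [item[0] for item in output_logprobs_slice]}
```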
/rerun-stage stage-b-test-small-1-gpu

✅ Triggered
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
/rerun-stage stage-b-test-small-1-gpu

✅ Triggered

/rerun-stage stage-b-test-small-1-gpu

✅ Triggered
PR sgl-project#17687 fixed the case where empty logprobs were returned on the final chunk when finish_reason was set. However, the detokenizer can also flush buffered text mid-stream (when finish_reason is still None), causing the same issue.

Changes:
- serving_completions.py: Return logprobs=None when output_logprobs_slice is empty, regardless of finish_reason
- serving_chat.py: Remove the "or finish_reason is None" condition that was causing empty logprobs to be processed mid-stream
- utils.py: Remove debug logging added in previous commit
- test_openai_server.py: Remove debug logging and update comments to clarify that logprobs=None can happen both mid-stream and on final chunk

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
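The follow-up broadens that condition so that only the emptiness of the slice matters; a sketch with the same illustrative (non-actual) names as before:

```python
from typing import Optional


def logprobs_for_chunk(output_logprobs_slice: list) -> Optional[dict]:
    """Sketch of the broadened check: applies both mid-stream and on the
    final chunk, so it no longer looks at finish_reason at all."""
    if not output_logprobs_slice:
        # The detokenizer flushed buffered text without new tokens, so there
        # are no new logprobs to attach to this chunk.
        return None
    return {"token_logprobs": [item[0] for item in output_logprobs_slice]}
```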
…ing (sgl-project#17687) Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
…ing (sgl-project#17687) Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
…ence

When multiple streaming chunks queue up before the consumer drains them (streaming backlog), all chunks' meta_info["output_token_logprobs"] point to the same list object in tokenizer_manager.py. Later chunks extend the list, causing earlier chunks to see logprobs that belong to later chunks. This makes the first chunk "steal" all logprobs and leaves subsequent chunks with empty logprobs, triggering IndexError in the test.

Root fix: record output_token_logprobs_length as an immutable int snapshot in meta_info at chunk creation time. Downstream consumers use this length to slice the shared list correctly, so each chunk sees only its own logprobs regardless of later mutations.

This reverts the workaround from PR #17687, which only handled the finish_reason case but missed the mid-stream backlog scenario.

Made-with: Cursor
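The aliasing this commit describes can be reproduced with a small toy example; the `meta_info` keys mirror the commit message, while the chunk dicts and the consumer loop are purely illustrative.

```python
# Toy reproduction of the backlog aliasing: every chunk's meta_info points at
# the same list, so chunks queued before the consumer drains them all "see"
# logprobs appended later.
shared_logprobs = []
chunks = []

for logprob in [-0.1, -0.2, -0.3]:
    shared_logprobs.append(logprob)
    chunks.append({
        "output_token_logprobs": shared_logprobs,               # aliased list
        "output_token_logprobs_length": len(shared_logprobs),   # int snapshot
    })

# Without the snapshot, the first chunk now appears to own all three logprobs:
assert len(chunks[0]["output_token_logprobs"]) == 3

# With the snapshot, each consumer slices only its own portion of the list:
prev = 0
for chunk in chunks:
    end = chunk["output_token_logprobs_length"]
    own_logprobs = chunk["output_token_logprobs"][prev:end]
    assert len(own_logprobs) == 1  # each chunk sees exactly its own logprob
    prev = end
```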
Summary

Fixes the flaky `test_completion_stream` test in `test_openai_server.py`.

Root cause: The detokenizer holds back text at word boundaries during streaming (via `find_printable_text()`) to avoid showing incomplete words. On the final chunk, when `finish_reason` is set, this buffered text is flushed. However, by then all logprobs have already been sent in previous chunks, causing the final chunk to have text content but empty logprobs, which broke the test.

Fix: Return `None` for logprobs when `finish_reason` is set and all logprobs have been sent. This is semantically correct since no new tokens were generated; the text is just previously buffered text being flushed.

Changes

- `serving_completions.py`: Return `None` for logprobs on the final flush chunk
- `serving_chat.py`: Apply the same fix to the chat completions streaming endpoint
- `test_openai_server.py`: Update the test to handle `logprobs=None` on the final chunk (a sketch of this handling follows the list)
- `scheduler_output_processor_mixin.py`: Remove debug logging from the previous commit
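As a rough illustration of the test-side handling, here is a minimal sketch assuming the standard `openai` Python client pointed at a local server; the base URL, model name, prompt, and assertion are placeholders, not the actual code in `test_openai_server.py`.

```python
import openai

# Hypothetical local endpoint; adjust to wherever the sglang server is running.
client = openai.OpenAI(base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")

stream = client.completions.create(
    model="default",
    prompt="The capital of France is",
    max_tokens=16,
    logprobs=1,
    stream=True,
)

n_tokens_with_logprobs = 0
for chunk in stream:
    choice = chunk.choices[0]
    if choice.logprobs is None:
        # Detokenizer flush: the chunk may carry buffered text, but no new
        # tokens were generated, so there are no new logprobs to check.
        continue
    n_tokens_with_logprobs += len(choice.logprobs.tokens)

# At least some streamed tokens should have carried logprobs.
assert n_tokens_with_logprobs > 0
```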