
Add --stream-response-default-include-usage server flag#16711

Merged
hnyls2002 merged 32 commits into sgl-project:main from syd520zy:main
Apr 4, 2026

Conversation

@syd520zy
Contributor

@syd520zy syd520zy commented Jan 8, 2026

Motivation

When streaming is enabled, usage info is only returned if the client sets stream_options.include_usage = true. Server operators who need token-level monitoring metrics cannot rely on clients to set this. This PR adds a server-side flag to force usage inclusion in streaming responses.

Modifications

  • Add --stream-response-default-include-usage server arg that forces a final usage chunk in streaming responses even when stream_options is not specified by the client
  • Extract repeated stream_options checks into a shared should_include_usage() utility in utils.py
  • Remove dead enable_force_include_usage param from OpenAIServingResponses and unused stream_output field from ServerArgs
  • Fix test mock to include the new server arg attribute
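The extracted helper can be sketched as follows. This is a hypothetical reconstruction: the PR only names `should_include_usage()` in `utils.py`, so the actual signature and call sites in sglang may differ. The key property is that an explicit client choice always overrides the server-wide default.

```python
from types import SimpleNamespace

def should_include_usage(stream_options, default_include_usage: bool) -> bool:
    """Decide whether a streaming response should end with a usage chunk.

    An explicit client setting via stream_options.include_usage always wins;
    the server-wide default only applies when the client did not specify it.
    """
    if stream_options is not None and getattr(stream_options, "include_usage", None) is not None:
        return bool(stream_options.include_usage)
    return bool(default_include_usage)

# Client said nothing -> the server default applies.
print(should_include_usage(None, True))   # True
# Client explicitly opted out -> the client's choice wins over the default.
print(should_include_usage(SimpleNamespace(include_usage=False), True))  # False
```

This "default, not force" shape is what the review discussion below converges on: the server flag fills in behavior only for requests that leave `stream_options` unset.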

Checklist

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

alisonshao added a commit that referenced this pull request Jan 8, 2026
Update issue reference from #16711 to #16714 which is the correct
tracking issue for the Triton causal_conv1d_update bug with padded batches.
@syd520zy
Contributor Author

syd520zy commented Jan 8, 2026

@JustinTong0323 @ispobock @slin1237 @CatherineSue @merrymercy Please review my PR, thank you very much


Collaborator

@hnyls2002 hnyls2002 left a comment


Why is force?

@syd520zy
Contributor Author

syd520zy commented Mar 5, 2026

Please review my PR, thank you very much @CatherineSue @merrymercy @slin1237 @ispobock @JustinTong0323

@syd520zy syd520zy requested a review from hnyls2002 March 10, 2026 07:42
@syd520zy
Contributor Author

Why is force?

In the current framework, whether usage information is returned depends on whether the user explicitly passes this parameter. When the user does not, the server has no way to record the actual usage of that request. With this flag, the server can require all requests to output usage information, so that token metrics for the business can be monitored and analyzed.

@syd520zy
Contributor Author

@syd520zy That does mean "force"; what you actually want to do is just set a default value for this. Please rewrite the confusing "force" logic.

I cannot simply change the default value, because in the current implementation the client side controls whether usage information is returned: clients can set include_usage in stream_options to make responses carry usage information. The intent here is that this behavior is only used when the server needs every request to return usage information for management or auditing purposes. Therefore, to preserve compatibility, the cleanest approach is a server startup parameter (or an environment variable) that controls whether the server enables this behavior.
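With the flag merged, an operator would enable the server-side default at startup. A sketch of such an invocation, assuming the usual `sglang.launch_server` entry point; the flag name comes from this PR's title, while the model path is a placeholder:

```shell
# Enable the usage-chunk default for all streaming responses on this server.
# Clients that explicitly set stream_options.include_usage still override it.
python3 -m sglang.launch_server \
  --model-path /path/to/model \
  --stream-response-default-include-usage
```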

@hnyls2002
Collaborator

/tag-and-rerun-ci

@github-actions github-actions Bot added the run-ci label Apr 4, 2026
@hnyls2002
Collaborator

/rerun-test registered/openai_server/basic/test_serving_chat.py

@github-actions
Contributor

github-actions Bot commented Apr 4, 2026

1-gpu-5090: View workflow run

cd test/ && python3 registered/openai_server/basic/test_serving_chat.py

@hnyls2002
Collaborator

/rerun-test registered/openai_server/basic/test_serving_completions.py

@hnyls2002 hnyls2002 changed the title from "Add force-include-usage Support for stream" to "Add --stream-response-default-include-usage server flag" Apr 4, 2026
@github-actions
Contributor

github-actions Bot commented Apr 4, 2026

1-gpu-5090: View workflow run

cd test/ && python3 registered/openai_server/basic/test_serving_completions.py

@hnyls2002 hnyls2002 merged commit de98590 into sgl-project:main Apr 4, 2026
100 of 192 checks passed
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026
xiezhq-hermann pushed a commit to antgroup/sglang that referenced this pull request Apr 7, 2026
carlosfundora pushed a commit to carlosfundora/sglang-1-bit-turbo that referenced this pull request Apr 8, 2026
Add --stream-response-default-include-usage server flag (sgl-project#16711)

Upstream SHA: de98590
Cherry-picked from sgl-project/sglang

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026