[Feature] Add Reasoning Tokens Usage #15562
Conversation
Signed-off-by: Muqi Li <muqi1029@gmail.com>
Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

/tag-and-rerun-ci
Note: in the current code path, if the server is launched without a reasoning parser, `require_reasoning` is always False. Is this intuitive?
We may also need to update the docs.
Do you mean we should get rid of "We should have a flag to know whether reqs require reasoning and its…"?
I mean, in certain cases, the user might want to obtain the reasoning tokens without enabling the reasoning parser? |
Okay, I will cherry-pick your tests right away. Thanks for the reminder!
Signed-off-by: Muqi Li <muqi1029@gmail.com> Co-authored-by: Mufeez Amjad <mufeez.amjad@outlook.com>
Found another duplicate PR: #14404
JustinTong0323 left a comment
Verified, thanks for the contribution~
Very useful change. I ran into this problem myself and am eagerly waiting for the merge 😊. Thank you 👍

Hi @Muqi1029, @JustinTong0323, thanks for this great PR; it addresses exactly what we need. We're running GLM-5 (754B FP8) in production. I noticed this PR has merge conflicts with the current main branch. Since we need this feature urgently, I'd be happy to help resolve the conflicts and push this forward, either by collaborating on this PR or opening a new one based on your work (with full credit, of course). Would you be open to that? Or if you're planning to rebase soon, I'm happy to wait as well. Just want to make sure this doesn't stay stalled; there's clear community demand (4 duplicate PRs + multiple issues). Thanks again for the work here!
@MLKoz2 @anencore94 Thanks for the attention and kind words. I have now resolved the conflicts, and the corresponding CIs pass locally. @Fridge003 @hnyls2002 please take a look.
Signed-off-by: Muqi Li <muqi1029@gmail.com> Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com> Co-authored-by: Mufeez Amjad <mufeez.amjad@outlook.com> Co-authored-by: cklxx <1293822641@qq.com> Co-authored-by: hnyls2002 <lsyincs@gmail.com> Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
Motivation
SGLang currently returns token usage information, but the `reasoning_tokens` field is always `0`, which makes it unusable as a statistical metric. This is problematic since `reasoning_tokens` is an important signal for analysis and monitoring. You can see the following result using the latest (main branch) SGLang:
Server Launching Script

```bash
python -m sglang.launch_server \
  --model-path Qwen/Qwen3-8B \
  --reasoning-parser qwen3 \
  --tool-call-parser qwen \
  --port 8888 \
  --log-requests \
  --log-requests-level 3
```

Request

```bash
curl -X POST http://127.0.0.1:8888/v1/chat/completions \
  -H "Authorization: Bearer None" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Who are you?"}
    ]
  }' | jq
```

After this PR:
{ "id": "40d09b4d150345b88a75945b1b7bb059", "object": "chat.completion", "created": 1766290957, "model": "default", "choices": [ { "index": 0, "message": { "role": "assistant", "content": "Hello! I'm Qwen, a large language model developed by Alibaba Cloud. I'm designed to assist with a wide range of tasks, such as answering questions, creating content, writing code, solving problems, and engaging in conversations. I aim to be helpful, friendly, and knowledgeable, and I'm here to learn and grow through our interactions. How can I assist you today? 😊", "reasoning_content": "Okay, the user asked, \"Who are you?\" I need to provide a clear and concise answer. Let me start by stating my name, Qwen. I should mention that I'm a large language model developed by Alibaba Cloud. It's important to highlight my capabilities, like answering questions, creating content, and engaging in conversations. I should also note that I can assist with various tasks such as writing, coding, and problem-solving. But I need to keep it friendly and approachable. Maybe add something about being here to help and learn from interactions. Let me check if I need to mention any specific features or limitations. Oh, right, I should avoid giving false information and encourage the user to ask questions. Let me structure this in a natural, conversational way without any markdown. Keep it simple and welcoming.\n", "tool_calls": null }, "logprobs": null, "finish_reason": "stop", "matched_stop": 151645 } ], "usage": { "prompt_tokens": 12, "total_tokens": 261, "completion_tokens": 249, "prompt_tokens_details": null, "reasoning_tokens": 168 }, "metadata": { "weight_version": "default" } }This also works in streaming situations:
data: {"id":"b8c4b28ecc6f48d5bffeb073e209ab22","object":"chat.completion.chunk","created":1766291188,"model":"default","choices":[{"index":0,"delta":{"role":null,"content":"?","reasoning_content":null,"tool_calls":null},"logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":{"prompt_tokens":12,"total_tokens":191,"completion_tokens":179,"prompt_tokens_details":null,"reasoning_tokens":114}} data: {"id":"b8c4b28ecc6f48d5bffeb073e209ab22","object":"chat.completion.chunk","created":1766291188,"model":"default","choices":[{"index":0,"delta":{"role":null,"content":" ","reasoning_content":null,"tool_calls":null},"logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":{"prompt_tokens":12,"total_tokens":192,"completion_tokens":180,"prompt_tokens_details":null,"reasoning_tokens":114}} data: {"id":"b8c4b28ecc6f48d5bffeb073e209ab22","object":"chat.completion.chunk","created":1766291188,"model":"default","choices":[{"index":0,"delta":{"role":null,"content":"😊","reasoning_content":null,"tool_calls":null},"logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":{"prompt_tokens":12,"total_tokens":193,"completion_tokens":181,"prompt_tokens_details":null,"reasoning_tokens":114}} data: {"id":"b8c4b28ecc6f48d5bffeb073e209ab22","object":"chat.completion.chunk","created":1766291188,"model":"default","choices":[{"index":0,"delta":{"role":null,"content":null,"reasoning_content":null,"tool_calls":null},"logprobs":null,"finish_reason":"stop","matched_stop":151645}],"usage":null} data: {"id":"b8c4b28ecc6f48d5bffeb073e209ab22","object":"chat.completion.chunk","created":1766291188,"model":"default","choices":[],"usage":{"prompt_tokens":12,"total_tokens":194,"completion_tokens":182,"prompt_tokens_details":null,"reasoning_tokens":114}} data: [DONE]Modifications
Modifications

Compute `reasoning_tokens` based on `req.require_reasoning` and `next_token_id` during both the extend and decode stages.

The logic is intentionally NOT placed in the server process, because doing it there may introduce complexity related to potential re-tokenization. I think implementing this logic in the `output_processor` is simpler and adds nearly no overhead.
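A minimal sketch of that idea (hypothetical names, not the actual SGLang code; the end-of-think token id is model-specific and assumed here):

```python
# Hypothetical sketch of per-token reasoning accounting in the output
# processor; the class name and the token id below are illustrative.
THINK_END_TOKEN_ID = 151668  # assumed "</think>" id for Qwen3; model-specific


class ReasoningTokenCounter:
    """Counts reasoning tokens for one request, updated once per new token."""

    def __init__(self, require_reasoning: bool):
        # Reasoning models emit the think section first, so a request that
        # requires reasoning starts inside the reasoning span.
        self.in_reasoning = require_reasoning
        self.reasoning_tokens = 0

    def on_token(self, next_token_id: int) -> None:
        if not self.in_reasoning:
            return
        if next_token_id == THINK_END_TOKEN_ID:
            self.in_reasoning = False  # "</think>" closes the reasoning span
        else:
            self.reasoning_tokens += 1


# Fed during both the extend and decode stages, one call per generated token:
counter = ReasoningTokenCounter(require_reasoning=True)
for token_id in [101, 102, THINK_END_TOKEN_ID, 7, 8]:
    counter.on_token(token_id)
print(counter.reasoning_tokens)  # -> 2 (tokens before "</think>")
```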
Checklist