
[Feature] Add Reasoning Tokens Usage#15562

Merged
hnyls2002 merged 58 commits into sgl-project:main from Muqi1029:reasoning_tokens
Apr 4, 2026

@Muqi1029
Contributor

@Muqi1029 Muqi1029 commented Dec 21, 2025

Motivation

SGLang currently returns token usage information, but the reasoning_tokens field is always 0, which makes it unusable as a statistical metric. This is problematic since reasoning_tokens is an important signal for analysis and monitoring.

You can reproduce this with the latest SGLang (main branch):
Server launch script

python -m sglang.launch_server \
    --model-path Qwen/Qwen3-8B \
    --reasoning-parser qwen3 \
    --tool-call-parser qwen \
    --port 8888 \
    --log-requests \
    --log-requests-level 3
Request

curl -X POST http://127.0.0.1:8888/v1/chat/completions \
    -H "Authorization: Bearer None" \
    -H "Content-Type: application/json" \
    -d '{
        "messages": [
            {"role": "user", "content": "Who are you?"}
        ]
    }' | jq 
[Image: response from the main branch, before this PR]

After this PR:

{
  "id": "40d09b4d150345b88a75945b1b7bb059",
  "object": "chat.completion",
  "created": 1766290957,
  "model": "default",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm Qwen, a large language model developed by Alibaba Cloud. I'm designed to assist with a wide range of tasks, such as answering questions, creating content, writing code, solving problems, and engaging in conversations. I aim to be helpful, friendly, and knowledgeable, and I'm here to learn and grow through our interactions. How can I assist you today? 😊",
        "reasoning_content": "Okay, the user asked, \"Who are you?\" I need to provide a clear and concise answer. Let me start by stating my name, Qwen. I should mention that I'm a large language model developed by Alibaba Cloud. It's important to highlight my capabilities, like answering questions, creating content, and engaging in conversations. I should also note that I can assist with various tasks such as writing, coding, and problem-solving. But I need to keep it friendly and approachable. Maybe add something about being here to help and learn from interactions. Let me check if I need to mention any specific features or limitations. Oh, right, I should avoid giving false information and encourage the user to ask questions. Let me structure this in a natural, conversational way without any markdown. Keep it simple and welcoming.\n",
        "tool_calls": null
      },
      "logprobs": null,
      "finish_reason": "stop",
      "matched_stop": 151645
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "total_tokens": 261,
    "completion_tokens": 249,
    "prompt_tokens_details": null,
    "reasoning_tokens": 168
  },
  "metadata": {
    "weight_version": "default"
  }
}
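
As a reading aid, the usage block above can be split into reasoning tokens and visible-answer tokens. This is a minimal sketch, assuming reasoning_tokens is already counted inside completion_tokens (the numbers above are consistent with that, since 12 + 249 = 261). The helper name `split_usage` is hypothetical and not part of SGLang.

```python
def split_usage(usage: dict) -> dict:
    """Split completion tokens into reasoning and visible-answer tokens.

    Assumption: reasoning_tokens is a subset of completion_tokens.
    """
    completion = usage.get("completion_tokens", 0)
    # Treat a missing or null field as zero reasoning tokens.
    reasoning = usage.get("reasoning_tokens") or 0
    return {
        "prompt": usage.get("prompt_tokens", 0),
        "reasoning": reasoning,
        "answer": completion - reasoning,
    }


# Usage values copied from the response above.
usage = {
    "prompt_tokens": 12,
    "total_tokens": 261,
    "completion_tokens": 249,
    "prompt_tokens_details": None,
    "reasoning_tokens": 168,
}
print(split_usage(usage))  # {'prompt': 12, 'reasoning': 168, 'answer': 81}
```

For this response, 168 of the 249 completion tokens went to reasoning and the remaining 81 to the visible answer.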

This also works for streaming responses:

curl -N -X POST http://127.0.0.1:8888/v1/chat/completions \
  -H "Authorization: Bearer None" \
  -H "Content-Type: application/json" \
  -d '{
    "stream": true,
    "messages": [
      {"role": "user", "content": "Who are you?"}
    ],
    "stream_options": {
      "include_usage": true,
      "continuous_usage_stats": true
    }
  }'
data: {"id":"b8c4b28ecc6f48d5bffeb073e209ab22","object":"chat.completion.chunk","created":1766291188,"model":"default","choices":[{"index":0,"delta":{"role":null,"content":"?","reasoning_content":null,"tool_calls":null},"logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":{"prompt_tokens":12,"total_tokens":191,"completion_tokens":179,"prompt_tokens_details":null,"reasoning_tokens":114}}

data: {"id":"b8c4b28ecc6f48d5bffeb073e209ab22","object":"chat.completion.chunk","created":1766291188,"model":"default","choices":[{"index":0,"delta":{"role":null,"content":" ","reasoning_content":null,"tool_calls":null},"logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":{"prompt_tokens":12,"total_tokens":192,"completion_tokens":180,"prompt_tokens_details":null,"reasoning_tokens":114}}

data: {"id":"b8c4b28ecc6f48d5bffeb073e209ab22","object":"chat.completion.chunk","created":1766291188,"model":"default","choices":[{"index":0,"delta":{"role":null,"content":"😊","reasoning_content":null,"tool_calls":null},"logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":{"prompt_tokens":12,"total_tokens":193,"completion_tokens":181,"prompt_tokens_details":null,"reasoning_tokens":114}}

data: {"id":"b8c4b28ecc6f48d5bffeb073e209ab22","object":"chat.completion.chunk","created":1766291188,"model":"default","choices":[{"index":0,"delta":{"role":null,"content":null,"reasoning_content":null,"tool_calls":null},"logprobs":null,"finish_reason":"stop","matched_stop":151645}],"usage":null}

data: {"id":"b8c4b28ecc6f48d5bffeb073e209ab22","object":"chat.completion.chunk","created":1766291188,"model":"default","choices":[],"usage":{"prompt_tokens":12,"total_tokens":194,"completion_tokens":182,"prompt_tokens_details":null,"reasoning_tokens":114}}

data: [DONE]
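
On the client side, the final usage can be read from a stream like the one above by keeping the last non-null usage object seen before [DONE]. This is a rough sketch using only the standard library. The function name `last_usage` is hypothetical, and the sample chunks are trimmed copies of the ones above with most fields omitted.

```python
import json


def last_usage(sse_lines):
    """Return the last non-null usage object found in an SSE stream."""
    usage = None
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        # The finish_reason chunk above carries "usage": null, so skip nulls.
        if chunk.get("usage") is not None:
            usage = chunk["usage"]
    return usage


sample = [
    'data: {"choices":[{"index":0,"delta":{"content":"?"},"finish_reason":null}],'
    '"usage":{"prompt_tokens":12,"total_tokens":191,"completion_tokens":179,"reasoning_tokens":114}}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":null},"finish_reason":"stop"}],"usage":null}',
    "",
    'data: {"choices":[],"usage":{"prompt_tokens":12,"total_tokens":194,'
    '"completion_tokens":182,"reasoning_tokens":114}}',
    "",
    "data: [DONE]",
]
print(last_usage(sample)["reasoning_tokens"])  # 114
```

In the stream above, reasoning_tokens stays at 114 while completion_tokens keeps growing, which is what one would expect once the model has left the reasoning phase.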

Modifications

Compute reasoning_tokens based on req.require_reasoning and next_token_id during both the extend and decode stages.

The logic is intentionally NOT placed in the server process, because doing so could introduce complexity related to potential re-tokenization. I think implementing it in the output_processor is simpler and adds negligible overhead.
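
The counting rule described above can be pictured roughly as follows. This is a simplified sketch, not the code in this PR: the class `ReasoningCounter`, its fields other than the require_reasoning and think_end_id names mentioned in this thread, and the example token IDs are all hypothetical. It assumes a request starts in reasoning mode and leaves it once the think-end token is produced, and it counts the think-end token itself as a reasoning token, which is a choice made here only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReasoningCounter:
    """Hypothetical per-request counter for reasoning tokens."""

    require_reasoning: bool
    think_end_id: int
    reasoning_tokens: int = 0
    reasoning_done: bool = False

    def update(self, next_token_id: int) -> None:
        """Call once per generated token, in both extend and decode steps."""
        if not self.require_reasoning or self.reasoning_done:
            return
        self.reasoning_tokens += 1
        if next_token_id == self.think_end_id:
            # Everything after this token is the visible answer.
            self.reasoning_done = True


THINK_END = 999  # hypothetical token ID for the end-of-thinking marker
generated = [11, 12, 13, THINK_END, 21, 22]

with_reasoning = ReasoningCounter(require_reasoning=True, think_end_id=THINK_END)
without_reasoning = ReasoningCounter(require_reasoning=False, think_end_id=THINK_END)
for token_id in generated:
    with_reasoning.update(token_id)
    without_reasoning.update(token_id)

print(with_reasoning.reasoning_tokens)     # 4
print(without_reasoning.reasoning_tokens)  # 0
```

The second counter shows the behavior when require_reasoning is False: the count stays at 0 regardless of what the model generates.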

Checklist

Signed-off-by: Muqi Li <muqi1029@gmail.com>
Signed-off-by: Muqi Li <muqi1029@gmail.com>

@JustinTong0323
Collaborator

/tag-and-rerun-ci

@JustinTong0323
Collaborator

Note: in the current code path, if the server is launched without a reasoning parser, require_reasoning is always False. Is this intuitive?

Comment thread python/sglang/srt/managers/schedule_batch.py Outdated
@JustinTong0323
Collaborator

We may also need to update the docs.

@Muqi1029
Contributor Author

> Note: in the current code path, if the server is launched without a reasoning parser, require_reasoning is always False. Is this intuitive?

Do you mean we should remove the require_reasoning check? But think_end_id is also only set when --reasoning-parser is set.

We need some flag that tells us whether a request requires reasoning and what its think_end_id is.

@JustinTong0323
Collaborator

> Do you mean we should remove the require_reasoning check? But think_end_id is also only set when --reasoning-parser is set.
>
> We need some flag that tells us whether a request requires reasoning and what its think_end_id is.

I mean, in certain cases, the user might want to obtain the reasoning tokens without enabling the reasoning parser?

@mufeez-amjad
Contributor

@Muqi1029 I mistakenly also put up a PR (#15875) for this, following from #15660. I noticed you don't have any tests in this PR, maybe you can cherry-pick my tests (or copy with coauthor attribution)? I'll close my PR after that.

@Muqi1029
Contributor Author

> @Muqi1029 I mistakenly also put up a PR (#15875) for this, following from #15660. I noticed you don't have any tests in this PR, maybe you can cherry-pick my tests (or copy with coauthor attribution)? I'll close my PR after that.

Okay, I will cherry-pick your tests right away. Thanks for the reminder!

Muqi1029 and others added 2 commits December 28, 2025 15:48
Signed-off-by: Muqi Li <muqi1029@gmail.com>
Co-authored-by: Mufeez Amjad <mufeez.amjad@outlook.com>
@JustinTong0323
Collaborator

Found another duplicate PR: #14404

Collaborator

@JustinTong0323 JustinTong0323 left a comment

Verified. Thanks for the contribution!

@MLKoz2

MLKoz2 commented Mar 20, 2026

Very useful change. I ran into this problem and am eagerly waiting for the merge 😊. Thank you 👍

@anencore94

anencore94 commented Mar 24, 2026

Hi @Muqi1029, @JustinTong0323 thanks for this great PR — it addresses exactly what we need.

We're running GLM-5 (754B FP8) in production with --reasoning-parser glm45 on H200x8 nodes, and reasoning_tokens: 0 is a real pain point for us in usage tracking and cost monitoring.

I noticed this PR has merge conflicts with the current main branch. Since we need this feature urgently, I'd be happy to help resolve the conflicts and push this forward — either by collaborating on this PR or opening a new one based on your work (with full credit, of course).

Would you be open to that? Or if you're planning to rebase soon, I'm happy to wait as well. Just want to make sure this doesn't stay stalled — there's clear community demand (4 duplicate PRs + multiple issues).

Thanks again for the work here!

@Muqi1029
Contributor Author

Muqi1029 commented Mar 25, 2026

@MLKoz2 @anencore94 Thanks for flagging this and for the kind words. I have now resolved the conflicts, and the corresponding CI tests pass locally.

@Fridge003 @hnyls2002 Please take a look.

@hnyls2002 hnyls2002 merged commit 1ad6839 into sgl-project:main Apr 4, 2026
138 of 161 checks passed
sundar24295s pushed a commit to sundar24295s/sglang that referenced this pull request Apr 4, 2026
Signed-off-by: Muqi Li <muqi1029@gmail.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: Mufeez Amjad <mufeez.amjad@outlook.com>
Co-authored-by: cklxx <1293822641@qq.com>
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
@Muqi1029 Muqi1029 deleted the reasoning_tokens branch April 5, 2026 03:39
JustinTong0323 added a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026
Fridge003 pushed a commit that referenced this pull request Apr 7, 2026
xiezhq-hermann pushed a commit to antgroup/sglang that referenced this pull request Apr 7, 2026
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026
@hnyls2002 hnyls2002 mentioned this pull request Apr 29, 2026

8 participants