Track reasoning tokens during decoding #14404
cklxx wants to merge 14 commits into sgl-project:main
Conversation
Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!
@hnyls2002 Please provide some review comments. 🙏
/tag-and-rerun-ci |
I find this PR doesn't update serving_chat.py; #15562 did that. Could you discuss how to merge your PR and make the code path cleaner? Thanks! Maybe we could communicate through Slack? We could make a group chat.
@JustinTong0323 I didn’t touch serving_chat.py. For merging, I think it’s cleaner to keep the reasoning-token logic in shared runtime code and let both endpoints just do thin mapping on top. Happy to align. Slack works; feel free to add me to a group DM or share your handle. You can also @mention me there with my GitHub username (@cklxx) and I should be searchable.
Moved to #15562 |
Motivation
fix #13250
Modifications
Tracked reasoning-token spans through decoding and surfaced usage/outputs end-to-end:
- Separated reasoning tokens from normal text in the OpenAI responses path (python/sglang/srt/entrypoints/context.py, python/sglang/srt/entrypoints/openai/serving_responses.py).
- Propagated reasoning-token counts through the scheduler, detokenizer, tokenizer manager, gRPC metadata, and response metrics (python/sglang/srt/managers/schedule_batch.py, python/sglang/srt/managers/scheduler.py, python/sglang/srt/managers/scheduler_output_processor_mixin.py, python/sglang/srt/managers/multi_tokenizer_mixin.py, python/sglang/srt/grpc/grpc_request_manager.py, python/sglang/srt/managers/detokenizer_manager.py, python/sglang/srt/managers/tokenizer_manager.py, python/sglang/srt/managers/io_struct.py).
- Updated documentation and added a test (docs/advanced_features/observability.md, test/srt/test_reasoning_token_count.py).
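To illustrate the idea, the span tracking described above can be sketched as a small decode-time counter that classifies tokens falling between reasoning markers and reports them in a usage payload. This is a minimal sketch, not the PR's actual implementation: the marker token ids and the `count_reasoning_tokens` helper below are hypothetical stand-ins for whatever the runtime actually tracks per request.

```python
# Hypothetical marker token ids; real models emit tokens such as
# "<think>" / "</think>" whose ids depend on the tokenizer.
THINK_START_ID = 1000
THINK_END_ID = 1001

def count_reasoning_tokens(output_ids):
    """Count tokens emitted between reasoning markers in a decoded stream."""
    in_reasoning = False
    reasoning_tokens = 0
    for tok in output_ids:
        if tok == THINK_START_ID:
            in_reasoning = True
        elif tok == THINK_END_ID:
            in_reasoning = False
        elif in_reasoning:
            reasoning_tokens += 1
    return reasoning_tokens

# Example stream: three reasoning tokens, then two normal tokens.
ids = [THINK_START_ID, 7, 8, 9, THINK_END_ID, 42, 43]
usage = {
    "completion_tokens": len(ids),
    "completion_tokens_details": {
        "reasoning_tokens": count_reasoning_tokens(ids),
    },
}
print(usage["completion_tokens_details"]["reasoning_tokens"])  # → 3
```

The `completion_tokens_details.reasoning_tokens` field mirrors the shape of OpenAI-style usage reporting; doing the counting once in shared runtime code is what lets both the chat and responses endpoints surface it with only thin mapping on top.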
Accuracy Tests
Benchmarking and Profiling
Checklist