Track reasoning tokens during decoding #14404

Closed
cklxx wants to merge 14 commits into sgl-project:main from cklxx:fix/num-reasoning-tokens

@cklxx
Contributor

@cklxx cklxx commented Dec 4, 2025

Motivation

Fixes #13250

Modifications

Tracked reasoning-token spans through decoding and surfaced usage/outputs end-to-end.

  • Added reasoning-aware state to SimpleContext and the Harmony contexts to parse content, accumulate prompt/completion/reasoning token counts, and pass parsed reasoning/normal text through OpenAI responses (python/sglang/srt/entrypoints/context.py, python/sglang/srt/entrypoints/openai/serving_responses.py).
  • Counted reasoning tokens during decoding by marking Req state and bumping counters per token, then propagated the counts through batch outputs, tokenizer/gRPC metadata, and response metrics (python/sglang/srt/managers/schedule_batch.py, python/sglang/srt/managers/scheduler.py, python/sglang/srt/managers/scheduler_output_processor_mixin.py, python/sglang/srt/managers/multi_tokenizer_mixin.py, python/sglang/srt/grpc/grpc_request_manager.py, python/sglang/srt/managers/detokenizer_manager.py, python/sglang/srt/managers/tokenizer_manager.py, python/sglang/srt/managers/io_struct.py).
  • Documented reasoning-token accounting and added a unit test that verifies _update_reasoning_tokens_for_req behavior around think markers and implicit reasoning spans (docs/advanced_features/observability.md, test/srt/test_reasoning_token_count.py).
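The per-token counting described above can be illustrated with a small standalone sketch. This is not the PR's code: the marker token ids, the `ReqState` class, and both function names are hypothetical, and the sketch assumes that the think markers themselves are excluded from the count. It also reads "implicit reasoning spans" as outputs that start inside a reasoning span without an explicit opening marker, which is an assumption about the PR's intent.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical marker ids; a real model defines its own think-marker tokens.
THINK_START_ID = 1001
THINK_END_ID = 1002


@dataclass
class ReqState:
    """Minimal stand-in for per-request decoding state (hypothetical)."""
    in_reasoning: bool = False
    num_reasoning_tokens: int = 0


def update_reasoning_tokens(state: ReqState, token_id: int) -> None:
    """Advance the state by one decoded token.

    Assumption: marker tokens toggle the span but are not counted themselves.
    """
    if token_id == THINK_START_ID:
        state.in_reasoning = True
    elif token_id == THINK_END_ID:
        state.in_reasoning = False
    elif state.in_reasoning:
        state.num_reasoning_tokens += 1


def count_reasoning_tokens(token_ids: Iterable[int], implicit_start: bool = False) -> int:
    """Count reasoning tokens in a decoded sequence.

    implicit_start=True models an output that begins inside a reasoning span,
    for example when the opening marker was already part of the prompt.
    """
    state = ReqState(in_reasoning=implicit_start)
    for token_id in token_ids:
        update_reasoning_tokens(state, token_id)
    return state.num_reasoning_tokens


if __name__ == "__main__":
    # Explicit span: two tokens sit between the markers.
    print(count_reasoning_tokens([THINK_START_ID, 5, 6, THINK_END_ID, 7]))
    # Implicit span: decoding starts inside the reasoning span.
    print(count_reasoning_tokens([5, 6, 7, THINK_END_ID, 8, 9], implicit_start=True))
```

Keeping the flag and the counter on the request state means each decode step does constant work, rather than rescanning the full output when the request finishes.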

Accuracy Tests

  • python3 -m pre_commit run --all-files
  • python3 -m pytest test/srt/test_reasoning_token_count.py
        "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
        "input": [{"role":"user","content":"Think step by step: what is 17 + 28?"}],
        "stream": true
      }' 
event: response.created
data: {"response":{"id":"resp_24dff514d6524aaa82c3f7f3af277ab3","created_at":1764838041.0,"error":null,"incomplete_details":null,"instructions":null,"metadata":{},"model":"deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B","object":"response","output":[],"parallel_tool_calls":true,"temperature":null,"tool_choice":"auto","tools":[],"top_p":null,"background":null,"conversation":null,"max_output_tokens":null,"max_tool_calls":null,"previous_response_id":null,"prompt":null,"prompt_cache_key":null,"reasoning":{"effort":null,"generate_summary":null,"summary":null},"safety_identifier":null,"service_tier":null,"status":"in_progress","text":null,"top_logprobs":null,"truncation":"disabled","usage":null,"user":null,"store":true},"sequence_number":0,"type":"response.created"}

event: response.in_progress
data: {"response":{"id":"resp_24dff514d6524aaa82c3f7f3af277ab3","created_at":1764838041.0,"error":null,"incomplete_details":null,"instructions":null,"metadata":{},"model":"deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B","object":"response","output":[],"parallel_tool_calls":true,"temperature":null,"tool_choice":"auto","tools":[],"top_p":null,"background":null,"conversation":null,"max_output_tokens":null,"max_tool_calls":null,"previous_response_id":null,"prompt":null,"prompt_cache_key":null,"reasoning":{"effort":null,"generate_summary":null,"summary":null},"safety_identifier":null,"service_tier":null,"status":"in_progress","text":null,"top_logprobs":null,"truncation":"disabled","usage":null,"user":null,"store":true},"sequence_number":1,"type":"response.in_progress"}

event: response.completed
data: {"response":{"id":"resp_24dff514d6524aaa82c3f7f3af277ab3","created_at":1764838041.0,"error":null,"incomplete_details":null,"instructions":null,"metadata":{},"model":"deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B","object":"response","output":[{"id":"rs_45b49bbd8d734fc6b5abdb63e88e97c6","summary":[],"type":"reasoning","content":[{"text":"To add 17 and 28, first I'll start by adding the units digits. 7 plus 8 equals 15, so I'll write down the 5 and carry over the 1 to the tens place.\n\nNext, I'll add the tens digits along with the carried-over 1. 1 plus 2 plus 1 equals 4. \n\nFinally, I'll combine the results to get the total sum.\n","type":"reasoning_text"}],"encrypted_content":null,"status":null},{"id":"msg_3cec65ee58264b8c8911a5358e6a1516","content":[{"annotations":[],"text":"**Solution: Adding 17 and 28**\n\nWe will add the numbers **17** and **28** step by step.\n\n1. **Write the numbers vertically:**\n   ```\n     17\n   + 28\n   ----\n   ```\n\n2. **Add the units place:**\n   - **7 (units place of 17) + 8 (units place of 28) = 15**\n   - Write down **5** in the units place and carry over **1** to the tens place.\n   ```\n     1 7\n   + 2 8\n   ----\n     5\n   ```\n\n3. **Add the tens place along with the carried-over value:**\n   - **1 (tens place of 17) + 2 (tens place of 28) + 1 (carried over) = 4**\n   - Write down **4** in the tens place.\n   ```\n     1 7\n   + 2 8\n   ----\n     4 5\n   ```\n\n4. 
**Combine the results:**\n   - The sum of **17** and **28** is **45**.\n\n**Final Answer:**\n\\[\n\\boxed{45}\n\\]","type":"output_text","logprobs":null}],"role":"assistant","status":"completed","type":"message"}],"parallel_tool_calls":true,"temperature":null,"tool_choice":"auto","tools":[],"top_p":null,"background":null,"conversation":null,"max_output_tokens":null,"max_tool_calls":null,"previous_response_id":null,"prompt":null,"prompt_cache_key":null,"reasoning":{"effort":null,"generate_summary":null,"summary":null},"safety_identifier":null,"service_tier":null,"status":"completed","text":null,"top_logprobs":null,"truncation":"disabled","usage":{"input_tokens":20,"input_tokens_details":{"cached_tokens":0},"output_tokens":370,"output_tokens_details":{"reasoning_tokens":0},"total_tokens":390},"user":null,"store":true},"sequence_number":2,"type":"response.completed"}

root@0f8f8f32cd40:/# curl -s http://localhost:30000/v1/responses       -H "Content-Type: application/json"       -d '{
        "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
        "input": [{"role":"user","content":"Think step by step: what is 17 + 28?"}],
        "stream": false
      }' | tee /tmp/r1.json
{"id":"resp_185ba70ae28348b5984914a45df4c306","object":"response","created_at":1764838054,"model":"deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B","output":[{"id":"rs_9934a57e176043b39aac81c57046ebac","summary":[],"type":"reasoning","content":[{"text":"I need to add the numbers 17 and 28 together.\n\nFirst, I'll start by adding the units place: 7 plus 8 equals 15. I'll write down the 5 and carry over the 1 to the tens place.\n\nNext, I'll add the tens place along with the carried-over 1: 1 plus 2 plus 1 equals 4.\n\nFinally, I'll combine the results from both steps to get the final answer.\n","type":"reasoning_text"}],"encrypted_content":null,"status":null},{"id":"msg_4c637593a77c4f8891c2e235b7878743","content":[{"annotations":[],"text":"**Solution: Adding 17 and 28**\n\nTo find the sum of 17 and 28, follow these steps:\n\n1. **Write down the numbers:**\n   \n   \\[\n   \\begin{array}{c@{}c@{}c}\n     & 1 & 7 \\\\\n   + & 2 & 8 \\\\\n   \\hline\n   \\end{array}\n   \\]\n\n2. **Add the units place:**\n   \n   \\[\n   7 + 8 = 15\n   \\]\n   \n   - Write down the 5 in the units place.\n   - Carry over the 1 to the tens place.\n\n3. **Add the tens place along with the carried-over 1:**\n   \n   \\[\n   1 + 2 + 1 = 4\n   \\]\n   \n   - Write down the 4 in the tens place.\n\n4. 
**Combine the results:**\n   \n   \\[\n   \\begin{array}{c@{}c@{}c}\n     & 4 & 5 \\\\\n   \\end{array}\n   \\]\n\n**Final Answer:**\n\n\\[\n\\boxed{45}\n\\]","type":"output_text","logprobs":null}],"role":"assistant","status":"completed","type":"message"}],"status":"completed","usage":{"prompt_tokens":20,"total_tokens":367,"completion_tokens":347,"prompt_tokens_details":{"cached_tokens":19},"reasoning_tokens":0},"parallel_tool_calls":true,"tool_choice":"auto","tools":[],"error":null,"incomplete_details":null,"instructions":null,"max_output_tokens":null,"previous_response_id":null,"reasoning":{"effort":null,"summary":null},"store":true,"temperature":null,"text":null,"top_p":null,"truncation":"disabled","user":null,"metadata":{}}root@0f8f8f32cd40:/# 
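The two transcripts above report usage in different shapes: the streaming `response.completed` event uses `input_tokens`, `output_tokens`, and `output_tokens_details.reasoning_tokens`, while the non-streaming body uses `prompt_tokens`, `completion_tokens`, and a top-level `reasoning_tokens`. A minimal sketch of a client-side helper that normalizes both is shown below; the function name `summarize_usage` and the output keys are hypothetical, and the sample dictionaries are copied from the usage objects in the transcripts.

```python
from typing import Any, Dict, Optional


def summarize_usage(usage: Dict[str, Any]) -> Dict[str, Optional[int]]:
    """Normalize a usage object from either response shape shown above.

    Hypothetical helper: it only reads the keys that appear in the two
    transcripts and returns None for anything that is missing.
    """
    details = usage.get("output_tokens_details") or {}
    return {
        "input": usage.get("input_tokens", usage.get("prompt_tokens")),
        "output": usage.get("output_tokens", usage.get("completion_tokens")),
        "reasoning": details.get("reasoning_tokens", usage.get("reasoning_tokens")),
        "total": usage.get("total_tokens"),
    }


# Usage object from the streaming response.completed event.
streaming_usage = {
    "input_tokens": 20,
    "input_tokens_details": {"cached_tokens": 0},
    "output_tokens": 370,
    "output_tokens_details": {"reasoning_tokens": 0},
    "total_tokens": 390,
}

# Usage object from the non-streaming response body.
non_streaming_usage = {
    "prompt_tokens": 20,
    "total_tokens": 367,
    "completion_tokens": 347,
    "prompt_tokens_details": {"cached_tokens": 19},
    "reasoning_tokens": 0,
}

if __name__ == "__main__":
    print(summarize_usage(streaming_usage))
    print(summarize_usage(non_streaming_usage))
```

A helper like this makes it straightforward to compare the reported `reasoning` value against the reasoning text returned in `output` for both modes.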

Benchmarking and Profiling

Checklist

@github-actions (bot) added the documentation label (Improvements or additions to documentation) on Dec 4, 2025
@cklxx
Contributor Author

cklxx commented Dec 4, 2025

@hnyls2002 Please provide some review comments.

@cklxx
Contributor Author

cklxx commented Dec 8, 2025

@hnyls2002 Please provide some review comments.

🙏

@hnyls2002
Collaborator

/tag-and-rerun-ci

@JustinTong0323
Collaborator

JustinTong0323 commented Dec 28, 2025

I noticed that this PR doesn't update serving_chat.py, while #15562 does. Could you discuss how to merge the two PRs and make the code path cleaner? Thanks! Maybe you could coordinate over Slack? We could set up a group chat.

@cklxx
Contributor Author

cklxx commented Dec 29, 2025

@JustinTong0323
Sounds good.

I didn’t touch serving_chat.py here since this PR focuses on /v1/responses.

For merging, I think it’s cleaner to keep reasoning-token logic in shared runtime code and let both endpoints just do thin mapping. Happy to align.

Slack works — feel free to add me to a group DM or share your handle. You can also @mention me there with my GitHub username and I should be searchable. @cklxx

@hnyls2002 closed this on Dec 30, 2025
@hnyls2002
Collaborator

Moved to #15562

Labels

documentation (Improvements or additions to documentation), run-ci

Development

Successfully merging this pull request may close these issues.

Optimize getting logic of num_reasoning_tokens in scheduler

3 participants