Track reasoning tokens during decoding #14404
cklxx wants to merge 14 commits into sgl-project:main
Conversation
Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!
@hnyls2002 Please provide some review comments. 🙏
/tag-and-rerun-ci |
I find this PR doesn't update serving_chat.py; #15562 did that. Could you discuss how to merge your PR and make the code path cleaner? Thanks! Maybe we could communicate through Slack? We could make a group chat.
@JustinTong0323 I didn’t touch serving_chat.py. For merging, I think it’s cleaner to keep the reasoning-token logic in shared runtime code and let both endpoints just do thin mapping on top. Happy to align. Slack works; feel free to add me to a group DM or share your handle. You can also @mention me there with my GitHub username (@cklxx) and I should be searchable.
Moved to #15562 |
Motivation
fix #13250
Modifications
Tracked reasoning-token spans through decoding and surfaced usage/outputs end-to-end:
- Separated reasoning tokens from normal text in the OpenAI responses path (python/sglang/srt/entrypoints/context.py, python/sglang/srt/entrypoints/openai/serving_responses.py).
- Propagated reasoning-token counts through the scheduler, detokenizer, tokenizer manager, gRPC metadata, and response metrics (python/sglang/srt/managers/schedule_batch.py, python/sglang/srt/managers/scheduler.py, python/sglang/srt/managers/scheduler_output_processor_mixin.py, python/sglang/srt/managers/multi_tokenizer_mixin.py, python/sglang/srt/grpc/grpc_request_manager.py, python/sglang/srt/managers/detokenizer_manager.py, python/sglang/srt/managers/tokenizer_manager.py, python/sglang/srt/managers/io_struct.py).
- Updated documentation and added a test (docs/advanced_features/observability.md, test/srt/test_reasoning_token_count.py).
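To illustrate the idea, the span tracking described above can be sketched as a small decode-time counter that classifies tokens falling between reasoning markers and reports them in a usage payload. This is a minimal sketch, not the PR's actual implementation: the marker token ids and the `count_reasoning_tokens` helper below are hypothetical stand-ins for whatever the runtime actually tracks per request.

```python
# Hypothetical marker token ids; real models emit tokens such as
# "<think>" / "</think>" whose ids depend on the tokenizer.
THINK_START_ID = 1000
THINK_END_ID = 1001

def count_reasoning_tokens(output_ids):
    """Count tokens emitted between reasoning markers in a decoded stream."""
    in_reasoning = False
    reasoning_tokens = 0
    for tok in output_ids:
        if tok == THINK_START_ID:
            in_reasoning = True
        elif tok == THINK_END_ID:
            in_reasoning = False
        elif in_reasoning:
            reasoning_tokens += 1
    return reasoning_tokens

# Example stream: three reasoning tokens, then two normal tokens.
ids = [THINK_START_ID, 7, 8, 9, THINK_END_ID, 42, 43]
usage = {
    "completion_tokens": len(ids),
    "completion_tokens_details": {
        "reasoning_tokens": count_reasoning_tokens(ids),
    },
}
print(usage["completion_tokens_details"]["reasoning_tokens"])  # → 3
```

The `completion_tokens_details.reasoning_tokens` field mirrors the shape of OpenAI-style usage reporting; doing the counting once in shared runtime code is what lets both the chat and responses endpoints surface it with only thin mapping on top.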
Accuracy Tests
Benchmarking and Profiling
Checklist