[Bug]: Expose model reasoning/thinking blocks in /v1/chat/completions (fix #37044)#37067
Closed
alaamohanad169-ship-it wants to merge 1 commit into
Closed
[Bug]: Expose model reasoning/thinking blocks in /v1/chat/completions (fix #37044)#37067alaamohanad169-ship-it wants to merge 1 commit into
alaamohanad169-ship-it wants to merge 1 commit into
Conversation
… /v1/chat/completions (NousResearch#37044) The OpenAI-compatible API server adapter was silently dropping `last_reasoning` from the agent result, so downstream UIs (Open WebUI, LobeChat, etc.) connected to the Hermes gateway never saw the model's chain-of-thought even when the model produced a visible reasoning block in the TUI / CLI. This fix adds an opt-in `X-Hermes-Expose-Reasoning` request header. When set to a truthy value, the gateway surfaces the reasoning/thinking content in BOTH paths: * non-streaming: `message.reasoning_content` and `message.reasoning` (Open WebUI consumes `reasoning_content`, OpenAI-native clients consume `reasoning`) * streaming: `delta.reasoning_content` and `delta.reasoning` chunks on the chat.completion.chunk SSE stream Default behaviour is unchanged: no reasoning fields are emitted unless the client opts in, preserving wire-format compatibility with strict OpenAI parsers. Also adds: - a 256 KB defensive cap on per-chunk reasoning payload to bound memory + bandwidth against a malicious or buggy provider - 6 regression tests under TestChatCompletionsEndpoint covering the default-omit, opt-in, explicit-opt-out, streaming reasoning chunks, and size-cap cases
Contributor
Author
|
👋 Friendly nudge — this PR exposes model reasoning/thinking blocks in |
Contributor
Author
|
@OutThisLife — exposes model reasoning/thinking blocks in /v1/chat/completions. CI green, mergeable. Would appreciate a review. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #37044.
The OpenAI-compatible API server adapter was silently dropping the
last_reasoningfield from the agent result, so downstream UIs (Open WebUI, LobeChat, etc.) connected to the Hermes gateway never saw the model's chain-of-thought even when the model produced a visible reasoning block in the TUI / CLI.This change adds an opt-in
X-Hermes-Expose-Reasoningrequest header. When set to a truthy value, the gateway surfaces the reasoning/thinking content in BOTH paths:/v1/chat/completions:message.reasoning_contentandmessage.reasoningare added to the assistant message/v1/chat/completionsSSE:delta.reasoning_contentanddelta.reasoningchunks on thechat.completion.chunkstreamreasoning_contentis the de facto standard used by Open WebUI / DeepSeek / OpenRouter / Nous Portal;reasoningis the OpenAI-native field name. Both are emitted for maximum client compatibility.Wire-format compatibility
Default behaviour is unchanged — no reasoning fields are emitted unless the client opts in via the header, so strict OpenAI parsers that don't know about the extension won't break.
Changes
gateway/platforms/api_server.py:_parse_expose_reasoning_headerhelper (reuses_coerce_request_boolsotrue|false|1|0|yes|no|on|offall work)reasoning_content+reasoningto the assistant messagetool_progress_callbackthat capturesreasoning.availableevents from the agent and emits them asdelta.reasoning_contentchunks on the SSE streamMAX_REASONING_CHUNK_BYTES(256 KB) defensive cap on per-chunk reasoning payload to prevent memory/bandwidth abusetests/gateway/test_api_server.py: 6 new tests underTestChatCompletionsEndpoint:test_non_streaming_omits_reasoning_by_defaulttest_non_streaming_exposes_reasoning_when_header_settest_non_streaming_expose_reasoning_header_false_omitstest_streaming_omits_reasoning_chunks_by_defaulttest_streaming_exposes_reasoning_chunks_when_header_settest_streaming_reasoning_chunk_capped_at_max_sizeTest plan
tests/gateway/test_api_server.pysuite: 162/162 passtests/run_agent/test_run_agent.py,test_partial_stream_finish_reason.py,test_streaming.py: 401/401 passRisk & scope
gateway/platforms/api_server.py+ matching testsgateway.expose_reasoningconfig flag (per-client granularity without a global config change)🤖 Generated with [Claude Code]