feat(gateway): expose reasoning/thinking blocks in /v1/chat/completions responses (#37044)#39006
Open
rodboev wants to merge 1 commit into
Open
feat(gateway): expose reasoning/thinking blocks in /v1/chat/completions responses (#37044)#39006rodboev wants to merge 1 commit into
rodboev wants to merge 1 commit into
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
When a reasoning-capable model (DeepSeek V4, Claude with thinking enabled, GPT-5.5) returns a thinking/reasoning trace, the Hermes TUI and messaging platform adapters display it correctly, but the API server's
/v1/chat/completionsendpoint silently drops it. Downstream consumers like Open WebUI that connect through the gateway see only the final assistant message with no thinking section.The root cause is in two places. For non-streaming responses,
_handle_chat_completionsatgateway/platforms/api_server.py:1960constructs the responsemessagedict with onlyroleandcontent, ignoringresult["last_reasoning"]even though the agent populates it (conversation_loop.py:4732 extractslast_reasoningfrom the most recent assistant message'sreasoningfield). For streaming responses, the handler does not wiretool_progress_callback(the comment at line 1858 says "intentionally not wired"), soreasoning.availableevents from the agent never reach the SSE writer.This PR adds
reasoning_contentto both response paths, following the OpenAI-compat convention used by DeepSeek, Kimi, and other thinking-mode providers. Non-streaming responses includereasoning_contentas a sibling ofcontentin the message object. Streaming responses emitdelta.reasoning_contentchunks beforedelta.contentchunks, which is the format Open WebUI and other OpenAI-compat frontends already parse. The feature is gated ondisplay.show_reasoning(resolved per-platform viagateway.display_config.resolve_display_setting), so when disabled (the default), the response is identical to today's output. The gateway runner already uses this same setting to conditionally prepend reasoning blocks in messaging platform responses (run.py:9468-9475).Fixes #37044
Changes
gateway/platforms/api_server.py: in_handle_chat_completions(), resolveshow_reasoningdisplay setting; for non-streaming, addreasoning_contentto the response message dict when present and display is enabled; for streaming, wire atool_progress_callbackthat forwardsreasoning.availableevents as tagged tuples, and extend_write_sse_chat_completion._emitto emitdelta.reasoning_contentchunks for those tuples (+~40 lines)tests/gateway/test_api_server.py: add 5 tests toTestChatCompletionsEndpointcovering reasoning in non-streaming (present, absent, display-disabled) and streaming (present, display-disabled) (+~215 lines)Validation
show_reasoning: truemessagehascontentonly; reasoning droppedmessagehas bothcontentandreasoning_contentmessagehascontentonlymessagehascontentonly (noreasoning_contentkey, identical)show_reasoning: falsemessagehascontentonlymessagehascontentonly (identical)show_reasoning: truedelta.contentchunksdelta.reasoning_contentchunks emitted beforedelta.contentchunksshow_reasoning: falsedelta.contentchunksdelta.contentchunks (identical)/v1/runsendpoint (already forwards reasoning.available)Test plan
pytest tests/gateway/test_api_server.py -v --timeout=0— 160 passed, 1 skipped (POSIX file-permission test)reasoning_contentwhen reasoning present and display enabledreasoning_contentwhen reasoning is None/emptyreasoning_contentwhenshow_reasoningdisplay setting is Falsedelta.reasoning_contentchunks when reasoning events fire and display enabledNot in scope
Adding a dedicated
gateway.expose_reasoningconfig toggle (the issue's proposed fix). The per-platformdisplay.show_reasoningcascade already provides this control and is the established pattern. Adding reasoning to the/v1/responses(Responses API) endpoint, which has its own event format and already forwardsreasoning.availableevents. Changing how the TUI or messaging platforms display reasoning.