Skip to content

feat(gateway): expose reasoning/thinking blocks in /v1/chat/completions responses (#37044)#39006

Open
rodboev wants to merge 1 commit into
NousResearch:mainfrom
rodboev:pr/gateway-reasoning-in-oai-compat
Open

feat(gateway): expose reasoning/thinking blocks in /v1/chat/completions responses (#37044)#39006
rodboev wants to merge 1 commit into
NousResearch:mainfrom
rodboev:pr/gateway-reasoning-in-oai-compat

Conversation

@rodboev

@rodboev rodboev commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Summary

When a reasoning-capable model (DeepSeek V4, Claude with thinking enabled, GPT-5.5) returns a thinking/reasoning trace, the Hermes TUI and messaging platform adapters display it correctly, but the API server's /v1/chat/completions endpoint silently drops it. Downstream consumers like Open WebUI that connect through the gateway see only the final assistant message with no thinking section.

The root cause is in two places. For non-streaming responses, _handle_chat_completions at gateway/platforms/api_server.py:1960 constructs the response message dict with only role and content, ignoring result["last_reasoning"] even though the agent populates it (conversation_loop.py:4732 extracts last_reasoning from the most recent assistant message's reasoning field). For streaming responses, the handler does not wire tool_progress_callback (the comment at line 1858 says "intentionally not wired"), so reasoning.available events from the agent never reach the SSE writer.

This PR adds reasoning_content to both response paths, following the OpenAI-compat convention used by DeepSeek, Kimi, and other thinking-mode providers. Non-streaming responses include reasoning_content as a sibling of content in the message object. Streaming responses emit delta.reasoning_content chunks before delta.content chunks, which is the format Open WebUI and other OpenAI-compat frontends already parse. The feature is gated on display.show_reasoning (resolved per-platform via gateway.display_config.resolve_display_setting), so when disabled (the default), the response is identical to today's output. The gateway runner already uses this same setting to conditionally prepend reasoning blocks in messaging platform responses (run.py:9468-9475).

Fixes #37044

Changes

  • gateway/platforms/api_server.py: in _handle_chat_completions(), resolve show_reasoning display setting; for non-streaming, add reasoning_content to the response message dict when present and display is enabled; for streaming, wire a tool_progress_callback that forwards reasoning.available events as tagged tuples, and extend _write_sse_chat_completion._emit to emit delta.reasoning_content chunks for those tuples (+~40 lines)
  • tests/gateway/test_api_server.py: add 5 tests to TestChatCompletionsEndpoint covering reasoning in non-streaming (present, absent, display-disabled) and streaming (present, display-disabled) (+~215 lines)

Validation

Scenario Before After
Non-streaming with reasoning model, show_reasoning: true message has content only; reasoning dropped message has both content and reasoning_content
Non-streaming with non-reasoning model message has content only message has content only (no reasoning_content key, identical)
Non-streaming with show_reasoning: false message has content only message has content only (identical)
Streaming with reasoning model, show_reasoning: true only delta.content chunks delta.reasoning_content chunks emitted before delta.content chunks
Streaming with show_reasoning: false only delta.content chunks only delta.content chunks (identical)
/v1/runs endpoint (already forwards reasoning.available) reasoning events forwarded unchanged (regression guard)
Messaging platforms (Telegram/Discord/Slack) reasoning prepended as markdown block unchanged (uses gateway/run.py path, not api_server)

Test plan

  • pytest tests/gateway/test_api_server.py -v --timeout=0 — 160 passed, 1 skipped (POSIX file-permission test)
  • New: non-streaming response includes reasoning_content when reasoning present and display enabled
  • New: non-streaming response omits reasoning_content when reasoning is None/empty
  • New: non-streaming response omits reasoning_content when show_reasoning display setting is False
  • New: streaming response includes delta.reasoning_content chunks when reasoning events fire and display enabled
  • New: streaming response omits reasoning chunks when display disabled

Not in scope

Adding a dedicated gateway.expose_reasoning config toggle (the issue's proposed fix). The per-platform display.show_reasoning cascade already provides this control and is the established pattern. Adding reasoning to the /v1/responses (Responses API) endpoint, which has its own event format and already forwards reasoning.available events. Changing how the TUI or messaging platforms display reasoning.

@alt-glitch alt-glitch added type/feature New feature or request comp/gateway Gateway runner, session dispatch, delivery P3 Low — cosmetic, nice to have labels Jun 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/gateway Gateway runner, session dispatch, delivery P3 Low — cosmetic, nice to have type/feature New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: API server gateway does not expose model reasoning/thinking blocks in /v1/chat/completions responses

2 participants