feat(gateway): expose reasoning/thinking blocks in /v1/chat/completions responses (#37044) by rodboev · Pull Request #39006 · NousResearch/hermes-agent

rodboev · 2026-06-04T12:30:56Z

Summary

When a reasoning-capable model (DeepSeek V4, Claude with thinking enabled, GPT-5.5) returns a thinking/reasoning trace, the Hermes TUI and messaging platform adapters display it correctly, but the API server's /v1/chat/completions endpoint silently drops it. Downstream consumers like Open WebUI that connect through the gateway see only the final assistant message with no thinking section.

The root cause is in two places. For non-streaming responses, _handle_chat_completions at gateway/platforms/api_server.py:1960 constructs the response message dict with only role and content, ignoring result["last_reasoning"] even though the agent populates it (conversation_loop.py:4732 extracts last_reasoning from the most recent assistant message's reasoning field). For streaming responses, the handler does not wire tool_progress_callback (the comment at line 1858 says "intentionally not wired"), so reasoning.available events from the agent never reach the SSE writer.

This PR adds reasoning_content to both response paths, following the OpenAI-compat convention used by DeepSeek, Kimi, and other thinking-mode providers. Non-streaming responses include reasoning_content as a sibling of content in the message object. Streaming responses emit delta.reasoning_content chunks before delta.content chunks, which is the format Open WebUI and other OpenAI-compat frontends already parse. The feature is gated on display.show_reasoning (resolved per-platform via gateway.display_config.resolve_display_setting), so when disabled (the default), the response is identical to today's output. The gateway runner already uses this same setting to conditionally prepend reasoning blocks in messaging platform responses (run.py:9468-9475).

Fixes #37044

Changes

gateway/platforms/api_server.py: in _handle_chat_completions(), resolve show_reasoning display setting; for non-streaming, add reasoning_content to the response message dict when present and display is enabled; for streaming, wire a tool_progress_callback that forwards reasoning.available events as tagged tuples, and extend _write_sse_chat_completion._emit to emit delta.reasoning_content chunks for those tuples (+~40 lines)
tests/gateway/test_api_server.py: add 5 tests to TestChatCompletionsEndpoint covering reasoning in non-streaming (present, absent, display-disabled) and streaming (present, display-disabled) (+~215 lines)

Validation

Scenario	Before	After
Non-streaming with reasoning model, `show_reasoning: true`	`message` has `content` only; reasoning dropped	`message` has both `content` and `reasoning_content`
Non-streaming with non-reasoning model	`message` has `content` only	`message` has `content` only (no `reasoning_content` key, identical)
Non-streaming with `show_reasoning: false`	`message` has `content` only	`message` has `content` only (identical)
Streaming with reasoning model, `show_reasoning: true`	only `delta.content` chunks	`delta.reasoning_content` chunks emitted before `delta.content` chunks
Streaming with `show_reasoning: false`	only `delta.content` chunks	only `delta.content` chunks (identical)
`/v1/runs` endpoint (already forwards reasoning.available)	reasoning events forwarded	unchanged (regression guard)
Messaging platforms (Telegram/Discord/Slack)	reasoning prepended as markdown block	unchanged (uses gateway/run.py path, not api_server)

Test plan

pytest tests/gateway/test_api_server.py -v --timeout=0 — 160 passed, 1 skipped (POSIX file-permission test)
New: non-streaming response includes reasoning_content when reasoning present and display enabled
New: non-streaming response omits reasoning_content when reasoning is None/empty
New: non-streaming response omits reasoning_content when show_reasoning display setting is False
New: streaming response includes delta.reasoning_content chunks when reasoning events fire and display enabled
New: streaming response omits reasoning chunks when display disabled

Not in scope

Adding a dedicated gateway.expose_reasoning config toggle (the issue's proposed fix). The per-platform display.show_reasoning cascade already provides this control and is the established pattern. Adding reasoning to the /v1/responses (Responses API) endpoint, which has its own event format and already forwards reasoning.available events. Changing how the TUI or messaging platforms display reasoning.

…ns responses (NousResearch#37044)

feat(gateway): expose reasoning/thinking blocks in /v1/chat/completio…

07f01b9

…ns responses (NousResearch#37044)

alt-glitch added type/feature New feature or request comp/gateway Gateway runner, session dispatch, delivery P3 Low — cosmetic, nice to have labels Jun 4, 2026

rodboev mentioned this pull request Jun 14, 2026

fix(gateway): surface reasoning previews in WebUI (#4146) nesquena/hermes-webui#4148

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(gateway): expose reasoning/thinking blocks in /v1/chat/completions responses (#37044)#39006

feat(gateway): expose reasoning/thinking blocks in /v1/chat/completions responses (#37044)#39006
rodboev wants to merge 1 commit into
NousResearch:mainfrom
rodboev:pr/gateway-reasoning-in-oai-compat

rodboev commented Jun 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

rodboev commented Jun 4, 2026

Summary

Changes

Validation

Test plan

Not in scope

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants