[Bug]: Expose model reasoning/thinking blocks in /v1/chat/completions (fix #37044) by alaamohanad169-ship-it · Pull Request #37067 · NousResearch/hermes-agent

alaamohanad169-ship-it · 2026-06-01T23:42:56Z

Summary

The OpenAI-compatible API server adapter was silently dropping the last_reasoning field from the agent result, so downstream UIs (Open WebUI, LobeChat, etc.) connected to the Hermes gateway never saw the model's chain-of-thought even when the model produced a visible reasoning block in the TUI / CLI.

This change adds an opt-in X-Hermes-Expose-Reasoning request header. When set to a truthy value, the gateway surfaces the reasoning/thinking content in BOTH paths:

Non-streaming /v1/chat/completions: message.reasoning_content and message.reasoning are added to the assistant message
Streaming /v1/chat/completions SSE: delta.reasoning_content and delta.reasoning chunks on the chat.completion.chunk stream

reasoning_content is the de facto standard used by Open WebUI / DeepSeek / OpenRouter / Nous Portal; reasoning is the OpenAI-native field name. Both are emitted for maximum client compatibility.

Wire-format compatibility

Default behaviour is unchanged — no reasoning fields are emitted unless the client opts in via the header, so strict OpenAI parsers that don't know about the extension won't break.

POST /v1/chat/completions
X-Hermes-Expose-Reasoning: true

Changes

gateway/platforms/api_server.py:
- Added _parse_expose_reasoning_header helper (reuses _coerce_request_bool so true|false|1|0|yes|no|on|off all work)
- Non-streaming: when opted in, add reasoning_content + reasoning to the assistant message
- Streaming: wired a tool_progress_callback that captures reasoning.available events from the agent and emits them as delta.reasoning_content chunks on the SSE stream
- Added MAX_REASONING_CHUNK_BYTES (256 KB) defensive cap on per-chunk reasoning payload to prevent memory/bandwidth abuse
tests/gateway/test_api_server.py: 6 new tests under TestChatCompletionsEndpoint:
- test_non_streaming_omits_reasoning_by_default
- test_non_streaming_exposes_reasoning_when_header_set
- test_non_streaming_expose_reasoning_header_false_omits
- test_streaming_omits_reasoning_chunks_by_default
- test_streaming_exposes_reasoning_chunks_when_header_set
- test_streaming_reasoning_chunk_capped_at_max_size

Test plan

Full tests/gateway/test_api_server.py suite: 162/162 pass
tests/run_agent/test_run_agent.py, test_partial_stream_finish_reason.py, test_streaming.py: 401/401 pass
Independent reviewer subagent: no security or logic defects (4 non-blocking suggestions, all addressed)
Static security scan: clean

Risk & scope

Low risk — additive change, default behavior preserved
No breaking changes — existing clients see identical responses
Scope is contained — touches only gateway/platforms/api_server.py + matching tests
No config schema change — opt-in via request header instead of a new gateway.expose_reasoning config flag (per-client granularity without a global config change)

🤖 Generated with [Claude Code]

… /v1/chat/completions (NousResearch#37044) The OpenAI-compatible API server adapter was silently dropping `last_reasoning` from the agent result, so downstream UIs (Open WebUI, LobeChat, etc.) connected to the Hermes gateway never saw the model's chain-of-thought even when the model produced a visible reasoning block in the TUI / CLI. This fix adds an opt-in `X-Hermes-Expose-Reasoning` request header. When set to a truthy value, the gateway surfaces the reasoning/thinking content in BOTH paths: * non-streaming: `message.reasoning_content` and `message.reasoning` (Open WebUI consumes `reasoning_content`, OpenAI-native clients consume `reasoning`) * streaming: `delta.reasoning_content` and `delta.reasoning` chunks on the chat.completion.chunk SSE stream Default behaviour is unchanged: no reasoning fields are emitted unless the client opts in, preserving wire-format compatibility with strict OpenAI parsers. Also adds: - a 256 KB defensive cap on per-chunk reasoning payload to bound memory + bandwidth against a malicious or buggy provider - 6 regression tests under TestChatCompletionsEndpoint covering the default-omit, opt-in, explicit-opt-out, streaming reasoning chunks, and size-cap cases

alaamohanad169-ship-it · 2026-06-03T01:07:21Z

👋 Friendly nudge — this PR exposes model reasoning/thinking blocks in /v1/chat/completions responses. ✅ CI green, mergeable. Would love a review when someone gets a chance.

alaamohanad169-ship-it · 2026-06-03T11:24:17Z

@OutThisLife — exposes model reasoning/thinking blocks in /v1/chat/completions. CI green, mergeable. Would appreciate a review.

alt-glitch added type/bug Something isn't working P3 Low — cosmetic, nice to have comp/gateway Gateway runner, session dispatch, delivery platform/webhook Webhook / API server labels Jun 2, 2026

alaamohanad169-ship-it marked this pull request as ready for review June 3, 2026 00:00

alaamohanad169-ship-it closed this Jun 6, 2026

alaamohanad169-ship-it deleted the auto-fix-37044 branch June 6, 2026 02:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug]: Expose model reasoning/thinking blocks in /v1/chat/completions (fix #37044)#37067

[Bug]: Expose model reasoning/thinking blocks in /v1/chat/completions (fix #37044)#37067
alaamohanad169-ship-it wants to merge 1 commit into
NousResearch:mainfrom
alaamohanad169-ship-it:auto-fix-37044

alaamohanad169-ship-it commented Jun 1, 2026

Uh oh!

alaamohanad169-ship-it commented Jun 3, 2026

Uh oh!

alaamohanad169-ship-it commented Jun 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

alaamohanad169-ship-it commented Jun 1, 2026

Summary

Wire-format compatibility

Changes

Test plan

Risk & scope

Uh oh!

alaamohanad169-ship-it commented Jun 3, 2026

Uh oh!

alaamohanad169-ship-it commented Jun 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants