fix(plugins): surface reasoning_content in Langfuse assistant message (#29482) by briandevans · Pull Request #29486 · NousResearch/hermes-agent

briandevans · 2026-05-20T21:16:06Z

What does this PR do?

The bundled observability/langfuse plugin's _serialize_assistant_message reads only the top-level reasoning attribute, so providers that emit chain-of-thought via the reasoning_content convention (LM Studio, Moonshot, Qwen3 thinking, DeepSeek) silently lose their reasoning in Langfuse observations even though Hermes captures it correctly. Every LLM call N shows reasoning: None for these providers.

Root cause: NormalizedResponse (agent/transports/types.py) stores the reasoning text in provider_data["reasoning_content"] and exposes it via the reasoning_content property (line 115). The serializer never reads that property — only the top-level reasoning field. Fall back to reasoning_content when reasoning is empty/missing so the property is read on the production code path the issue reporter hits.

Mirrors the existing reasoning resolution shape used downstream — top-level reasoning (Anthropic / Codex) wins when set, reasoning_content (OpenAI-compatible reasoning_content convention) fills in for the providers that use it.

Audited siblings: the other two output-construction sites in the same hook (post_llm_call string path at line 828, post_api_request summary path at line 833) do not receive an assistant message object and hardcode reasoning: None. No widening needed in this PR.

Related Issue

Fixes #29482

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)
✨ New feature (non-breaking change that adds functionality)
🔒 Security fix
📝 Documentation update
✅ Tests (adding or improving test coverage)
♻️ Refactor (no behavior change)
🎯 New skill (bundled or hub)

Changes Made

plugins/observability/langfuse/__init__.py — _serialize_assistant_message falls back to getattr(message, "reasoning_content", None) when the top-level reasoning is empty. Comment explains the two reasoning conventions.
tests/plugins/test_langfuse_plugin.py — new TestSerializeAssistantMessageReasoning class with 5 cases: top-level precedence, fallback when reasoning is None, fallback when reasoning == "", end-to-end via real NormalizedResponse(provider_data={"reasoning_content": ...}), and both-missing returns None.

How to Test

uv run --with pytest --with pytest-xdist --with pytest-asyncio --with pytest-timeout python3 -m pytest tests/plugins/test_langfuse_plugin.py::TestSerializeAssistantMessageReasoning -v
Full file regression: uv run --with pytest --with pytest-xdist --with pytest-asyncio --with pytest-timeout python3 -m pytest tests/plugins/test_langfuse_plugin.py -v → 42 passed.
Regression guard: reverting only the production change makes the 3 fallback tests fail with AssertionError: assert '' == 'qwen3 thinking' (and similar); restoring it makes them pass — confirming the tests actually exercise the new behavior.

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits (fix(plugins):)
I searched for existing PRs to make sure this isn't a duplicate
My PR contains only changes related to this fix (no unrelated commits)
I've run focused tests for the touched code and all pass (42/42 in the langfuse file)
I've added tests for my changes
I've tested on my platform: macOS 15.x

Documentation & Housekeeping

I've updated relevant documentation — N/A (internal serializer)
I've updated cli-config.yaml.example if I added/changed config keys — N/A
I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — N/A
I've considered cross-platform impact (Windows, macOS) — N/A (pure-Python serializer, no platform paths)
I've updated tool descriptions/schemas if I changed tool behavior — N/A

Screenshots / Logs

Before (current main):

# Langfuse "LLM call N" observation, LM Studio + Qwen3 thinking
output:
  content: "final answer …"
  reasoning: None     # ← lost
  tool_calls: [...]

After this PR:

output:
  content: "final answer …"
  reasoning: "the model's chain of thought …"   # ← surfaced
  tool_calls: [...]

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR updates Langfuse message serialization to surface assistant “reasoning” across providers that store it under reasoning_content, and adds regression tests to cover precedence/fallback behavior.

Changes:

Update _serialize_assistant_message to prefer message.reasoning, then fall back to message.reasoning_content.
Add tests validating precedence, fallback for None/empty string, and the NormalizedResponse.provider_data["reasoning_content"] code path.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
tests/plugins/test_langfuse_plugin.py	Adds regression tests for reasoning precedence and `reasoning_content` fallback behavior.
plugins/observability/langfuse/init.py	Implements fallback to `reasoning_content` when `reasoning` is not set, so Langfuse observations include “thinking”.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+    # Anthropic / Codex behaviour unchanged, then fall back so Langfuse's
+    # observation actually shows the thinking instead of `reasoning: None`.
+    reasoning = getattr(message, "reasoning", None)
+    if not reasoning:


    return {
        "content": _safe_value(getattr(message, "content", None)),
-        "reasoning": _safe_value(getattr(message, "reasoning", None)),
+        "reasoning": _safe_value(reasoning),
        "tool_calls": _serialize_tool_calls(getattr(message, "tool_calls", None)),
    }


swanhtet1992 · 2026-05-20T21:26:45Z

One case this doesn't catch: OpenRouter's unified format puts reasoning under reasoning_details (array of {type, summary|thinking|content|text} objects, exposed via the reasoning_details property at agent/transports/types.py:120). Both extract_content_or_reasoning in agent/auxiliary_client.py:4404 and _extract_reasoning in run_agent.py already cover this third source, so the plugin would be aligned with them by adding it.

Can send a follow-up PR for it once this merges, or fold it in here if easier.

briandevans · 2026-05-20T22:16:12Z

@swanhtet1992 Folded in via 90de53b03 — _serialize_assistant_message now mirrors agent.agent_runtime_helpers.extract_reasoning precedence exactly:

reasoning → reasoning_content → reasoning_details (per-detail key order summary > thinking > content > text).

Covered by four new tests:

test_falls_back_to_reasoning_details_openrouter_unified — full {type, summary|thinking|content|text} walk with deterministic join order
test_reasoning_details_skips_non_dict_and_empty — non-dict entries / whitespace-only summary / detail with no recognised key / duplicate-stripping
test_top_level_reasoning_wins_over_reasoning_details — precedence guard
(plus the existing five for reasoning / reasoning_content)

Thanks for the precise pointer to agent/transports/types.py:120 and the sibling extractors — kept the diff aligned with both so the Langfuse observation no longer diverges from what the main agent loop actually surfaces.

briandevans · 2026-05-20T23:11:17Z

CI audit — both test job failures are pre-existing baselines on clean origin/main (c6a380eb6), neither is in this PR's touched code:

Test	Symptom	Root cause on main
`tests/plugins/web/test_web_search_provider_plugins.py::TestBundledPluginsRegister::test_all_seven_plugins_present_in_registry`	`Left contains one more item: 'xai'`	Commit `a0c031299 feat(web): add xAI Web Search provider plugin` registered an 8th bundled web provider but the literal 7-plugin assertion (plus three parametrized 7-plugin lists in the same file) was never updated.
`tests/hermes_cli/test_update_hangup_protection.py::TestInstallHangupProtection::test_wraps_stdout_and_stderr_with_mirror`	`assert isinstance(sys.stdout, _UpdateOutputStream)` is False under xdist	Other tests in the same worker reload `hermes_cli.main`, leaving the module-imported class stale even though the live stream is the correct wrapper — `isinstance` fails across the reload boundary. Reproduces on clean `origin/main` under `pytest -n auto`; passes when the test runs alone.

Both are already addressed by @camaragon's #29224 (canonical fix covering both files). No action needed on this PR.

briandevans · 2026-05-24T16:09:20Z

@copilot Both review findings addressed in d953a4b18:

L482 (truthy guard): replaced if not reasoning: with a dedicated _reasoning_is_blank helper that only triggers fallback when reasoning is None or a blank/whitespace string. A provider that places a legitimately falsy non-string payload (0, False, {}, []) on reasoning is now preserved verbatim. New regression test test_falsy_non_string_reasoning_is_not_overwritten covers the empty-dict case.
L505 (CoT export gating): this PR doesn't change the export policy — the existing observation already serialised the top-level reasoning field unconditionally on every assistant message; the PR just mirrors that same policy for the reasoning_content and reasoning_details surfaces so OpenRouter/DeepSeek/LM Studio traffic is no longer silently dropped. A redaction/opt-out gate is a reasonable separate enhancement, but covering all three surfaces uniformly is outside the scope of this regression fix for [Bug]: Langfuse plugin shows reasoning: None for reasoning_content models (DeepSeek/Qwen/LM Studio convention) #29482.

CI test (1)/(6) failure is the unrelated tests/hermes_cli/test_web_server.py::TestPtyWebSocket::test_resize_escape_is_forwarded pty timeout — no overlap with touched files.

Providers that emit chain-of-thought via the `reasoning_content` convention (LM Studio, Moonshot, Qwen3 thinking, DeepSeek) leave the top-level `reasoning` field unset and stash the text on `NormalizedResponse.provider_data["reasoning_content"]`, surfaced via the property at agent/transports/types.py:115. `_serialize_assistant_message` only read `reasoning`, so every Langfuse `LLM call N` observation showed `reasoning: None` for these providers even though Hermes captured the thinking fine (visible elsewhere in the runtime via `last_reasoning` in run_agent.py). Fall back to `reasoning_content` when `reasoning` is empty/missing so the property is read on the production code path the issue reporter hits. The other two output-construction sites in the same hook (`post_llm_call` string path, `post_api_request` summary path) do not receive an assistant message object and hardcode `reasoning: None`; both are intentionally unchanged. Fixes NousResearch#29482

@swanhtet1992

…earch#29482) Extend `_serialize_assistant_message` with a third fallback that walks the `reasoning_details` array (OpenRouter's unified format, `[{type, summary|thinking|content|text}, ...]`). Without this branch, OpenRouter-routed traffic still recorded `reasoning: None` even though the structured payload was present on the NormalizedResponse. Mirrors `agent.agent_runtime_helpers.extract_reasoning` precedence exactly: `reasoning` > `reasoning_content` > `reasoning_details` (per-detail key order `summary > thinking > content > text`), so the Langfuse observation matches whatever the main agent loop and `extract_content_or_reasoning` already surface. Per @swanhtet1992's review on NousResearch#29486.

Replace the truthy if not reasoning: cascade with an explicit _reasoning_is_blank helper that only triggers on None or blank-whitespace strings. The previous guard would have overwritten a legitimately falsy non-string payload (0, False, {}, []) that a provider may have placed in reasoning deliberately. Addresses Copilot review on NousResearch#29486.

briandevans · 2026-06-01T20:10:18Z

Closing to focus the queue on security/file-safety work where civilian merges are landing. Happy to reopen if maintainers want this picked up.

Copilot AI review requested due to automatic review settings May 20, 2026 21:16

Copilot AI reviewed May 20, 2026

View reviewed changes

swanhtet1992 mentioned this pull request May 20, 2026

Extend reasoning serialization to OpenRouter reasoning_details (follow-up for #29482) briandevans/hermes-agent#1

Open

alt-glitch added type/bug Something isn't working P3 Low — cosmetic, nice to have comp/plugins Plugin system and bundled plugins labels May 20, 2026

briandevans force-pushed the fix/langfuse-reasoning-content-fallback-29482 branch from 90de53b to f80d83f Compare May 23, 2026 10:15

briandevans force-pushed the fix/langfuse-reasoning-content-fallback-29482 branch from f80d83f to 552cb40 Compare May 24, 2026 14:15

briandevans force-pushed the fix/langfuse-reasoning-content-fallback-29482 branch from d953a4b to 6a5a881 Compare May 27, 2026 02:15

briandevans added 3 commits May 28, 2026 21:10

briandevans force-pushed the fix/langfuse-reasoning-content-fallback-29482 branch from 6a5a881 to 26c34a5 Compare May 29, 2026 04:10

briandevans closed this Jun 1, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(plugins): surface reasoning_content in Langfuse assistant message (#29482)#29486

fix(plugins): surface reasoning_content in Langfuse assistant message (#29482)#29486
briandevans wants to merge 3 commits into
NousResearch:mainfrom
briandevans:fix/langfuse-reasoning-content-fallback-29482

briandevans commented May 20, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

swanhtet1992 commented May 20, 2026

Uh oh!

briandevans commented May 20, 2026 •

edited

Loading

Uh oh!

briandevans commented May 20, 2026

Uh oh!

briandevans commented May 24, 2026

Uh oh!

briandevans commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

briandevans commented May 20, 2026

What does this PR do?

Related Issue

Type of Change

Changes Made

How to Test

Checklist

Code

Documentation & Housekeeping

Screenshots / Logs

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

swanhtet1992 commented May 20, 2026

Uh oh!

briandevans commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

briandevans commented May 20, 2026

Uh oh!

briandevans commented May 24, 2026

Uh oh!

briandevans commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

briandevans commented May 20, 2026 •

edited

Loading