fix(plugins): surface reasoning_content in Langfuse assistant message (#29482)#29486
fix(plugins): surface reasoning_content in Langfuse assistant message (#29482)#29486briandevans wants to merge 3 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR updates Langfuse message serialization to surface assistant “reasoning” across providers that store it under reasoning_content, and adds regression tests to cover precedence/fallback behavior.
Changes:
- Update
_serialize_assistant_messageto prefermessage.reasoning, then fall back tomessage.reasoning_content. - Add tests validating precedence, fallback for
None/empty string, and theNormalizedResponse.provider_data["reasoning_content"]code path.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/plugins/test_langfuse_plugin.py | Adds regression tests for reasoning precedence and reasoning_content fallback behavior. |
| plugins/observability/langfuse/init.py | Implements fallback to reasoning_content when reasoning is not set, so Langfuse observations include “thinking”. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Anthropic / Codex behaviour unchanged, then fall back so Langfuse's | ||
| # observation actually shows the thinking instead of `reasoning: None`. | ||
| reasoning = getattr(message, "reasoning", None) | ||
| if not reasoning: |
| return { | ||
| "content": _safe_value(getattr(message, "content", None)), | ||
| "reasoning": _safe_value(getattr(message, "reasoning", None)), | ||
| "reasoning": _safe_value(reasoning), | ||
| "tool_calls": _serialize_tool_calls(getattr(message, "tool_calls", None)), | ||
| } |
|
One case this doesn't catch: OpenRouter's unified format puts reasoning under Can send a follow-up PR for it once this merges, or fold it in here if easier. |
|
@swanhtet1992 Folded in via 90de53b03 —
Covered by four new tests:
Thanks for the precise pointer to |
|
CI audit — both
Both are already addressed by @camaragon's #29224 (canonical fix covering both files). No action needed on this PR. |
90de53b to
f80d83f
Compare
f80d83f to
552cb40
Compare
|
@copilot Both review findings addressed in d953a4b18:
CI test (1)/(6) failure is the unrelated |
d953a4b to
6a5a881
Compare
Providers that emit chain-of-thought via the `reasoning_content` convention (LM Studio, Moonshot, Qwen3 thinking, DeepSeek) leave the top-level `reasoning` field unset and stash the text on `NormalizedResponse.provider_data["reasoning_content"]`, surfaced via the property at agent/transports/types.py:115. `_serialize_assistant_message` only read `reasoning`, so every Langfuse `LLM call N` observation showed `reasoning: None` for these providers even though Hermes captured the thinking fine (visible elsewhere in the runtime via `last_reasoning` in run_agent.py). Fall back to `reasoning_content` when `reasoning` is empty/missing so the property is read on the production code path the issue reporter hits. The other two output-construction sites in the same hook (`post_llm_call` string path, `post_api_request` summary path) do not receive an assistant message object and hardcode `reasoning: None`; both are intentionally unchanged. Fixes NousResearch#29482
…earch#29482) Extend `_serialize_assistant_message` with a third fallback that walks the `reasoning_details` array (OpenRouter's unified format, `[{type, summary|thinking|content|text}, ...]`). Without this branch, OpenRouter-routed traffic still recorded `reasoning: None` even though the structured payload was present on the NormalizedResponse. Mirrors `agent.agent_runtime_helpers.extract_reasoning` precedence exactly: `reasoning` > `reasoning_content` > `reasoning_details` (per-detail key order `summary > thinking > content > text`), so the Langfuse observation matches whatever the main agent loop and `extract_content_or_reasoning` already surface. Per @swanhtet1992's review on NousResearch#29486.
Replace the truthy if not reasoning: cascade with an explicit
_reasoning_is_blank helper that only triggers on None or
blank-whitespace strings. The previous guard would have overwritten a
legitimately falsy non-string payload (0, False, {}, [])
that a provider may have placed in reasoning deliberately.
Addresses Copilot review on NousResearch#29486.
6a5a881 to
26c34a5
Compare
|
Closing to focus the queue on security/file-safety work where civilian merges are landing. Happy to reopen if maintainers want this picked up. |
What does this PR do?
The bundled
observability/langfuseplugin's_serialize_assistant_messagereads only the top-levelreasoningattribute, so providers that emit chain-of-thought via thereasoning_contentconvention (LM Studio, Moonshot, Qwen3 thinking, DeepSeek) silently lose their reasoning in Langfuse observations even though Hermes captures it correctly. EveryLLM call Nshowsreasoning: Nonefor these providers.Root cause:
NormalizedResponse(agent/transports/types.py) stores the reasoning text inprovider_data["reasoning_content"]and exposes it via thereasoning_contentproperty (line 115). The serializer never reads that property — only the top-levelreasoningfield. Fall back toreasoning_contentwhenreasoningis empty/missing so the property is read on the production code path the issue reporter hits.Mirrors the existing reasoning resolution shape used downstream — top-level
reasoning(Anthropic / Codex) wins when set,reasoning_content(OpenAI-compatiblereasoning_contentconvention) fills in for the providers that use it.Related Issue
Fixes #29482
Type of Change
Changes Made
plugins/observability/langfuse/__init__.py—_serialize_assistant_messagefalls back togetattr(message, "reasoning_content", None)when the top-levelreasoningis empty. Comment explains the two reasoning conventions.tests/plugins/test_langfuse_plugin.py— newTestSerializeAssistantMessageReasoningclass with 5 cases: top-level precedence, fallback whenreasoning is None, fallback whenreasoning == "", end-to-end via realNormalizedResponse(provider_data={"reasoning_content": ...}), and both-missing returnsNone.How to Test
uv run --with pytest --with pytest-xdist --with pytest-asyncio --with pytest-timeout python3 -m pytest tests/plugins/test_langfuse_plugin.py::TestSerializeAssistantMessageReasoning -vuv run --with pytest --with pytest-xdist --with pytest-asyncio --with pytest-timeout python3 -m pytest tests/plugins/test_langfuse_plugin.py -v→ 42 passed.AssertionError: assert '' == 'qwen3 thinking'(and similar); restoring it makes them pass — confirming the tests actually exercise the new behavior.Checklist
Code
fix(plugins):)Documentation & Housekeeping
cli-config.yaml.exampleif I added/changed config keys — N/ACONTRIBUTING.mdorAGENTS.mdif I changed architecture or workflows — N/AScreenshots / Logs
Before (current
main):After this PR: