Skip to content

Fix/langfuse placeholder key warning#23831

Closed
xxxigm wants to merge 4 commits into
NousResearch:mainfrom
xxxigm:fix/langfuse-placeholder-key-warning
Closed

Fix/langfuse placeholder key warning#23831
xxxigm wants to merge 4 commits into
NousResearch:mainfrom
xxxigm:fix/langfuse-placeholder-key-warning

Conversation

@xxxigm

@xxxigm xxxigm commented May 11, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

Fixes the silent-failure mode reported in #23823: when an operator leaves
HERMES_LANGFUSE_PUBLIC_KEY / HERMES_LANGFUSE_SECRET_KEY at a template
value like placeholder, test-key, or your-langfuse-key, the
Langfuse plugin currently registers all six hooks successfully, the
startup banner reports observability as enabled, and every trace is
silently dropped
at SDK flush time. No warning, no error, just an
empty Langfuse dashboard the operator only notices hours/days later.

This PR adds a one-shot prefix-based credential check inside
_get_langfuse():

  1. After the existing empty-credentials short-circuit, both env values
    are validated against the documented Langfuse-issued prefixes
    (pk-lf- for the public key, sk-lf- for the secret) — the same
    prefixes the plugin's module docstring already advertises and the
    Langfuse server bakes into every key at issuance time.
  2. If either value doesn't match, the plugin emits a single
    logger.warning(...) naming the offending env var(s), shows a
    log-safe value preview (full string for short placeholders so the
    operator can see exactly which template they left in place; first 6
    chars + ... for anything longer so a misrouted real secret never
    hits the log), and points to the fix (set real keys or unset the
    env vars).
  3. The plugin then short-circuits via the existing _INIT_FAILED
    sentinel — the same cache the missing-credentials path already uses
    — so the warning fires once per process, not once per LLM/tool
    hook invocation.

The check sits after the Langfuse is None SDK-installed guard so
hosts without the optional langfuse SDK don't see a misleading "set
real keys" hint when the actionable fix there is pip install langfuse. Missing-credentials remains the documented opt-out path and
stays silent — no log noise for unconfigured installs.

Related Issue

Fixes #23823

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

Four commits, scoped narrowly so each is independently reviewable:

  1. fix(plugins/langfuse): add helpers to flag placeholder credentials
    plugins/observability/langfuse/__init__.py: introduces the
    _LANGFUSE_KEY_PREFIXES registry plus the _redact_key_preview() and
    _validate_langfuse_key() helpers. Pure additions, no behavior
    change yet.
  2. fix(plugins/langfuse): warn on placeholder credentials (#23823)
    plugins/observability/langfuse/__init__.py: wires the helpers
    into _get_langfuse() so a misconfigured key produces one
    consolidated warning and a cached _INIT_FAILED skip.
  3. test(plugins/langfuse): cover the placeholder-key helpers (#23823)
    tests/plugins/test_langfuse_plugin.py: 6 helper-level unit
    tests for the redaction / prefix logic (empty / short / long
    preview branches; accept / reject / unknown-name branches of the
    validator).
  4. test(plugins/langfuse): end-to-end regression for placeholder warning (#23823)
    tests/plugins/test_langfuse_plugin.py: 23 runtime tests through
    _get_langfuse() covering each placeholder position, single
    consolidated warning, _INIT_FAILED cache reuse, 8 common template
    strings drawn from real-world .env.example files, the legacy
    LANGFUSE_PUBLIC_KEY alias, both silent-skip paths (missing creds,
    missing SDK), and a positive-path test that valid pk-lf-… /
    sk-lf-… keys still proceed to SDK init.

Net diff: +357 / -8 lines (+74 / 0 production, +283 / -8 tests).

How to Test

Reproduce the bug on main (before the fix):

  1. Install the Langfuse plugin: hermes plugins enable observability/langfuse.
  2. Set placeholder credentials in ~/.hermes/.env:
    HERMES_LANGFUSE_PUBLIC_KEY=placeholder
    HERMES_LANGFUSE_SECRET_KEY=placeholder
    
  3. Start any agent session that fires hooks.
  4. Observe: no warning in stdout, no error in ~/.hermes/logs/, no
    traces on cloud.langfuse.com. The plugin appears healthy.

With this PR applied, step 3 logs (once, at first hook invocation):

WARNING plugins.observability.langfuse: Langfuse plugin: credentials
look like placeholders, traces will NOT be emitted
(HERMES_LANGFUSE_PUBLIC_KEY='placeholder' (expected 'pk-lf-' prefix);
HERMES_LANGFUSE_SECRET_KEY='placeholder' (expected 'sk-lf-' prefix)).
Set real Langfuse keys (pk-lf-... / sk-lf-...) or unset
HERMES_LANGFUSE_PUBLIC_KEY / HERMES_LANGFUSE_SECRET_KEY to silence
this warning.

Subsequent hook invocations stay silent (cached via _INIT_FAILED).

Automated coverage:

$ scripts/run_tests.sh tests/plugins/test_langfuse_plugin.py
============================== 29 passed in 1.03s ==============================

To confirm the new tests actually catch the bug, revert just the
production hunk in _get_langfuse():

$ git stash push plugins/observability/langfuse/__init__.py
$ scripts/run_tests.sh tests/plugins/test_langfuse_plugin.py
========================= 19 failed, 10 passed =========================
$ git stash pop
$ scripts/run_tests.sh tests/plugins/test_langfuse_plugin.py
============================== 29 passed in 1.03s ==============================

The 10 that still pass without the fix are the pre-existing tests plus
the 6 helper-only unit tests (they don't go through _get_langfuse());
the 19 that flip are the end-to-end cases.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(plugins/langfuse): and test(plugins/langfuse):)
  • I searched for existing PRs to make sure this isn't a duplicate (no open PR mentions Langfuse SDK plugin: placeholder API key silent failure #23823 as of this push)
  • My PR contains only changes related to this fix (no unrelated commits)
  • I've run scripts/run_tests.sh tests/plugins/test_langfuse_plugin.py and all tests pass
  • I've added tests for my changes (29 total; 23 new cases for this PR, 19 of which fail on main)
  • I've tested on my platform: macOS 15.6 (Darwin 24.6.0), Python 3.12

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — the existing module docstring already documents the pk-lf-... / sk-lf-... key format this PR enforces; no additional docs needed
  • I've updated cli-config.yaml.example if I added/changed config keys — N/A (no new config keys)
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — N/A
  • I've considered cross-platform impact (Windows, macOS) — pure-Python helpers, no OS-specific calls; tests are hermetic (monkeypatch + module re-import, no filesystem or network)
  • I've updated tool descriptions/schemas if I changed tool behavior — N/A (this is a plugin runtime gate, not a tool)

Screenshots / Logs

$ scripts/run_tests.sh tests/plugins/test_langfuse_plugin.py
4 workers [29 items]
............................. [100%]
============================== 29 passed in 1.03s ==============================

$ .venv/bin/ruff check plugins/observability/langfuse/ tests/plugins/test_langfuse_plugin.py
All checks passed!

Bug-repro proof (revert just the _get_langfuse() hunk, keep the
tests, re-run):

$ git stash push plugins/observability/langfuse/__init__.py
$ scripts/run_tests.sh tests/plugins/test_langfuse_plugin.py::TestPlaceholderKeyDetection
========================= 19 failed, 3 passed in 1.11s =========================

The 19 failures are the end-to-end cases that exercise the
placeholder-detection warning path; the 3 that pass are the pure-helper
unit tests that never call _get_langfuse(). With the fix back in
place all 29 tests pass.

xxxigm added 4 commits May 11, 2026 21:50
Introduces two pure-Python helpers and a prefix registry that the
next commit wires into `_get_langfuse()` to surface placeholder
Langfuse credentials before the SDK silently drops every trace
(NousResearch#23823):

* `_LANGFUSE_KEY_PREFIXES` — the documented `pk-lf-` / `sk-lf-`
  prefixes Langfuse server-side issuance bakes into every public /
  secret key. Cloud and self-hosted Langfuse both use the same
  scheme, so the registry stays accurate without per-deployment
  config.
* `_redact_key_preview(value)` — log-safe preview that echoes short
  placeholder strings in full (so the operator can see exactly which
  template they forgot to replace) but truncates anything >12 chars
  to its first 6 chars + `...` (so a misrouted real secret never
  hits the log).
* `_validate_langfuse_key(env_name, value)` — returns `None` for
  values that match the registered prefix and a one-line diagnostic
  string otherwise. Unknown env-var names pass through unchanged so
  future callers can extend the registry without rewriting the call
  sites.

Helpers only — no behaviour change yet. The integration commit
follows; the test commits exercise both halves independently.
…23823)

The gateway startup banner advertises Langfuse observability is
enabled whenever `HERMES_LANGFUSE_PUBLIC_KEY` /
`HERMES_LANGFUSE_SECRET_KEY` are set, but until now the plugin would
silently swallow every trace if those values were left at template
strings like `placeholder`, `test-key`, or `your-langfuse-key`.  The
Langfuse SDK accepts arbitrary strings at construction time, queues
traces in memory, and only discovers the auth failure when the
background flush thread tries to post them — far away from any log
the operator might be watching.  The result is a false-positive
"observability is on" signal that lasts until someone notices the
Langfuse dashboard staying empty.

Wire the helpers added in the previous commit into `_get_langfuse()`
so a misconfigured key is caught at first use:

1. After the empty-credentials check, run both env values through
   `_validate_langfuse_key` to spot anything that doesn't begin with
   the Langfuse-issued `pk-lf-` / `sk-lf-` prefix.
2. If either fails, emit ONE consolidated `logger.warning` naming the
   offending env var(s) with a safe value preview and a fix hint
   (set real keys or unset the env vars), then short-circuit via the
   same `_INIT_FAILED` cache the missing-credentials path already
   uses.  That cache guarantees the warning fires once per process,
   not once per hook invocation — important because the plugin's
   hooks run on every LLM and tool call.

Missing-credentials remains silent (it's the documented opt-out path
for unconfigured installs), and the placeholder check sits after the
`Langfuse is None` short-circuit so a host without the optional
`langfuse` SDK installed doesn't see a misleading "set real keys"
hint when the actual fix is `pip install langfuse`.

Fixes NousResearch#23823.
…ch#23823)

Unit-level coverage for the two pure-Python helpers added by the
preceding commits:

* `_redact_key_preview` — exercises the three branches (empty,
  short-value echo, long-value truncate-to-6-chars).  The truncation
  case is the one that matters operationally: if an operator pastes
  a real `sk-lf-...` secret into the wrong env var, the placeholder
  warning that follows must not echo the secret in full.
* `_validate_langfuse_key` — accepts the documented prefixes,
  rejects values without them (and includes both the env-var name
  and the expected prefix in the returned diagnostic), and falls
  through unchanged for env names that don't have a registered
  prefix.

Six fast cases.  These run without the optional `langfuse` SDK
because they never call `_get_langfuse()`; the integration suite
(next commit) covers the runtime gate.
…NousResearch#23823)

Runtime-level coverage for the `_get_langfuse()` integration path
that the user-visible bug actually traverses.  Adds a `_FakeLangfuse`
stub so the tests don't need the optional `langfuse` SDK installed
— without the stub every call short-circuits at
`if Langfuse is None` before reaching the new placeholder check,
silently masking the very behaviour we want to pin.

The new class covers:

* placeholder in only the public key, only the secret key, and both
  — each asserts the warning names the right env var, shows a safe
  value preview, and does NOT leak the valid counterpart's value;
* a single consolidated warning when both keys are bad (no
  double-log);
* repeated calls re-use the `_INIT_FAILED` cache so a busy gateway
  doesn't spam the log once per LLM hook;
* 8 parametrized placeholder strings drawn from common
  `.env.example` templates (`placeholder`, `test-key`,
  `your-langfuse-key`, `change-me`, `xxx`, `dummy-key-here`,
  `<your-key>`, `REPLACE_ME`);
* the legacy bare `LANGFUSE_PUBLIC_KEY` alias is validated too —
  whichever env var the operator set, the warning still names the
  canonical HERMES_-prefixed variant in the fix hint;
* the silent-skip paths stay silent: no warning when credentials
  are missing entirely (the documented opt-out for unconfigured
  installs) and no warning when the SDK itself is absent (the
  actionable message there is `pip install langfuse`, not a key
  hint);
* valid `pk-lf-…` / `sk-lf-…` keys construct the client and never
  trip the placeholder guard — verified both by the absence of the
  warning text and by `_FakeLangfuse.instances` recording a single
  init call with the supplied kwargs.

Without the guard, 19 of the 23 new cases fail (`logger.warning`
never fires, the SDK swallows the placeholders, traces silently
disappear).  With the guard in place all 29 tests in the file pass.
@alt-glitch alt-glitch added type/bug Something isn't working P3 Low — cosmetic, nice to have comp/plugins Plugin system and bundled plugins duplicate This issue or pull request already exists labels May 11, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Duplicate of #23188 — Same langfuse placeholder credential detection fix. #23188 is the earlier open PR addressing #22763.

kshitijk4poor added a commit that referenced this pull request May 15, 2026
…placeholder credentials (closes #22342, #22763) (#26320)

* fix(langfuse): reject placeholder credentials with one-shot warning

When operators leave HERMES_LANGFUSE_PUBLIC_KEY / HERMES_LANGFUSE_SECRET_KEY
at a template value like 'placeholder', 'test-key', or 'your-langfuse-key',
the Langfuse SDK silently accepts the credentials at construction time and
drops every trace at flush time. No warning, no error — just an empty
Langfuse dashboard the operator only notices hours later.

Add prefix-based validation in _get_langfuse() against the documented
'pk-lf-' / 'sk-lf-' prefixes that Langfuse always issues server-side.
Anything else fires a single warning naming the offending env var(s)
with a log-safe value preview (full string for short placeholders so the
operator knows which template they left in place; truncated for long
values so a real secret pasted into the wrong field never hits the log),
then short-circuits via the existing _INIT_FAILED cache so the warning
fires once per process, not once per hook invocation.

The check sits after the 'Langfuse is None' SDK-installed guard so hosts
without the optional langfuse SDK don't see misleading 'set real keys'
hints when the actionable fix is 'pip install langfuse'. Missing
credentials remains the documented opt-out path and stays silent — no
log noise for unconfigured installs.

Fixes #22763
Fixes #23823

* fix(langfuse): use actual API request messages for generation input

on_pre_llm_request previously used the messages kwarg alone, which
could be None when Hermes passes the payload via request_messages,
conversation_history, or user_message instead. Add _coerce_request_messages
to pick the first available list across all variants, falling back to a
synthetic user message. Generations now show the real outbound payload
rather than an empty input.

* fix(langfuse): record tool call outputs in traces

Tool observations showed input (arguments) but output was always
undefined. Root cause: when tool_call_id is empty, pre_tool_call stored
observations under a unique time-based key that post_tool_call could
never reconstruct, so every tool span was closed without output by the
_finish_trace sweep.

Fix pre/post matching by routing empty-tool_call_id tools through a
per-name FIFO queue (pending_tools_by_name) instead of the time-based
key. Tools with a tool_call_id continue to use the id-keyed dict.

Also:
 - Preserve OpenAI-style nested function shape in serialized tool calls
   so Langfuse renders name/arguments correctly
 - Keep name + tool_call_id on role:tool messages for proper pairing
 - Backfill tool results onto the matching turn_tool_calls entry so the
   generation's tool-call record carries the result alongside arguments
 - Coerce request messages from whichever field the runtime provides
   (request_messages, messages, conversation_history, user_message)

* fix(langfuse): salvage-review polish — drop dead is_first_turn, shallow-copy request_messages, real threaded FIFO test

Self-review of the combined #22345 + #23831 salvage surfaced three issues
worth fixing in the same PR rather than as follow-ups:

1. Drop is_first_turn from the pre_api_request hook. The boolean expression
   `not bool(conversation_history)` was wrong: conversation_history is
   reassigned to None mid-run after compression (5 sites in run_agent.py),
   so the value flips False -> True mid-conversation on every post-compression
   API call. The langfuse plugin never consumed it, so the kwarg was both
   misleading AND dead.

2. Replace copy.deepcopy(request_messages) with shallow list() copy. The
   pre_api_request hook contract discards return values (invoke_hook never
   writes back to api_kwargs), and the langfuse plugin's _serialize_messages
   already builds its own snapshot dicts via _safe_value. A deepcopy on every
   API call would walk every tool result and base64 image — significant
   overhead for no real isolation benefit. Shallow copy of the outer list
   protects against later mutations of api_messages without paying for the
   inner-dict walk.

3. Rename test_empty_tool_call_id_concurrent_fifo_order ->
   test_empty_tool_call_id_observations_are_fifo_within_tool_name and add a
   real test_threaded_post_calls_preserve_fifo_under_lock that spawns 8
   threads behind a barrier to actually exercise _STATE_LOCK on the
   pending_tools_by_name queue. The original test was sequential and only
   validated Python list semantics; this one validates the lock discipline.

4. Fix stale 'Cleared by reset_cache_for_tests()' comment on _INIT_FAILED —
   that function does not exist. Tests reload the module via sys.modules.pop
   + importlib.import_module instead.

Tests: 37 langfuse plugin tests pass, 658 plugin tests overall pass.

---------

Co-authored-by: xxxigm <tuancanhnguyen706@gmail.com>
Co-authored-by: Brian Conklin <brian@dralth.com>
@kshitijk4poor

Copy link
Copy Markdown
Collaborator

Thanks @xxxigm — superseded by #26320 which cherry-picks your placeholder-credential fix verbatim (4 commits squashed into one logical commit, author preserved). All 23 of your regression tests landed unchanged, plus the _redact_key_preview / _validate_langfuse_key helpers.

The salvage combines your fix with @btorresgil's trace I/O fix (#22345) into one logical "Langfuse now actually works" PR, plus a self-review polish commit on top.

Closing as superseded.

DIZ-admin pushed a commit to DIZ-admin/hermes-agent that referenced this pull request May 16, 2026
…placeholder credentials (closes NousResearch#22342, NousResearch#22763) (NousResearch#26320)

* fix(langfuse): reject placeholder credentials with one-shot warning

When operators leave HERMES_LANGFUSE_PUBLIC_KEY / HERMES_LANGFUSE_SECRET_KEY
at a template value like 'placeholder', 'test-key', or 'your-langfuse-key',
the Langfuse SDK silently accepts the credentials at construction time and
drops every trace at flush time. No warning, no error — just an empty
Langfuse dashboard the operator only notices hours later.

Add prefix-based validation in _get_langfuse() against the documented
'pk-lf-' / 'sk-lf-' prefixes that Langfuse always issues server-side.
Anything else fires a single warning naming the offending env var(s)
with a log-safe value preview (full string for short placeholders so the
operator knows which template they left in place; truncated for long
values so a real secret pasted into the wrong field never hits the log),
then short-circuits via the existing _INIT_FAILED cache so the warning
fires once per process, not once per hook invocation.

The check sits after the 'Langfuse is None' SDK-installed guard so hosts
without the optional langfuse SDK don't see misleading 'set real keys'
hints when the actionable fix is 'pip install langfuse'. Missing
credentials remains the documented opt-out path and stays silent — no
log noise for unconfigured installs.

Fixes NousResearch#22763
Fixes NousResearch#23823

* fix(langfuse): use actual API request messages for generation input

on_pre_llm_request previously used the messages kwarg alone, which
could be None when Hermes passes the payload via request_messages,
conversation_history, or user_message instead. Add _coerce_request_messages
to pick the first available list across all variants, falling back to a
synthetic user message. Generations now show the real outbound payload
rather than an empty input.

* fix(langfuse): record tool call outputs in traces

Tool observations showed input (arguments) but output was always
undefined. Root cause: when tool_call_id is empty, pre_tool_call stored
observations under a unique time-based key that post_tool_call could
never reconstruct, so every tool span was closed without output by the
_finish_trace sweep.

Fix pre/post matching by routing empty-tool_call_id tools through a
per-name FIFO queue (pending_tools_by_name) instead of the time-based
key. Tools with a tool_call_id continue to use the id-keyed dict.

Also:
 - Preserve OpenAI-style nested function shape in serialized tool calls
   so Langfuse renders name/arguments correctly
 - Keep name + tool_call_id on role:tool messages for proper pairing
 - Backfill tool results onto the matching turn_tool_calls entry so the
   generation's tool-call record carries the result alongside arguments
 - Coerce request messages from whichever field the runtime provides
   (request_messages, messages, conversation_history, user_message)

* fix(langfuse): salvage-review polish — drop dead is_first_turn, shallow-copy request_messages, real threaded FIFO test

Self-review of the combined NousResearch#22345 + NousResearch#23831 salvage surfaced three issues
worth fixing in the same PR rather than as follow-ups:

1. Drop is_first_turn from the pre_api_request hook. The boolean expression
   `not bool(conversation_history)` was wrong: conversation_history is
   reassigned to None mid-run after compression (5 sites in run_agent.py),
   so the value flips False -> True mid-conversation on every post-compression
   API call. The langfuse plugin never consumed it, so the kwarg was both
   misleading AND dead.

2. Replace copy.deepcopy(request_messages) with shallow list() copy. The
   pre_api_request hook contract discards return values (invoke_hook never
   writes back to api_kwargs), and the langfuse plugin's _serialize_messages
   already builds its own snapshot dicts via _safe_value. A deepcopy on every
   API call would walk every tool result and base64 image — significant
   overhead for no real isolation benefit. Shallow copy of the outer list
   protects against later mutations of api_messages without paying for the
   inner-dict walk.

3. Rename test_empty_tool_call_id_concurrent_fifo_order ->
   test_empty_tool_call_id_observations_are_fifo_within_tool_name and add a
   real test_threaded_post_calls_preserve_fifo_under_lock that spawns 8
   threads behind a barrier to actually exercise _STATE_LOCK on the
   pending_tools_by_name queue. The original test was sequential and only
   validated Python list semantics; this one validates the lock discipline.

4. Fix stale 'Cleared by reset_cache_for_tests()' comment on _INIT_FAILED —
   that function does not exist. Tests reload the module via sys.modules.pop
   + importlib.import_module instead.

Tests: 37 langfuse plugin tests pass, 658 plugin tests overall pass.

---------

Co-authored-by: xxxigm <tuancanhnguyen706@gmail.com>
Co-authored-by: Brian Conklin <brian@dralth.com>
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…placeholder credentials (closes NousResearch#22342, NousResearch#22763) (NousResearch#26320)

* fix(langfuse): reject placeholder credentials with one-shot warning

When operators leave HERMES_LANGFUSE_PUBLIC_KEY / HERMES_LANGFUSE_SECRET_KEY
at a template value like 'placeholder', 'test-key', or 'your-langfuse-key',
the Langfuse SDK silently accepts the credentials at construction time and
drops every trace at flush time. No warning, no error — just an empty
Langfuse dashboard the operator only notices hours later.

Add prefix-based validation in _get_langfuse() against the documented
'pk-lf-' / 'sk-lf-' prefixes that Langfuse always issues server-side.
Anything else fires a single warning naming the offending env var(s)
with a log-safe value preview (full string for short placeholders so the
operator knows which template they left in place; truncated for long
values so a real secret pasted into the wrong field never hits the log),
then short-circuits via the existing _INIT_FAILED cache so the warning
fires once per process, not once per hook invocation.

The check sits after the 'Langfuse is None' SDK-installed guard so hosts
without the optional langfuse SDK don't see misleading 'set real keys'
hints when the actionable fix is 'pip install langfuse'. Missing
credentials remains the documented opt-out path and stays silent — no
log noise for unconfigured installs.

Fixes NousResearch#22763
Fixes NousResearch#23823

* fix(langfuse): use actual API request messages for generation input

on_pre_llm_request previously used the messages kwarg alone, which
could be None when Hermes passes the payload via request_messages,
conversation_history, or user_message instead. Add _coerce_request_messages
to pick the first available list across all variants, falling back to a
synthetic user message. Generations now show the real outbound payload
rather than an empty input.

* fix(langfuse): record tool call outputs in traces

Tool observations showed input (arguments) but output was always
undefined. Root cause: when tool_call_id is empty, pre_tool_call stored
observations under a unique time-based key that post_tool_call could
never reconstruct, so every tool span was closed without output by the
_finish_trace sweep.

Fix pre/post matching by routing empty-tool_call_id tools through a
per-name FIFO queue (pending_tools_by_name) instead of the time-based
key. Tools with a tool_call_id continue to use the id-keyed dict.

Also:
 - Preserve OpenAI-style nested function shape in serialized tool calls
   so Langfuse renders name/arguments correctly
 - Keep name + tool_call_id on role:tool messages for proper pairing
 - Backfill tool results onto the matching turn_tool_calls entry so the
   generation's tool-call record carries the result alongside arguments
 - Coerce request messages from whichever field the runtime provides
   (request_messages, messages, conversation_history, user_message)

* fix(langfuse): salvage-review polish — drop dead is_first_turn, shallow-copy request_messages, real threaded FIFO test

Self-review of the combined NousResearch#22345 + NousResearch#23831 salvage surfaced three issues
worth fixing in the same PR rather than as follow-ups:

1. Drop is_first_turn from the pre_api_request hook. The boolean expression
   `not bool(conversation_history)` was wrong: conversation_history is
   reassigned to None mid-run after compression (5 sites in run_agent.py),
   so the value flips False -> True mid-conversation on every post-compression
   API call. The langfuse plugin never consumed it, so the kwarg was both
   misleading AND dead.

2. Replace copy.deepcopy(request_messages) with shallow list() copy. The
   pre_api_request hook contract discards return values (invoke_hook never
   writes back to api_kwargs), and the langfuse plugin's _serialize_messages
   already builds its own snapshot dicts via _safe_value. A deepcopy on every
   API call would walk every tool result and base64 image — significant
   overhead for no real isolation benefit. Shallow copy of the outer list
   protects against later mutations of api_messages without paying for the
   inner-dict walk.

3. Rename test_empty_tool_call_id_concurrent_fifo_order ->
   test_empty_tool_call_id_observations_are_fifo_within_tool_name and add a
   real test_threaded_post_calls_preserve_fifo_under_lock that spawns 8
   threads behind a barrier to actually exercise _STATE_LOCK on the
   pending_tools_by_name queue. The original test was sequential and only
   validated Python list semantics; this one validates the lock discipline.

4. Fix stale 'Cleared by reset_cache_for_tests()' comment on _INIT_FAILED —
   that function does not exist. Tests reload the module via sys.modules.pop
   + importlib.import_module instead.

Tests: 37 langfuse plugin tests pass, 658 plugin tests overall pass.

---------

Co-authored-by: xxxigm <tuancanhnguyen706@gmail.com>
Co-authored-by: Brian Conklin <brian@dralth.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/plugins Plugin system and bundled plugins duplicate This issue or pull request already exists P3 Low — cosmetic, nice to have type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Langfuse SDK plugin: placeholder API key silent failure

3 participants