Fix/langfuse placeholder key warning by xxxigm · Pull Request #23831 · NousResearch/hermes-agent

xxxigm · 2026-05-11T14:54:41Z

What does this PR do?

Fixes the silent-failure mode reported in #23823: when an operator leaves
HERMES_LANGFUSE_PUBLIC_KEY / HERMES_LANGFUSE_SECRET_KEY at a template
value like placeholder, test-key, or your-langfuse-key, the
Langfuse plugin currently registers all six hooks successfully, the
startup banner reports observability as enabled, and every trace is
silently dropped at SDK flush time. No warning, no error, just an
empty Langfuse dashboard the operator only notices hours/days later.

This PR adds a one-shot prefix-based credential check inside
_get_langfuse():

After the existing empty-credentials short-circuit, both env values
are validated against the documented Langfuse-issued prefixes
(pk-lf- for the public key, sk-lf- for the secret) — the same
prefixes the plugin's module docstring already advertises and the
Langfuse server bakes into every key at issuance time.
If either value doesn't match, the plugin emits a single
logger.warning(...) naming the offending env var(s), shows a
log-safe value preview (full string for short placeholders so the
operator can see exactly which template they left in place; first 6
chars + ... for anything longer so a misrouted real secret never
hits the log), and points to the fix (set real keys or unset the
env vars).
The plugin then short-circuits via the existing _INIT_FAILED
sentinel — the same cache the missing-credentials path already uses
— so the warning fires once per process, not once per LLM/tool
hook invocation.

The check sits after the Langfuse is None SDK-installed guard so
hosts without the optional langfuse SDK don't see a misleading "set
real keys" hint when the actionable fix there is pip install langfuse. Missing-credentials remains the documented opt-out path and
stays silent — no log noise for unconfigured installs.

Related Issue

Fixes #23823

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)
✨ New feature (non-breaking change that adds functionality)
🔒 Security fix
📝 Documentation update
✅ Tests (adding or improving test coverage)
♻️ Refactor (no behavior change)
🎯 New skill (bundled or hub)

Changes Made

Four commits, scoped narrowly so each is independently reviewable:

fix(plugins/langfuse): add helpers to flag placeholder credentials
— plugins/observability/langfuse/__init__.py: introduces the
_LANGFUSE_KEY_PREFIXES registry plus the _redact_key_preview() and
_validate_langfuse_key() helpers. Pure additions, no behavior
change yet.
fix(plugins/langfuse): warn on placeholder credentials (#23823)
— plugins/observability/langfuse/__init__.py: wires the helpers
into _get_langfuse() so a misconfigured key produces one
consolidated warning and a cached _INIT_FAILED skip.
test(plugins/langfuse): cover the placeholder-key helpers (#23823)
— tests/plugins/test_langfuse_plugin.py: 6 helper-level unit
tests for the redaction / prefix logic (empty / short / long
preview branches; accept / reject / unknown-name branches of the
validator).
test(plugins/langfuse): end-to-end regression for placeholder warning (#23823)
— tests/plugins/test_langfuse_plugin.py: 23 runtime tests through
_get_langfuse() covering each placeholder position, single
consolidated warning, _INIT_FAILED cache reuse, 8 common template
strings drawn from real-world .env.example files, the legacy
LANGFUSE_PUBLIC_KEY alias, both silent-skip paths (missing creds,
missing SDK), and a positive-path test that valid pk-lf-… /
sk-lf-… keys still proceed to SDK init.

Net diff: +357 / -8 lines (+74 / 0 production, +283 / -8 tests).

How to Test

Reproduce the bug on main (before the fix):

Install the Langfuse plugin: hermes plugins enable observability/langfuse.

Set placeholder credentials in ~/.hermes/.env:

HERMES_LANGFUSE_PUBLIC_KEY=placeholder
HERMES_LANGFUSE_SECRET_KEY=placeholder

Start any agent session that fires hooks.
Observe: no warning in stdout, no error in ~/.hermes/logs/, no
traces on cloud.langfuse.com. The plugin appears healthy.

With this PR applied, step 3 logs (once, at first hook invocation):

WARNING plugins.observability.langfuse: Langfuse plugin: credentials
look like placeholders, traces will NOT be emitted
(HERMES_LANGFUSE_PUBLIC_KEY='placeholder' (expected 'pk-lf-' prefix);
HERMES_LANGFUSE_SECRET_KEY='placeholder' (expected 'sk-lf-' prefix)).
Set real Langfuse keys (pk-lf-... / sk-lf-...) or unset
HERMES_LANGFUSE_PUBLIC_KEY / HERMES_LANGFUSE_SECRET_KEY to silence
this warning.

Subsequent hook invocations stay silent (cached via _INIT_FAILED).

Automated coverage:

$ scripts/run_tests.sh tests/plugins/test_langfuse_plugin.py
============================== 29 passed in 1.03s ==============================

To confirm the new tests actually catch the bug, revert just the
production hunk in _get_langfuse():

$ git stash push plugins/observability/langfuse/__init__.py
$ scripts/run_tests.sh tests/plugins/test_langfuse_plugin.py
========================= 19 failed, 10 passed =========================
$ git stash pop
$ scripts/run_tests.sh tests/plugins/test_langfuse_plugin.py
============================== 29 passed in 1.03s ==============================

The 10 that still pass without the fix are the pre-existing tests plus
the 6 helper-only unit tests (they don't go through _get_langfuse());
the 19 that flip are the end-to-end cases.

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits (fix(plugins/langfuse): and test(plugins/langfuse):)
I searched for existing PRs to make sure this isn't a duplicate (no open PR mentions Langfuse SDK plugin: placeholder API key silent failure #23823 as of this push)
My PR contains only changes related to this fix (no unrelated commits)
I've run scripts/run_tests.sh tests/plugins/test_langfuse_plugin.py and all tests pass
I've added tests for my changes (29 total; 23 new cases for this PR, 19 of which fail on main)
I've tested on my platform: macOS 15.6 (Darwin 24.6.0), Python 3.12

Documentation & Housekeeping

I've updated relevant documentation (README, docs/, docstrings) — the existing module docstring already documents the pk-lf-... / sk-lf-... key format this PR enforces; no additional docs needed
I've updated cli-config.yaml.example if I added/changed config keys — N/A (no new config keys)
I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — N/A
I've considered cross-platform impact (Windows, macOS) — pure-Python helpers, no OS-specific calls; tests are hermetic (monkeypatch + module re-import, no filesystem or network)
I've updated tool descriptions/schemas if I changed tool behavior — N/A (this is a plugin runtime gate, not a tool)

Screenshots / Logs

$ scripts/run_tests.sh tests/plugins/test_langfuse_plugin.py
4 workers [29 items]
............................. [100%]
============================== 29 passed in 1.03s ==============================

$ .venv/bin/ruff check plugins/observability/langfuse/ tests/plugins/test_langfuse_plugin.py
All checks passed!

Bug-repro proof (revert just the _get_langfuse() hunk, keep the
tests, re-run):

$ git stash push plugins/observability/langfuse/__init__.py
$ scripts/run_tests.sh tests/plugins/test_langfuse_plugin.py::TestPlaceholderKeyDetection
========================= 19 failed, 3 passed in 1.11s =========================

The 19 failures are the end-to-end cases that exercise the
placeholder-detection warning path; the 3 that pass are the pure-helper
unit tests that never call _get_langfuse(). With the fix back in
place all 29 tests pass.

Introduces two pure-Python helpers and a prefix registry that the next commit wires into `_get_langfuse()` to surface placeholder Langfuse credentials before the SDK silently drops every trace (NousResearch#23823): * `_LANGFUSE_KEY_PREFIXES` — the documented `pk-lf-` / `sk-lf-` prefixes Langfuse server-side issuance bakes into every public / secret key. Cloud and self-hosted Langfuse both use the same scheme, so the registry stays accurate without per-deployment config. * `_redact_key_preview(value)` — log-safe preview that echoes short placeholder strings in full (so the operator can see exactly which template they forgot to replace) but truncates anything >12 chars to its first 6 chars + `...` (so a misrouted real secret never hits the log). * `_validate_langfuse_key(env_name, value)` — returns `None` for values that match the registered prefix and a one-line diagnostic string otherwise. Unknown env-var names pass through unchanged so future callers can extend the registry without rewriting the call sites. Helpers only — no behaviour change yet. The integration commit follows; the test commits exercise both halves independently.

…23823) The gateway startup banner advertises Langfuse observability is enabled whenever `HERMES_LANGFUSE_PUBLIC_KEY` / `HERMES_LANGFUSE_SECRET_KEY` are set, but until now the plugin would silently swallow every trace if those values were left at template strings like `placeholder`, `test-key`, or `your-langfuse-key`. The Langfuse SDK accepts arbitrary strings at construction time, queues traces in memory, and only discovers the auth failure when the background flush thread tries to post them — far away from any log the operator might be watching. The result is a false-positive "observability is on" signal that lasts until someone notices the Langfuse dashboard staying empty. Wire the helpers added in the previous commit into `_get_langfuse()` so a misconfigured key is caught at first use: 1. After the empty-credentials check, run both env values through `_validate_langfuse_key` to spot anything that doesn't begin with the Langfuse-issued `pk-lf-` / `sk-lf-` prefix. 2. If either fails, emit ONE consolidated `logger.warning` naming the offending env var(s) with a safe value preview and a fix hint (set real keys or unset the env vars), then short-circuit via the same `_INIT_FAILED` cache the missing-credentials path already uses. That cache guarantees the warning fires once per process, not once per hook invocation — important because the plugin's hooks run on every LLM and tool call. Missing-credentials remains silent (it's the documented opt-out path for unconfigured installs), and the placeholder check sits after the `Langfuse is None` short-circuit so a host without the optional `langfuse` SDK installed doesn't see a misleading "set real keys" hint when the actual fix is `pip install langfuse`. Fixes NousResearch#23823.

…ch#23823) Unit-level coverage for the two pure-Python helpers added by the preceding commits: * `_redact_key_preview` — exercises the three branches (empty, short-value echo, long-value truncate-to-6-chars). The truncation case is the one that matters operationally: if an operator pastes a real `sk-lf-...` secret into the wrong env var, the placeholder warning that follows must not echo the secret in full. * `_validate_langfuse_key` — accepts the documented prefixes, rejects values without them (and includes both the env-var name and the expected prefix in the returned diagnostic), and falls through unchanged for env names that don't have a registered prefix. Six fast cases. These run without the optional `langfuse` SDK because they never call `_get_langfuse()`; the integration suite (next commit) covers the runtime gate.

…NousResearch#23823) Runtime-level coverage for the `_get_langfuse()` integration path that the user-visible bug actually traverses. Adds a `_FakeLangfuse` stub so the tests don't need the optional `langfuse` SDK installed — without the stub every call short-circuits at `if Langfuse is None` before reaching the new placeholder check, silently masking the very behaviour we want to pin. The new class covers: * placeholder in only the public key, only the secret key, and both — each asserts the warning names the right env var, shows a safe value preview, and does NOT leak the valid counterpart's value; * a single consolidated warning when both keys are bad (no double-log); * repeated calls re-use the `_INIT_FAILED` cache so a busy gateway doesn't spam the log once per LLM hook; * 8 parametrized placeholder strings drawn from common `.env.example` templates (`placeholder`, `test-key`, `your-langfuse-key`, `change-me`, `xxx`, `dummy-key-here`, `<your-key>`, `REPLACE_ME`); * the legacy bare `LANGFUSE_PUBLIC_KEY` alias is validated too — whichever env var the operator set, the warning still names the canonical HERMES_-prefixed variant in the fix hint; * the silent-skip paths stay silent: no warning when credentials are missing entirely (the documented opt-out for unconfigured installs) and no warning when the SDK itself is absent (the actionable message there is `pip install langfuse`, not a key hint); * valid `pk-lf-…` / `sk-lf-…` keys construct the client and never trip the placeholder guard — verified both by the absence of the warning text and by `_FakeLangfuse.instances` recording a single init call with the supplied kwargs. Without the guard, 19 of the 23 new cases fail (`logger.warning` never fires, the SDK swallows the placeholders, traces silently disappear). With the guard in place all 29 tests in the file pass.

alt-glitch · 2026-05-11T15:08:10Z

Duplicate of #23188 — Same langfuse placeholder credential detection fix. #23188 is the earlier open PR addressing #22763.

…placeholder credentials (closes #22342, #22763) (#26320) * fix(langfuse): reject placeholder credentials with one-shot warning When operators leave HERMES_LANGFUSE_PUBLIC_KEY / HERMES_LANGFUSE_SECRET_KEY at a template value like 'placeholder', 'test-key', or 'your-langfuse-key', the Langfuse SDK silently accepts the credentials at construction time and drops every trace at flush time. No warning, no error — just an empty Langfuse dashboard the operator only notices hours later. Add prefix-based validation in _get_langfuse() against the documented 'pk-lf-' / 'sk-lf-' prefixes that Langfuse always issues server-side. Anything else fires a single warning naming the offending env var(s) with a log-safe value preview (full string for short placeholders so the operator knows which template they left in place; truncated for long values so a real secret pasted into the wrong field never hits the log), then short-circuits via the existing _INIT_FAILED cache so the warning fires once per process, not once per hook invocation. The check sits after the 'Langfuse is None' SDK-installed guard so hosts without the optional langfuse SDK don't see misleading 'set real keys' hints when the actionable fix is 'pip install langfuse'. Missing credentials remains the documented opt-out path and stays silent — no log noise for unconfigured installs. Fixes #22763 Fixes #23823 * fix(langfuse): use actual API request messages for generation input on_pre_llm_request previously used the messages kwarg alone, which could be None when Hermes passes the payload via request_messages, conversation_history, or user_message instead. Add _coerce_request_messages to pick the first available list across all variants, falling back to a synthetic user message. Generations now show the real outbound payload rather than an empty input. * fix(langfuse): record tool call outputs in traces Tool observations showed input (arguments) but output was always undefined. Root cause: when tool_call_id is empty, pre_tool_call stored observations under a unique time-based key that post_tool_call could never reconstruct, so every tool span was closed without output by the _finish_trace sweep. Fix pre/post matching by routing empty-tool_call_id tools through a per-name FIFO queue (pending_tools_by_name) instead of the time-based key. Tools with a tool_call_id continue to use the id-keyed dict. Also: - Preserve OpenAI-style nested function shape in serialized tool calls so Langfuse renders name/arguments correctly - Keep name + tool_call_id on role:tool messages for proper pairing - Backfill tool results onto the matching turn_tool_calls entry so the generation's tool-call record carries the result alongside arguments - Coerce request messages from whichever field the runtime provides (request_messages, messages, conversation_history, user_message) * fix(langfuse): salvage-review polish — drop dead is_first_turn, shallow-copy request_messages, real threaded FIFO test Self-review of the combined #22345 + #23831 salvage surfaced three issues worth fixing in the same PR rather than as follow-ups: 1. Drop is_first_turn from the pre_api_request hook. The boolean expression `not bool(conversation_history)` was wrong: conversation_history is reassigned to None mid-run after compression (5 sites in run_agent.py), so the value flips False -> True mid-conversation on every post-compression API call. The langfuse plugin never consumed it, so the kwarg was both misleading AND dead. 2. Replace copy.deepcopy(request_messages) with shallow list() copy. The pre_api_request hook contract discards return values (invoke_hook never writes back to api_kwargs), and the langfuse plugin's _serialize_messages already builds its own snapshot dicts via _safe_value. A deepcopy on every API call would walk every tool result and base64 image — significant overhead for no real isolation benefit. Shallow copy of the outer list protects against later mutations of api_messages without paying for the inner-dict walk. 3. Rename test_empty_tool_call_id_concurrent_fifo_order -> test_empty_tool_call_id_observations_are_fifo_within_tool_name and add a real test_threaded_post_calls_preserve_fifo_under_lock that spawns 8 threads behind a barrier to actually exercise _STATE_LOCK on the pending_tools_by_name queue. The original test was sequential and only validated Python list semantics; this one validates the lock discipline. 4. Fix stale 'Cleared by reset_cache_for_tests()' comment on _INIT_FAILED — that function does not exist. Tests reload the module via sys.modules.pop + importlib.import_module instead. Tests: 37 langfuse plugin tests pass, 658 plugin tests overall pass. --------- Co-authored-by: xxxigm <tuancanhnguyen706@gmail.com> Co-authored-by: Brian Conklin <brian@dralth.com>

kshitijk4poor · 2026-05-15T12:05:33Z

Thanks @xxxigm — superseded by #26320 which cherry-picks your placeholder-credential fix verbatim (4 commits squashed into one logical commit, author preserved). All 23 of your regression tests landed unchanged, plus the _redact_key_preview / _validate_langfuse_key helpers.

The salvage combines your fix with @btorresgil's trace I/O fix (#22345) into one logical "Langfuse now actually works" PR, plus a self-review polish commit on top.

Closing as superseded.

…placeholder credentials (closes NousResearch#22342, NousResearch#22763) (NousResearch#26320) * fix(langfuse): reject placeholder credentials with one-shot warning When operators leave HERMES_LANGFUSE_PUBLIC_KEY / HERMES_LANGFUSE_SECRET_KEY at a template value like 'placeholder', 'test-key', or 'your-langfuse-key', the Langfuse SDK silently accepts the credentials at construction time and drops every trace at flush time. No warning, no error — just an empty Langfuse dashboard the operator only notices hours later. Add prefix-based validation in _get_langfuse() against the documented 'pk-lf-' / 'sk-lf-' prefixes that Langfuse always issues server-side. Anything else fires a single warning naming the offending env var(s) with a log-safe value preview (full string for short placeholders so the operator knows which template they left in place; truncated for long values so a real secret pasted into the wrong field never hits the log), then short-circuits via the existing _INIT_FAILED cache so the warning fires once per process, not once per hook invocation. The check sits after the 'Langfuse is None' SDK-installed guard so hosts without the optional langfuse SDK don't see misleading 'set real keys' hints when the actionable fix is 'pip install langfuse'. Missing credentials remains the documented opt-out path and stays silent — no log noise for unconfigured installs. Fixes NousResearch#22763 Fixes NousResearch#23823 * fix(langfuse): use actual API request messages for generation input on_pre_llm_request previously used the messages kwarg alone, which could be None when Hermes passes the payload via request_messages, conversation_history, or user_message instead. Add _coerce_request_messages to pick the first available list across all variants, falling back to a synthetic user message. Generations now show the real outbound payload rather than an empty input. * fix(langfuse): record tool call outputs in traces Tool observations showed input (arguments) but output was always undefined. Root cause: when tool_call_id is empty, pre_tool_call stored observations under a unique time-based key that post_tool_call could never reconstruct, so every tool span was closed without output by the _finish_trace sweep. Fix pre/post matching by routing empty-tool_call_id tools through a per-name FIFO queue (pending_tools_by_name) instead of the time-based key. Tools with a tool_call_id continue to use the id-keyed dict. Also: - Preserve OpenAI-style nested function shape in serialized tool calls so Langfuse renders name/arguments correctly - Keep name + tool_call_id on role:tool messages for proper pairing - Backfill tool results onto the matching turn_tool_calls entry so the generation's tool-call record carries the result alongside arguments - Coerce request messages from whichever field the runtime provides (request_messages, messages, conversation_history, user_message) * fix(langfuse): salvage-review polish — drop dead is_first_turn, shallow-copy request_messages, real threaded FIFO test Self-review of the combined NousResearch#22345 + NousResearch#23831 salvage surfaced three issues worth fixing in the same PR rather than as follow-ups: 1. Drop is_first_turn from the pre_api_request hook. The boolean expression `not bool(conversation_history)` was wrong: conversation_history is reassigned to None mid-run after compression (5 sites in run_agent.py), so the value flips False -> True mid-conversation on every post-compression API call. The langfuse plugin never consumed it, so the kwarg was both misleading AND dead. 2. Replace copy.deepcopy(request_messages) with shallow list() copy. The pre_api_request hook contract discards return values (invoke_hook never writes back to api_kwargs), and the langfuse plugin's _serialize_messages already builds its own snapshot dicts via _safe_value. A deepcopy on every API call would walk every tool result and base64 image — significant overhead for no real isolation benefit. Shallow copy of the outer list protects against later mutations of api_messages without paying for the inner-dict walk. 3. Rename test_empty_tool_call_id_concurrent_fifo_order -> test_empty_tool_call_id_observations_are_fifo_within_tool_name and add a real test_threaded_post_calls_preserve_fifo_under_lock that spawns 8 threads behind a barrier to actually exercise _STATE_LOCK on the pending_tools_by_name queue. The original test was sequential and only validated Python list semantics; this one validates the lock discipline. 4. Fix stale 'Cleared by reset_cache_for_tests()' comment on _INIT_FAILED — that function does not exist. Tests reload the module via sys.modules.pop + importlib.import_module instead. Tests: 37 langfuse plugin tests pass, 658 plugin tests overall pass. --------- Co-authored-by: xxxigm <tuancanhnguyen706@gmail.com> Co-authored-by: Brian Conklin <brian@dralth.com>

xxxigm added 4 commits May 11, 2026 21:50

alt-glitch added type/bug Something isn't working P3 Low — cosmetic, nice to have comp/plugins Plugin system and bundled plugins duplicate This issue or pull request already exists labels May 11, 2026

kshitijk4poor mentioned this pull request May 15, 2026

fix(langfuse): complete observability fix — trace I/O, tool outputs, placeholder credentials (closes #22342, #22763) #26320

Merged

3 tasks

kshitijk4poor mentioned this pull request May 15, 2026

fix(langfuse): fixes langfuse observability - llm input/output and tool output missing #22345

Closed

15 tasks

kshitijk4poor closed this May 15, 2026

kshitijk4poor mentioned this pull request May 15, 2026

fix(langfuse): reject placeholder credentials #23188

Closed

19 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix/langfuse placeholder key warning#23831

Fix/langfuse placeholder key warning#23831
xxxigm wants to merge 4 commits into
NousResearch:mainfrom
xxxigm:fix/langfuse-placeholder-key-warning

xxxigm commented May 11, 2026

Uh oh!

alt-glitch commented May 11, 2026

Uh oh!

kshitijk4poor commented May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

xxxigm commented May 11, 2026

What does this PR do?

Related Issue

Type of Change

Changes Made

How to Test

Checklist

Code

Documentation & Housekeeping

Screenshots / Logs

Uh oh!

alt-glitch commented May 11, 2026

Uh oh!

kshitijk4poor commented May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants