Fix/langfuse placeholder key warning#23831
Closed
xxxigm wants to merge 4 commits into
Closed
Conversation
Introduces two pure-Python helpers and a prefix registry that the next commit wires into `_get_langfuse()` to surface placeholder Langfuse credentials before the SDK silently drops every trace (NousResearch#23823): * `_LANGFUSE_KEY_PREFIXES` — the documented `pk-lf-` / `sk-lf-` prefixes Langfuse server-side issuance bakes into every public / secret key. Cloud and self-hosted Langfuse both use the same scheme, so the registry stays accurate without per-deployment config. * `_redact_key_preview(value)` — log-safe preview that echoes short placeholder strings in full (so the operator can see exactly which template they forgot to replace) but truncates anything >12 chars to its first 6 chars + `...` (so a misrouted real secret never hits the log). * `_validate_langfuse_key(env_name, value)` — returns `None` for values that match the registered prefix and a one-line diagnostic string otherwise. Unknown env-var names pass through unchanged so future callers can extend the registry without rewriting the call sites. Helpers only — no behaviour change yet. The integration commit follows; the test commits exercise both halves independently.
…23823) The gateway startup banner advertises Langfuse observability is enabled whenever `HERMES_LANGFUSE_PUBLIC_KEY` / `HERMES_LANGFUSE_SECRET_KEY` are set, but until now the plugin would silently swallow every trace if those values were left at template strings like `placeholder`, `test-key`, or `your-langfuse-key`. The Langfuse SDK accepts arbitrary strings at construction time, queues traces in memory, and only discovers the auth failure when the background flush thread tries to post them — far away from any log the operator might be watching. The result is a false-positive "observability is on" signal that lasts until someone notices the Langfuse dashboard staying empty. Wire the helpers added in the previous commit into `_get_langfuse()` so a misconfigured key is caught at first use: 1. After the empty-credentials check, run both env values through `_validate_langfuse_key` to spot anything that doesn't begin with the Langfuse-issued `pk-lf-` / `sk-lf-` prefix. 2. If either fails, emit ONE consolidated `logger.warning` naming the offending env var(s) with a safe value preview and a fix hint (set real keys or unset the env vars), then short-circuit via the same `_INIT_FAILED` cache the missing-credentials path already uses. That cache guarantees the warning fires once per process, not once per hook invocation — important because the plugin's hooks run on every LLM and tool call. Missing-credentials remains silent (it's the documented opt-out path for unconfigured installs), and the placeholder check sits after the `Langfuse is None` short-circuit so a host without the optional `langfuse` SDK installed doesn't see a misleading "set real keys" hint when the actual fix is `pip install langfuse`. Fixes NousResearch#23823.
…ch#23823) Unit-level coverage for the two pure-Python helpers added by the preceding commits: * `_redact_key_preview` — exercises the three branches (empty, short-value echo, long-value truncate-to-6-chars). The truncation case is the one that matters operationally: if an operator pastes a real `sk-lf-...` secret into the wrong env var, the placeholder warning that follows must not echo the secret in full. * `_validate_langfuse_key` — accepts the documented prefixes, rejects values without them (and includes both the env-var name and the expected prefix in the returned diagnostic), and falls through unchanged for env names that don't have a registered prefix. Six fast cases. These run without the optional `langfuse` SDK because they never call `_get_langfuse()`; the integration suite (next commit) covers the runtime gate.
…NousResearch#23823) Runtime-level coverage for the `_get_langfuse()` integration path that the user-visible bug actually traverses. Adds a `_FakeLangfuse` stub so the tests don't need the optional `langfuse` SDK installed — without the stub every call short-circuits at `if Langfuse is None` before reaching the new placeholder check, silently masking the very behaviour we want to pin. The new class covers: * placeholder in only the public key, only the secret key, and both — each asserts the warning names the right env var, shows a safe value preview, and does NOT leak the valid counterpart's value; * a single consolidated warning when both keys are bad (no double-log); * repeated calls re-use the `_INIT_FAILED` cache so a busy gateway doesn't spam the log once per LLM hook; * 8 parametrized placeholder strings drawn from common `.env.example` templates (`placeholder`, `test-key`, `your-langfuse-key`, `change-me`, `xxx`, `dummy-key-here`, `<your-key>`, `REPLACE_ME`); * the legacy bare `LANGFUSE_PUBLIC_KEY` alias is validated too — whichever env var the operator set, the warning still names the canonical HERMES_-prefixed variant in the fix hint; * the silent-skip paths stay silent: no warning when credentials are missing entirely (the documented opt-out for unconfigured installs) and no warning when the SDK itself is absent (the actionable message there is `pip install langfuse`, not a key hint); * valid `pk-lf-…` / `sk-lf-…` keys construct the client and never trip the placeholder guard — verified both by the absence of the warning text and by `_FakeLangfuse.instances` recording a single init call with the supplied kwargs. Without the guard, 19 of the 23 new cases fail (`logger.warning` never fires, the SDK swallows the placeholders, traces silently disappear). With the guard in place all 29 tests in the file pass.
Collaborator
3 tasks
kshitijk4poor
added a commit
that referenced
this pull request
May 15, 2026
…placeholder credentials (closes #22342, #22763) (#26320) * fix(langfuse): reject placeholder credentials with one-shot warning When operators leave HERMES_LANGFUSE_PUBLIC_KEY / HERMES_LANGFUSE_SECRET_KEY at a template value like 'placeholder', 'test-key', or 'your-langfuse-key', the Langfuse SDK silently accepts the credentials at construction time and drops every trace at flush time. No warning, no error — just an empty Langfuse dashboard the operator only notices hours later. Add prefix-based validation in _get_langfuse() against the documented 'pk-lf-' / 'sk-lf-' prefixes that Langfuse always issues server-side. Anything else fires a single warning naming the offending env var(s) with a log-safe value preview (full string for short placeholders so the operator knows which template they left in place; truncated for long values so a real secret pasted into the wrong field never hits the log), then short-circuits via the existing _INIT_FAILED cache so the warning fires once per process, not once per hook invocation. The check sits after the 'Langfuse is None' SDK-installed guard so hosts without the optional langfuse SDK don't see misleading 'set real keys' hints when the actionable fix is 'pip install langfuse'. Missing credentials remains the documented opt-out path and stays silent — no log noise for unconfigured installs. Fixes #22763 Fixes #23823 * fix(langfuse): use actual API request messages for generation input on_pre_llm_request previously used the messages kwarg alone, which could be None when Hermes passes the payload via request_messages, conversation_history, or user_message instead. Add _coerce_request_messages to pick the first available list across all variants, falling back to a synthetic user message. Generations now show the real outbound payload rather than an empty input. * fix(langfuse): record tool call outputs in traces Tool observations showed input (arguments) but output was always undefined. Root cause: when tool_call_id is empty, pre_tool_call stored observations under a unique time-based key that post_tool_call could never reconstruct, so every tool span was closed without output by the _finish_trace sweep. Fix pre/post matching by routing empty-tool_call_id tools through a per-name FIFO queue (pending_tools_by_name) instead of the time-based key. Tools with a tool_call_id continue to use the id-keyed dict. Also: - Preserve OpenAI-style nested function shape in serialized tool calls so Langfuse renders name/arguments correctly - Keep name + tool_call_id on role:tool messages for proper pairing - Backfill tool results onto the matching turn_tool_calls entry so the generation's tool-call record carries the result alongside arguments - Coerce request messages from whichever field the runtime provides (request_messages, messages, conversation_history, user_message) * fix(langfuse): salvage-review polish — drop dead is_first_turn, shallow-copy request_messages, real threaded FIFO test Self-review of the combined #22345 + #23831 salvage surfaced three issues worth fixing in the same PR rather than as follow-ups: 1. Drop is_first_turn from the pre_api_request hook. The boolean expression `not bool(conversation_history)` was wrong: conversation_history is reassigned to None mid-run after compression (5 sites in run_agent.py), so the value flips False -> True mid-conversation on every post-compression API call. The langfuse plugin never consumed it, so the kwarg was both misleading AND dead. 2. Replace copy.deepcopy(request_messages) with shallow list() copy. The pre_api_request hook contract discards return values (invoke_hook never writes back to api_kwargs), and the langfuse plugin's _serialize_messages already builds its own snapshot dicts via _safe_value. A deepcopy on every API call would walk every tool result and base64 image — significant overhead for no real isolation benefit. Shallow copy of the outer list protects against later mutations of api_messages without paying for the inner-dict walk. 3. Rename test_empty_tool_call_id_concurrent_fifo_order -> test_empty_tool_call_id_observations_are_fifo_within_tool_name and add a real test_threaded_post_calls_preserve_fifo_under_lock that spawns 8 threads behind a barrier to actually exercise _STATE_LOCK on the pending_tools_by_name queue. The original test was sequential and only validated Python list semantics; this one validates the lock discipline. 4. Fix stale 'Cleared by reset_cache_for_tests()' comment on _INIT_FAILED — that function does not exist. Tests reload the module via sys.modules.pop + importlib.import_module instead. Tests: 37 langfuse plugin tests pass, 658 plugin tests overall pass. --------- Co-authored-by: xxxigm <tuancanhnguyen706@gmail.com> Co-authored-by: Brian Conklin <brian@dralth.com>
Closed
15 tasks
Collaborator
|
Thanks @xxxigm — superseded by #26320 which cherry-picks your placeholder-credential fix verbatim (4 commits squashed into one logical commit, author preserved). All 23 of your regression tests landed unchanged, plus the The salvage combines your fix with @btorresgil's trace I/O fix (#22345) into one logical "Langfuse now actually works" PR, plus a self-review polish commit on top. Closing as superseded. |
19 tasks
DIZ-admin
pushed a commit
to DIZ-admin/hermes-agent
that referenced
this pull request
May 16, 2026
…placeholder credentials (closes NousResearch#22342, NousResearch#22763) (NousResearch#26320) * fix(langfuse): reject placeholder credentials with one-shot warning When operators leave HERMES_LANGFUSE_PUBLIC_KEY / HERMES_LANGFUSE_SECRET_KEY at a template value like 'placeholder', 'test-key', or 'your-langfuse-key', the Langfuse SDK silently accepts the credentials at construction time and drops every trace at flush time. No warning, no error — just an empty Langfuse dashboard the operator only notices hours later. Add prefix-based validation in _get_langfuse() against the documented 'pk-lf-' / 'sk-lf-' prefixes that Langfuse always issues server-side. Anything else fires a single warning naming the offending env var(s) with a log-safe value preview (full string for short placeholders so the operator knows which template they left in place; truncated for long values so a real secret pasted into the wrong field never hits the log), then short-circuits via the existing _INIT_FAILED cache so the warning fires once per process, not once per hook invocation. The check sits after the 'Langfuse is None' SDK-installed guard so hosts without the optional langfuse SDK don't see misleading 'set real keys' hints when the actionable fix is 'pip install langfuse'. Missing credentials remains the documented opt-out path and stays silent — no log noise for unconfigured installs. Fixes NousResearch#22763 Fixes NousResearch#23823 * fix(langfuse): use actual API request messages for generation input on_pre_llm_request previously used the messages kwarg alone, which could be None when Hermes passes the payload via request_messages, conversation_history, or user_message instead. Add _coerce_request_messages to pick the first available list across all variants, falling back to a synthetic user message. Generations now show the real outbound payload rather than an empty input. * fix(langfuse): record tool call outputs in traces Tool observations showed input (arguments) but output was always undefined. Root cause: when tool_call_id is empty, pre_tool_call stored observations under a unique time-based key that post_tool_call could never reconstruct, so every tool span was closed without output by the _finish_trace sweep. Fix pre/post matching by routing empty-tool_call_id tools through a per-name FIFO queue (pending_tools_by_name) instead of the time-based key. Tools with a tool_call_id continue to use the id-keyed dict. Also: - Preserve OpenAI-style nested function shape in serialized tool calls so Langfuse renders name/arguments correctly - Keep name + tool_call_id on role:tool messages for proper pairing - Backfill tool results onto the matching turn_tool_calls entry so the generation's tool-call record carries the result alongside arguments - Coerce request messages from whichever field the runtime provides (request_messages, messages, conversation_history, user_message) * fix(langfuse): salvage-review polish — drop dead is_first_turn, shallow-copy request_messages, real threaded FIFO test Self-review of the combined NousResearch#22345 + NousResearch#23831 salvage surfaced three issues worth fixing in the same PR rather than as follow-ups: 1. Drop is_first_turn from the pre_api_request hook. The boolean expression `not bool(conversation_history)` was wrong: conversation_history is reassigned to None mid-run after compression (5 sites in run_agent.py), so the value flips False -> True mid-conversation on every post-compression API call. The langfuse plugin never consumed it, so the kwarg was both misleading AND dead. 2. Replace copy.deepcopy(request_messages) with shallow list() copy. The pre_api_request hook contract discards return values (invoke_hook never writes back to api_kwargs), and the langfuse plugin's _serialize_messages already builds its own snapshot dicts via _safe_value. A deepcopy on every API call would walk every tool result and base64 image — significant overhead for no real isolation benefit. Shallow copy of the outer list protects against later mutations of api_messages without paying for the inner-dict walk. 3. Rename test_empty_tool_call_id_concurrent_fifo_order -> test_empty_tool_call_id_observations_are_fifo_within_tool_name and add a real test_threaded_post_calls_preserve_fifo_under_lock that spawns 8 threads behind a barrier to actually exercise _STATE_LOCK on the pending_tools_by_name queue. The original test was sequential and only validated Python list semantics; this one validates the lock discipline. 4. Fix stale 'Cleared by reset_cache_for_tests()' comment on _INIT_FAILED — that function does not exist. Tests reload the module via sys.modules.pop + importlib.import_module instead. Tests: 37 langfuse plugin tests pass, 658 plugin tests overall pass. --------- Co-authored-by: xxxigm <tuancanhnguyen706@gmail.com> Co-authored-by: Brian Conklin <brian@dralth.com>
gweeteve
pushed a commit
to gweeteve/hermes-agent
that referenced
this pull request
Jun 2, 2026
…placeholder credentials (closes NousResearch#22342, NousResearch#22763) (NousResearch#26320) * fix(langfuse): reject placeholder credentials with one-shot warning When operators leave HERMES_LANGFUSE_PUBLIC_KEY / HERMES_LANGFUSE_SECRET_KEY at a template value like 'placeholder', 'test-key', or 'your-langfuse-key', the Langfuse SDK silently accepts the credentials at construction time and drops every trace at flush time. No warning, no error — just an empty Langfuse dashboard the operator only notices hours later. Add prefix-based validation in _get_langfuse() against the documented 'pk-lf-' / 'sk-lf-' prefixes that Langfuse always issues server-side. Anything else fires a single warning naming the offending env var(s) with a log-safe value preview (full string for short placeholders so the operator knows which template they left in place; truncated for long values so a real secret pasted into the wrong field never hits the log), then short-circuits via the existing _INIT_FAILED cache so the warning fires once per process, not once per hook invocation. The check sits after the 'Langfuse is None' SDK-installed guard so hosts without the optional langfuse SDK don't see misleading 'set real keys' hints when the actionable fix is 'pip install langfuse'. Missing credentials remains the documented opt-out path and stays silent — no log noise for unconfigured installs. Fixes NousResearch#22763 Fixes NousResearch#23823 * fix(langfuse): use actual API request messages for generation input on_pre_llm_request previously used the messages kwarg alone, which could be None when Hermes passes the payload via request_messages, conversation_history, or user_message instead. Add _coerce_request_messages to pick the first available list across all variants, falling back to a synthetic user message. Generations now show the real outbound payload rather than an empty input. * fix(langfuse): record tool call outputs in traces Tool observations showed input (arguments) but output was always undefined. Root cause: when tool_call_id is empty, pre_tool_call stored observations under a unique time-based key that post_tool_call could never reconstruct, so every tool span was closed without output by the _finish_trace sweep. Fix pre/post matching by routing empty-tool_call_id tools through a per-name FIFO queue (pending_tools_by_name) instead of the time-based key. Tools with a tool_call_id continue to use the id-keyed dict. Also: - Preserve OpenAI-style nested function shape in serialized tool calls so Langfuse renders name/arguments correctly - Keep name + tool_call_id on role:tool messages for proper pairing - Backfill tool results onto the matching turn_tool_calls entry so the generation's tool-call record carries the result alongside arguments - Coerce request messages from whichever field the runtime provides (request_messages, messages, conversation_history, user_message) * fix(langfuse): salvage-review polish — drop dead is_first_turn, shallow-copy request_messages, real threaded FIFO test Self-review of the combined NousResearch#22345 + NousResearch#23831 salvage surfaced three issues worth fixing in the same PR rather than as follow-ups: 1. Drop is_first_turn from the pre_api_request hook. The boolean expression `not bool(conversation_history)` was wrong: conversation_history is reassigned to None mid-run after compression (5 sites in run_agent.py), so the value flips False -> True mid-conversation on every post-compression API call. The langfuse plugin never consumed it, so the kwarg was both misleading AND dead. 2. Replace copy.deepcopy(request_messages) with shallow list() copy. The pre_api_request hook contract discards return values (invoke_hook never writes back to api_kwargs), and the langfuse plugin's _serialize_messages already builds its own snapshot dicts via _safe_value. A deepcopy on every API call would walk every tool result and base64 image — significant overhead for no real isolation benefit. Shallow copy of the outer list protects against later mutations of api_messages without paying for the inner-dict walk. 3. Rename test_empty_tool_call_id_concurrent_fifo_order -> test_empty_tool_call_id_observations_are_fifo_within_tool_name and add a real test_threaded_post_calls_preserve_fifo_under_lock that spawns 8 threads behind a barrier to actually exercise _STATE_LOCK on the pending_tools_by_name queue. The original test was sequential and only validated Python list semantics; this one validates the lock discipline. 4. Fix stale 'Cleared by reset_cache_for_tests()' comment on _INIT_FAILED — that function does not exist. Tests reload the module via sys.modules.pop + importlib.import_module instead. Tests: 37 langfuse plugin tests pass, 658 plugin tests overall pass. --------- Co-authored-by: xxxigm <tuancanhnguyen706@gmail.com> Co-authored-by: Brian Conklin <brian@dralth.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What does this PR do?
Fixes the silent-failure mode reported in #23823: when an operator leaves
HERMES_LANGFUSE_PUBLIC_KEY/HERMES_LANGFUSE_SECRET_KEYat a templatevalue like
placeholder,test-key, oryour-langfuse-key, theLangfuse plugin currently registers all six hooks successfully, the
startup banner reports observability as enabled, and every trace is
silently dropped at SDK flush time. No warning, no error, just an
empty Langfuse dashboard the operator only notices hours/days later.
This PR adds a one-shot prefix-based credential check inside
_get_langfuse():are validated against the documented Langfuse-issued prefixes
(
pk-lf-for the public key,sk-lf-for the secret) — the sameprefixes the plugin's module docstring already advertises and the
Langfuse server bakes into every key at issuance time.
logger.warning(...)naming the offending env var(s), shows alog-safe value preview (full string for short placeholders so the
operator can see exactly which template they left in place; first 6
chars +
...for anything longer so a misrouted real secret neverhits the log), and points to the fix (set real keys or unset the
env vars).
_INIT_FAILEDsentinel — the same cache the missing-credentials path already uses
— so the warning fires once per process, not once per LLM/tool
hook invocation.
The check sits after the
Langfuse is NoneSDK-installed guard sohosts without the optional
langfuseSDK don't see a misleading "setreal keys" hint when the actionable fix there is
pip install langfuse. Missing-credentials remains the documented opt-out path andstays silent — no log noise for unconfigured installs.
Related Issue
Fixes #23823
Type of Change
Changes Made
Four commits, scoped narrowly so each is independently reviewable:
fix(plugins/langfuse): add helpers to flag placeholder credentials—
plugins/observability/langfuse/__init__.py: introduces the_LANGFUSE_KEY_PREFIXESregistry plus the_redact_key_preview()and_validate_langfuse_key()helpers. Pure additions, no behaviorchange yet.
fix(plugins/langfuse): warn on placeholder credentials (#23823)—
plugins/observability/langfuse/__init__.py: wires the helpersinto
_get_langfuse()so a misconfigured key produces oneconsolidated warning and a cached
_INIT_FAILEDskip.test(plugins/langfuse): cover the placeholder-key helpers (#23823)—
tests/plugins/test_langfuse_plugin.py: 6 helper-level unittests for the redaction / prefix logic (empty / short / long
preview branches; accept / reject / unknown-name branches of the
validator).
test(plugins/langfuse): end-to-end regression for placeholder warning (#23823)—
tests/plugins/test_langfuse_plugin.py: 23 runtime tests through_get_langfuse()covering each placeholder position, singleconsolidated warning,
_INIT_FAILEDcache reuse, 8 common templatestrings drawn from real-world
.env.examplefiles, the legacyLANGFUSE_PUBLIC_KEYalias, both silent-skip paths (missing creds,missing SDK), and a positive-path test that valid
pk-lf-…/sk-lf-…keys still proceed to SDK init.Net diff:
+357 / -8lines (+74 / 0production,+283 / -8tests).How to Test
Reproduce the bug on
main(before the fix):hermes plugins enable observability/langfuse.~/.hermes/.env:~/.hermes/logs/, notraces on
cloud.langfuse.com. The plugin appears healthy.With this PR applied, step 3 logs (once, at first hook invocation):
Subsequent hook invocations stay silent (cached via
_INIT_FAILED).Automated coverage:
To confirm the new tests actually catch the bug, revert just the
production hunk in
_get_langfuse():The 10 that still pass without the fix are the pre-existing tests plus
the 6 helper-only unit tests (they don't go through
_get_langfuse());the 19 that flip are the end-to-end cases.
Checklist
Code
fix(plugins/langfuse):andtest(plugins/langfuse):)scripts/run_tests.sh tests/plugins/test_langfuse_plugin.pyand all tests passmain)Documentation & Housekeeping
docs/, docstrings) — the existing module docstring already documents thepk-lf-.../sk-lf-...key format this PR enforces; no additional docs neededcli-config.yaml.exampleif I added/changed config keys — N/A (no new config keys)CONTRIBUTING.mdorAGENTS.mdif I changed architecture or workflows — N/Amonkeypatch+ module re-import, no filesystem or network)Screenshots / Logs
Bug-repro proof (revert just the
_get_langfuse()hunk, keep thetests, re-run):
The 19 failures are the end-to-end cases that exercise the
placeholder-detection warning path; the 3 that pass are the pure-helper
unit tests that never call
_get_langfuse(). With the fix back inplace all 29 tests pass.