feat(telemetry): comprehensive Langfuse tracing across all providers #1
Merged
Add full OTel span instrumentation so setting LANGFUSE_PUBLIC_KEY + LANGFUSE_SECRET_KEY gives end-to-end trace visibility in Langfuse.

- LLM spans: all 3 providers (OpenAI-compat, Anthropic, Gemini) emit gen_ai spans with token usage attrs (input/output/total) for cost tracking
- Turn hierarchy: per-turn root span in client.ts with context propagation so LLM/tool/agent spans nest correctly in Langfuse trace view
- Tool spans: coreToolScheduler wraps every tool execution in a child span with name, type, decision, duration, and error attributes
- Agent spans: agent.ts wraps subagent execution (foreground + background) with full lifecycle coverage
- Content logging: prompt/response span events gated by telemetryLogPrompts
- Tests: 26 new tests covering Langfuse activation and turn span lifecycle
- Docs: README Observability section + AGENTS.md config note

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
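As a rough illustration of the token usage attributes described above, the mapping from a provider's usage numbers to span attributes could look like the sketch below. The `TokenUsage` shape and the `usageToSpanAttributes` helper are hypothetical names chosen for this example, not the PR's actual code; only the `gen_ai.usage.*` keys come from the PR summary, and `gen_ai.request.model` is taken from the OpenTelemetry GenAI semantic conventions.

```typescript
// Hypothetical normalized usage shape; each real provider reports usage differently.
interface TokenUsage {
  inputTokens?: number;
  outputTokens?: number;
}

type SpanAttributes = Record<string, string | number>;

// Build the attribute bag that would be attached to an LLM span.
// Missing counts fall back to 0 so the total is always defined.
function usageToSpanAttributes(model: string, usage: TokenUsage): SpanAttributes {
  const input = usage.inputTokens ?? 0;
  const output = usage.outputTokens ?? 0;
  return {
    "gen_ai.request.model": model,
    "gen_ai.usage.input_tokens": input,
    "gen_ai.usage.output_tokens": output,
    "gen_ai.usage.total_tokens": input + output,
  };
}
```

Computing the total locally, instead of trusting a provider-supplied total, keeps the three providers consistent when one of them omits the field; that is a design assumption of this sketch.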
mabry1985 pushed a commit that referenced this pull request Apr 7, 2026
…esis

Based on analysis of 35+ sources including JetBrains, Anthropic, SWE-bench leaderboard, and Wink's 42k-trajectory production study.

## Changes

**Doom-loop fingerprinting (loopDetectionService + agent-core)**
Replace consecutive-only threshold=5 with a sliding 20-call window where any fingerprint appearing 3+ times = doom loop. Catches non-consecutive repetition patterns that the old approach missed (the #1 recovery category in production, 39% of all interventions per Wink).

**Silent sensors (baselineCheck + postEditVerify)**
Remove PASSED output from baseline verify: silent on pass preserves context budget. Structured remediation steps on failure: read error → fix root cause → re-run command.

**Read-only plan subagent (builtin-agents)**
Add `plan` builtin agent with write tools structurally absent from its schema. Prevents the "accidental edit during planning" failure mode that every successful harness independently converged on.

**Checkpoint commits (agentCore + agent-core)**
Add `gitSnapshotBeforeEdit()`, which creates a named shadow-repo commit before every file-mutating tool call in the AgentCore path. Durable across crashes, fire-and-forget so it never blocks tool execution. Pairs with existing in-memory CheckpointStore for dual-layer rollback.

**Scope lock (scopeLock service + agent-core)**
New `ScopeLockService` singleton, activated from a sprint contract with a permitted file set. Any write outside the set is intercepted before the tool executes, returning a structured violation message and blocking the edit. Addresses the 6.62% "unrequested changes" failure category.

**Observation masking (chatCompressionService + agent-core)**
Add `applyObservationMask()`, which replaces old tool call/result pairs with a placeholder, keeping the last N verbatim. Applied before LLM compression in the AgentCore compaction path. JetBrains (2025): observation masking reduces peak tokens 26-54% while LLM summarisation made agents run 15% longer.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
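The sliding-window doom-loop rule described in the commit above (a 20-call window, flagged when any fingerprint appears 3 or more times) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `loopDetectionService`: the `ToolCall` shape, the `fingerprint` function, and the `DoomLoopDetector` class are hypothetical, and the fingerprint here is just the tool name plus JSON-serialized arguments.

```typescript
// Hypothetical tool-call shape for illustration only.
interface ToolCall {
  name: string;
  args: unknown;
}

// Assumed fingerprint: tool name plus serialized args.
// Note: JSON.stringify is sensitive to key order, so a real
// implementation would likely canonicalize the arguments first.
function fingerprint(call: ToolCall): string {
  return `${call.name}:${JSON.stringify(call.args)}`;
}

class DoomLoopDetector {
  private readonly window: string[] = [];
  private readonly windowSize: number;
  private readonly threshold: number;

  constructor(windowSize = 20, threshold = 3) {
    this.windowSize = windowSize;
    this.threshold = threshold;
  }

  // Records a call and returns true when its fingerprint now appears
  // `threshold` or more times within the last `windowSize` calls,
  // whether or not the repeats are consecutive.
  record(call: ToolCall): boolean {
    const fp = fingerprint(call);
    this.window.push(fp);
    if (this.window.length > this.windowSize) {
      this.window.shift(); // evict the oldest call
    }
    let count = 0;
    for (const seen of this.window) {
      if (seen === fp) count++;
    }
    return count >= this.threshold;
  }
}
```

Unlike a consecutive-only counter, this flags an A, B, A, C, A pattern on the third A, which is the non-consecutive case the commit message calls out.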
6 tasks
mabry1985 added a commit that referenced this pull request May 1, 2026
… example shape to description (#177)

Reported failure mode: smaller / older models (Qwen3, Minimax variants, some open-weights routes) JSON-encode the entire `questions` array as a string instead of emitting it as a native array literal. validateToolParams threw with "Parameter \"questions\" must be an array", which is useless feedback since the model HAD sent the array, just stringified.

Three changes, layered:

1. Silent coercion in validateToolParams. If `questions` is a string that parses as JSON, parse it and continue. Logs a debugLogger.warn so the signal stays visible, since silent coercion would mask a real upstream regression if model behavior shifts. Catches ~all of the user-reported failures without a retry round-trip.
2. Example shape added to the tool description. Models replicate concrete examples better than they synthesize from abstract schemas with 3 levels of nesting. Placeholder text is clearly labeled as shape-only so models don't cargo-cult the example values into their actual questions.
3. Sharper error message for the residual case (non-JSON garbage in the string slot): "Pass `questions` as a real array literal, not a JSON-encoded string." Clear, specific, tells the model exactly which strategy to drop.

Considered but rejected:

- Schema relaxation (allow `options: string[]`, default multiSelect): API change, breaks downstream `option.description` consumers (dialog UI, ACP renderer), premature without data showing #1+#2 are insufficient.

Tests updated with two new cases: stringified-array coercion, non-JSON string error path. 5319 core / 0 fail; lint + typecheck clean.

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
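The coercion and error paths from changes 1 and 3 above might be sketched like this. The `coerceQuestions` function, its `CoercionResult` type, and the injected `warn` callback are hypothetical stand-ins for this example (the commit only names `validateToolParams` and `debugLogger.warn`); the two error strings are quoted from the commit message.

```typescript
type CoercionResult =
  | { ok: true; questions: unknown[]; coerced: boolean }
  | { ok: false; error: string };

// Accepts a native array, or a string that parses to a JSON array.
// `warn` stands in for a debug logger so coercion stays visible in logs.
function coerceQuestions(
  raw: unknown,
  warn: (msg: string) => void = () => {},
): CoercionResult {
  if (Array.isArray(raw)) {
    return { ok: true, questions: raw, coerced: false };
  }
  if (typeof raw === "string") {
    try {
      const parsed: unknown = JSON.parse(raw);
      if (Array.isArray(parsed)) {
        warn("`questions` arrived as a JSON-encoded string; coerced to an array");
        return { ok: true, questions: parsed, coerced: true };
      }
    } catch {
      // Not valid JSON: fall through to the sharper error below.
    }
    return {
      ok: false,
      error: "Pass `questions` as a real array literal, not a JSON-encoded string.",
    };
  }
  return { ok: false, error: 'Parameter "questions" must be an array' };
}
```

Returning a result object instead of throwing is a choice made for this sketch so the three outcomes are easy to compare; the original validator is described as throwing.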
This was referenced May 25, 2026
Summary
- LLM spans: all 3 providers emit `gen_ai chat {model}` spans with `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, and `gen_ai.usage.total_tokens` for Langfuse cost dashboards
- Turn hierarchy: per-turn root span (`turn`) in `client.ts` using OTel context propagation; all child spans (LLM, tool, agent) nest under it in Langfuse's trace view
- Tool and agent spans: every tool execution in `coreToolScheduler` and every subagent in `agent.ts` (foreground + background) wrapped in child spans
- Content logging: prompt/response span events gated by `telemetryLogPrompts` (default on), truncated at 10k chars

Set `LANGFUSE_PUBLIC_KEY` + `LANGFUSE_SECRET_KEY` and every session is fully traced. No other config needed.

Test plan
- Run `proto`, verify session → turn → LLM/tool spans appear in Langfuse
- `telemetry.enabled: false` still suppresses the OTLP pipeline, but Langfuse traces still flow

🤖 Generated with Claude Code
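The activation and content-logging rules in the summary can be illustrated with a small sketch. It assumes that "activation" simply means both environment variables are non-empty and that truncation is a plain character cut at 10,000; `isLangfuseEnabled`, `contentForSpanEvent`, and `MAX_CONTENT_CHARS` are hypothetical names, not identifiers from the PR.

```typescript
// Assumed limit, taken from the "truncated at 10k chars" note.
const MAX_CONTENT_CHARS = 10000;

// Langfuse tracing is treated as active only when both keys are set.
function isLangfuseEnabled(env: Record<string, string | undefined>): boolean {
  return Boolean(env["LANGFUSE_PUBLIC_KEY"] && env["LANGFUSE_SECRET_KEY"]);
}

// Returns the text to attach to a prompt/response span event, or
// undefined when content logging is switched off.
function contentForSpanEvent(text: string, logPrompts: boolean): string | undefined {
  if (!logPrompts) return undefined;
  return text.length > MAX_CONTENT_CHARS ? text.slice(0, MAX_CONTENT_CHARS) : text;
}
```

In this sketch `logPrompts` would be fed from the `telemetryLogPrompts` setting, and the environment would typically be `process.env`.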