feat(telemetry): comprehensive Langfuse tracing across all providers by mabry1985 · Pull Request #1 · protoLabsAI/protoCLI

mabry1985 · 2026-04-02T05:57:26Z

Summary

- LLM spans on all 3 providers: OpenAI-compat, Anthropic, and Gemini each emit gen_ai chat {model} spans with gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, and gen_ai.usage.total_tokens for Langfuse cost dashboards
- Turn hierarchy: per-turn root span (turn) in client.ts using OTel context propagation; all child spans (LLM, tool, agent) nest under it in Langfuse's trace view
- Tool + subagent spans: every tool execution in coreToolScheduler and every subagent in agent.ts (foreground + background) is wrapped in a child span
- Content logging: prompt/response span events gated by telemetryLogPrompts (default on), truncated at 10k chars
- 26 new tests: Langfuse activation, turn span lifecycle, default URL, auth header encoding
- Docs: README Observability section + AGENTS.md config note
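
As a rough illustration of the span naming, token-usage attributes, and content truncation listed above, here is a minimal sketch in plain TypeScript. The helper names (`genAiSpanName`, `usageAttributes`, `truncateContent`) are hypothetical and not taken from the PR; only the span name pattern, the three attribute keys, and the 10k-character limit come from the summary. Computing the total as input plus output is an assumption.

```typescript
// Hypothetical helpers; only the attribute keys, the span name pattern,
// and the 10k limit are taken from the PR summary.
type SpanAttributes = Record<string, string | number>;

const MAX_CONTENT_CHARS = 10000;

// Span name pattern from the summary: "gen_ai chat {model}".
function genAiSpanName(model: string): string {
  return `gen_ai chat ${model}`;
}

// Builds the three usage attributes. Assumption: total = input + output.
function usageAttributes(inputTokens: number, outputTokens: number): SpanAttributes {
  return {
    "gen_ai.usage.input_tokens": inputTokens,
    "gen_ai.usage.output_tokens": outputTokens,
    "gen_ai.usage.total_tokens": inputTokens + outputTokens,
  };
}

// Cuts logged prompt/response content to at most `limit` characters.
function truncateContent(text: string, limit: number = MAX_CONTENT_CHARS): string {
  return text.length > limit ? text.slice(0, limit) : text;
}
```

In a real provider these values would be set on an OTel span; the sketch keeps them as plain data so it runs without any tracing dependency.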

Set LANGFUSE_PUBLIC_KEY + LANGFUSE_SECRET_KEY and every session is fully traced. No other config needed.
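
The test list mentions auth header encoding. Langfuse's OTLP endpoint accepts HTTP Basic authentication built from the public and secret keys, so the encoding step might look roughly like the sketch below. The function name `langfuseAuthHeader` is hypothetical, and how the PR actually builds the header is not shown in this page.

```typescript
// Hypothetical helper: builds an HTTP Basic Authorization header value
// from a Langfuse public/secret key pair.
function langfuseAuthHeader(publicKey: string, secretKey: string): string {
  // btoa only handles Latin-1 input; assumed fine because API keys are ASCII.
  const encoded = btoa(`${publicKey}:${secretKey}`);
  return `Basic ${encoded}`;
}
```

The resulting string would be sent as the `Authorization` header on OTLP export requests.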

Test plan

- CI passes
- Set Langfuse env vars locally, run proto, verify session → turn → LLM/tool spans appear in Langfuse
- Verify telemetry.enabled: false still suppresses the OTLP pipeline while Langfuse traces still flow

🤖 Generated with Claude Code

Add full OTel span instrumentation so setting LANGFUSE_PUBLIC_KEY + LANGFUSE_SECRET_KEY gives end-to-end trace visibility in Langfuse.

- LLM spans: all 3 providers (OpenAI-compat, Anthropic, Gemini) emit gen_ai spans with token usage attrs (input/output/total) for cost tracking
- Turn hierarchy: per-turn root span in client.ts with context propagation so LLM/tool/agent spans nest correctly in Langfuse trace view
- Tool spans: coreToolScheduler wraps every tool execution in a child span with name, type, decision, duration, and error attributes
- Agent spans: agent.ts wraps subagent execution (foreground + background) with full lifecycle coverage
- Content logging: prompt/response span events gated by telemetryLogPrompts
- Tests: 26 new tests covering Langfuse activation and turn span lifecycle
- Docs: README Observability section + AGENTS.md config note

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…esis

Based on analysis of 35+ sources including JetBrains, Anthropic, SWE-bench leaderboard, and Wink's 42k-trajectory production study.

## Changes

**Doom-loop fingerprinting (loopDetectionService + agent-core)**
Replace consecutive-only threshold=5 with a sliding 20-call window where any fingerprint appearing 3+ times counts as a doom loop. Catches non-consecutive repetition patterns that the old approach missed (the #1 recovery category in production, 39% of all interventions per Wink).

**Silent sensors (baselineCheck + postEditVerify)**
Remove PASSED output from baseline verify; staying silent on pass preserves context budget. Structured remediation steps on failure: read error → fix root cause → re-run command.

**Read-only plan subagent (builtin-agents)**
Add `plan` builtin agent with write tools structurally absent from its schema. Prevents the "accidental edit during planning" failure mode that every successful harness independently converged on.

**Checkpoint commits (agentCore + agent-core)**
Add `gitSnapshotBeforeEdit()`, which creates a named shadow-repo commit before every file-mutating tool call in the AgentCore path. Durable across crashes, fire-and-forget so it never blocks tool execution. Pairs with existing in-memory CheckpointStore for dual-layer rollback.

**Scope lock (scopeLock service + agent-core)**
New `ScopeLockService` singleton, activated from a sprint contract with a permitted file set. Any write outside the set is intercepted before the tool executes, returning a structured violation message and blocking the edit. Addresses the 6.62% "unrequested changes" failure category.

**Observation masking (chatCompressionService + agent-core)**
Add `applyObservationMask()`, which replaces old tool call/result pairs with a placeholder, keeping the last N verbatim. Applied before LLM compression in the AgentCore compaction path. JetBrains (2025): observation masking reduces peak tokens 26-54% while LLM summarisation made agents run 15% longer.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
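
The doom-loop rule described in this commit (a sliding 20-call window in which any fingerprint seen 3 or more times is flagged) can be sketched as follows. This is a minimal illustration, not the code in loopDetectionService: the class name `SlidingWindowLoopDetector` and the fingerprint format (tool name plus JSON-serialized arguments) are assumptions.

```typescript
// Hypothetical sketch of sliding-window loop detection.
class SlidingWindowLoopDetector {
  private readonly recent: string[] = [];
  private readonly windowSize: number;
  private readonly threshold: number;

  constructor(windowSize: number = 20, threshold: number = 3) {
    this.windowSize = windowSize;
    this.threshold = threshold;
  }

  // Records one tool call and returns true when its fingerprint now occurs
  // at least `threshold` times within the last `windowSize` calls.
  record(toolName: string, args: unknown): boolean {
    // Assumed fingerprint format. JSON.stringify is key-order sensitive,
    // so a real implementation may want to canonicalize arguments first.
    const fingerprint = `${toolName}:${JSON.stringify(args)}`;
    this.recent.push(fingerprint);
    if (this.recent.length > this.windowSize) {
      this.recent.shift(); // drop the oldest call
    }
    let count = 0;
    for (const seen of this.recent) {
      if (seen === fingerprint) count++;
    }
    return count >= this.threshold;
  }
}
```

Because only the newly recorded fingerprint can gain an occurrence, checking its count alone is enough on each call. Unlike a consecutive-only check, interleaved calls such as read, edit, read, test, read still trip the detector on the third read.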

… example shape to description (#177)

Reported failure mode: smaller / older models (Qwen3, Minimax variants, some open-weights routes) JSON-encode the entire `questions` array as a string instead of emitting it as a native array literal. validateToolParams threw with "Parameter \"questions\" must be an array", which is useless feedback since the model HAD sent the array, just stringified.

Three changes, layered:

1. Silent coercion in validateToolParams. If `questions` is a string that parses as JSON, parse it and continue. Logs a debugLogger.warn so the signal stays visible; silent coercion would mask a real upstream regression if model behavior shifts. Catches ~all of the user-reported failures without a retry round-trip.
2. Example shape added to the tool description. Models replicate concrete examples better than they synthesize from abstract schemas with 3 levels of nesting. Placeholder text is clearly labeled as shape-only so models don't cargo-cult the example values into their actual questions.
3. Sharper error message for the residual case (non-JSON garbage in the string slot): "Pass `questions` as a real array literal, not a JSON-encoded string." Clear, specific, tells the model exactly which strategy to drop.

Considered but rejected:

- Schema relaxation (allow `options: string[]`, default multiSelect): API change, breaks downstream `option.description` consumers (dialog UI, ACP renderer), premature without data showing #1+#2 are insufficient.

Tests updated with two new cases: stringified-array coercion, non-JSON string error path. 5319 core / 0 fail; lint + typecheck clean.

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
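
The coercion and error paths described in this commit could look roughly like the sketch below. It is not the actual validateToolParams code: `coerceQuestions`, the `CoerceResult` type, and the injected `warn` callback (standing in for debugLogger.warn) are hypothetical. Treating a string that parses to valid JSON but not to an array as an error is also an assumption.

```typescript
// Hypothetical result type for the sketch.
type CoerceResult =
  | { ok: true; questions: unknown[]; coerced: boolean }
  | { ok: false; error: string };

// Accepts a native array, or a string containing a JSON-encoded array.
function coerceQuestions(
  raw: unknown,
  warn: (message: string) => void = console.warn,
): CoerceResult {
  if (Array.isArray(raw)) {
    return { ok: true, questions: raw, coerced: false };
  }
  if (typeof raw === "string") {
    try {
      const parsed: unknown = JSON.parse(raw);
      if (Array.isArray(parsed)) {
        // Keep the signal visible even though the call proceeds.
        warn("questions arrived as a JSON-encoded string; coerced to an array");
        return { ok: true, questions: parsed, coerced: true };
      }
    } catch {
      // Not valid JSON: fall through to the sharper error below.
    }
    return {
      ok: false,
      error: "Pass `questions` as a real array literal, not a JSON-encoded string.",
    };
  }
  return { ok: false, error: 'Parameter "questions" must be an array' };
}
```

Passing the logger in as a parameter keeps the sketch testable and makes the warning on the coercion path easy to assert on.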

mabry1985 merged commit 6a61a19 into main Apr 2, 2026
2 of 3 checks passed

mabry1985 mentioned this pull request May 1, 2026

fix(ask_user_question): recover from stringified questions array, add example shape #177

Merged

6 tasks

mabry1985 merged 1 commit into main from feat/langfuse-tracing
