Prerequisites
What are you trying to do that currently feels hard or impossible?
Critical gaps in OpenTelemetry instrumentation prevent production observability:
1. Debug STDIO client connections (Claude Desktop, MCP Inspector)
- STDIO transport has 0% instrumentation coverage
- Cannot trace connection lifecycle or message flow
- Cannot measure performance or see errors
- Primary desktop client transport is a complete blind spot
2. Measure MCP protocol performance
- Cannot distinguish between MCP methods (initialize, tools/list, tools/call, ping)
- MCP tool invocations don't record metrics, while the Native API does (api.go:154)
- Majority of production usage is invisible in monitoring dashboards
3. Analyze tool execution performance
- No per-tool tracing for any of the third-party tools
- Cannot measure individual tool execution time
- Cannot identify slow tools that act as performance bottlenecks
- Cannot trace external API calls from tools
4. Track authentication failures
- No tracing for token validation, claims extraction, or authorization checks
- Cannot measure auth overhead or debug auth-related failures
Current Coverage:
| Transport | Traces | Metrics | Coverage |
|---|---|---|---|
| Native | ✅ | ✅ | 100% |
| HTTP/SSE | ⚠️ Partial | ⚠️ Partial | ~60% |
| STDIO | ❌ None | ❌ None | 0% |
Impact: Cannot effectively monitor, debug, or optimize production deployments. Server-side metrics are essential for SLOs/SLIs but currently missing for MCP flows.
Suggested Solution(s)
Implement comprehensive OpenTelemetry instrumentation across Toolbox:
STDIO Transport Instrumentation
- Add connection lifecycle spans in ServeStdio()
- Add message read/write tracing in readInputStream()
- Add STDIO-specific metrics (connections, messages, size)
MCP Method-Level Tracing
- Pass instrumentation object to MCP method handlers
- Add spans for each method: initialize, tools/list, tools/call, ping
- Critical fix: Add instrumentation.ToolInvoke.Add() in toolsCallHandler() (currently missing)
- Apply consistently to all 3 MCP protocol versions (v20241105, v20250326, v20250618)
Authentication Tracing
- Add spans for token validation, claims extraction, authorization checks
- Add auth success/failure metrics
Tool Interface Changes
- Add tracer field to Tool interface
- Instrument third-party tools
- Performance validation
- Enhanced error recording with structured attributes
- Performance metrics (latency histograms, size metrics)
- Documentation, testing utilities, and monitoring dashboards
Success Criteria:
- STDIO transport: 0% → 100% instrumented
- MCP metrics achieve parity with Native API
- All MCP methods create distinct, traceable spans
- All third-party tools instrumented with execution tracing
- Performance overhead < 5%
Alternatives Considered
Client-side instrumentation only
- Why rejected: Cannot measure operations from non-instrumented clients; server-side metrics required for SLOs/SLIs
Sampling-only approach
- Why rejected: Metrics require 100% coverage for accuracy; sampling is useful for traces, but the spans must exist first
Manual logging instead of OpenTelemetry
- Why rejected: Toolbox already uses the OpenTelemetry SDK; logs alone provide neither correlated distributed tracing nor industry-standard exporters
Current workarounds:
- Manual log analysis (time-consuming, incomplete)
- Client-side metrics only (misses server-side operations)
- HTTP API testing to infer MCP behavior (unreliable)
Additional Details
Root Cause Analysis:
- Native API handlers have direct access to s.instrumentation (working correctly)
- MCP method handlers sit at the protocol layer without instrumentation access
- STDIO session lifecycle is not traced: ServeStdio() at server.go:372 creates no span
Code Locations:
- Telemetry: internal/telemetry/instrumentation.go
- Native API: internal/server/api.go:154 (metrics working)
- MCP Handlers: internal/server/mcp/v*/method.go:163 (metrics missing)
- STDIO: internal/server/server.go:372 (no tracing)