
Enhance Telemetry for Toolbox Servers #2222

@AjmeraParth132

What are you trying to do that currently feels hard or impossible?

Critical gaps in OpenTelemetry instrumentation prevent production observability:

  • STDIO transport blind spot: there is no instrumentation at all for STDIO connections, so we cannot trace connection lifecycle, messages, or errors on the primary transport.
  • MCP HTTP protocol invisibility: telemetry cannot distinguish between MCP methods (initialize, tools/list, tools/call), and tool invocations over MCP lack the metrics the Native API already emits, which hides the majority of production usage.
  • Tool execution black box: there is no detailed per-tool tracing for third-party tools, so we cannot measure execution time, identify performance bottlenecks, or track message lengths and other in-depth metrics.
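MCP messages on both the HTTP and STDIO transports are JSON-RPC 2.0 requests, so the method name (and, for tools/call, the tool name carried in `params.name`) could be extracted from each message and used to label spans and metrics. The Go sketch below is a minimal, hypothetical illustration: `classify`, `rpcRequest`, and `toolCallParams` are invented names, not part of Toolbox, and it assumes each message has already been read as one complete JSON document. The tool name `search-hotels` is also made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcRequest keeps only the JSON-RPC 2.0 fields needed to label telemetry.
// Hypothetical helper type, not a Toolbox type.
type rpcRequest struct {
	Method string          `json:"method"`
	Params json.RawMessage `json:"params"`
}

// toolCallParams reads the tool name that MCP carries in tools/call params.
type toolCallParams struct {
	Name string `json:"name"`
}

// classify returns the MCP method of a JSON-RPC message and, for tools/call,
// the name of the invoked tool. The payload format is the same on HTTP and
// STDIO, so one function could label telemetry for both transports.
func classify(raw []byte) (method, tool string, err error) {
	var req rpcRequest
	if err = json.Unmarshal(raw, &req); err != nil {
		return "", "", fmt.Errorf("decode JSON-RPC message: %w", err)
	}
	if req.Method == "tools/call" {
		var p toolCallParams
		if err = json.Unmarshal(req.Params, &p); err != nil {
			return req.Method, "", fmt.Errorf("decode tools/call params: %w", err)
		}
		tool = p.Name
	}
	return req.Method, tool, nil
}

func main() {
	// Trimmed example messages; real initialize params carry more fields.
	msgs := []string{
		`{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}`,
		`{"jsonrpc":"2.0","id":2,"method":"tools/list"}`,
		`{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"search-hotels","arguments":{"city":"Basel"}}}`,
	}
	for _, m := range msgs {
		method, tool, err := classify([]byte(m))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		// In a real server these values would become span/metric attributes.
		fmt.Printf("method=%s tool=%q\n", method, tool)
	}
}
```

Because the label comes from the message body rather than the transport, the same attribute set (method plus optional tool) could be attached to HTTP and STDIO telemetry alike.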

Current Coverage:

| Transport | Traces | Metrics | Coverage |
|---|---|---|---|
| Native | ✅ Full | ✅ Full | 100% |
| HTTP/SSE | ⚠️ Partial | ⚠️ Partial | ~60% |
| STDIO | ❌ None | ❌ None | 0% |

Impact: production deployments cannot be effectively monitored, debugged, or optimized. Server-side metrics are essential for SLOs/SLIs but are currently missing for MCP flows.

Suggested Solution(s)

Proposed phases:

  • Fix the current telemetry: add consistent, actionable telemetry for MCP sessions across the HTTP and STDIO transports, giving quick visibility into toolset discovery and tool invocation activity with minimal setup.
  • Introduce detailed metrics: design and implement more business-critical metrics, for example message size and request duration.
  • Ship v1 to Toolbox users: this mostly requires some fine-tuning of Agnost's architecture; the one-line setup for Toolbox users can then be added to the documentation.
  • Implement end-to-end telemetry across the ADK pipeline.
  • Evaluate whether standardized telemetry semantics are needed and, if so, open an SEP proposing what the industry standard should be.

Alternatives Considered

No response

Additional Details

This issue tracks setting up and enhancing telemetry for Toolbox servers, from fixing the current misalignments to adding more useful metrics and global standards, with the goal of making Toolbox servers observable end to end on providers such as Grafana, Datadog, and Agnost.

Labels

  • priority: p1 (Important issue which blocks shipping the next release. Will be fixed prior to next release.)
  • type: feature request (‘Nice-to-have’ improvement, new feature or different behavior or design.)