[FEATURE] Expose Token Usage and Execution Metrics in A2A Responses #126

Summary

Add token usage and execution metrics to A2A task responses via the metadata field, enabling clients to track resource consumption, costs, and performance characteristics.

Related Spike: #124

Motivation

Currently, the ADK tracks token usage internally through OpenTelemetry metrics, but this valuable information is not accessible to A2A clients. Clients need this data for:

  • Cost tracking: Understanding and budgeting for API costs
  • Performance monitoring: Identifying inefficient interactions
  • Analytics: Building dashboards and reports
  • Debugging: Understanding resource consumption patterns
  • Rate limiting: Managing usage against quotas

Proposed Solution

Phase 1: Basic Metadata Implementation

Populate the existing Task.Metadata field with usage statistics:

// Proposed metadata payload attached to a completed task:
task.Metadata = map[string]any{
    "usage": map[string]any{
        "prompt_tokens":     1234, // tokens sent to the LLM
        "completion_tokens": 567,  // tokens generated by the LLM
        "total_tokens":      1801, // prompt + completion
    },
    "execution_stats": map[string]any{
        "iterations":   3, // agent execution loops
        "messages":     7, // total messages in the conversation
        "tool_calls":   5, // total tool invocations
        "failed_tools": 1, // tool calls that returned an error
    },
}

Implementation Details

  1. Create usage aggregator to track metrics per task execution
  2. Modify agent execution to collect and aggregate:
    • Token counts from LLM responses
    • Iteration counts (agent loops)
    • Tool call statistics
    • Message counts
  3. Populate metadata before returning task
  4. Support streaming by including metadata in final TaskUpdateEvent
  5. Add configuration to enable/disable feature

Metrics to Track

Token Usage (when LLM is used):

  • prompt_tokens: Tokens sent to LLM
  • completion_tokens: Tokens generated by LLM
  • total_tokens: Sum of prompt + completion

Execution Statistics:

  • iterations: Number of agent execution loops
  • messages: Total messages in conversation
  • tool_calls: Total tool invocations
  • failed_tools: Number of failed tool calls

Optional (verbose mode):

  • processing_time_ms: Total processing time
  • llm_time_ms: Time spent in LLM calls
  • tool_execution_time_ms: Time spent in tool execution

Configuration

Add environment variables:

# Enable usage metadata in task responses
ENABLE_USAGE_METADATA=true

# Include detailed timing information
USAGE_METADATA_VERBOSE=false

Alternative Approaches Considered

  1. Protocol Extension: Define a formal A2A extension (more complex, better long-term)
  2. Schema Change: Modify A2A protocol to add usage field (requires governance approval)

See the spike research in issue #124 for detailed analysis.

Acceptance Criteria

  • Task metadata includes token usage when LLM is used
  • Execution statistics (iterations, tool calls) are tracked
  • Configuration option to enable/disable feature
  • Metadata structure is documented
  • Unit tests verify metadata population
  • Integration tests verify metadata in responses
  • Examples updated to demonstrate usage
  • Both streaming and non-streaming responses include metadata
  • Backward compatibility is not required; breaking changes are acceptable in favor of a better design

Implementation Plan

Step 1: Core Infrastructure

  • Create UsageTracker type to aggregate metrics
  • Add tracker to agent execution context
  • Implement token usage collection from LLM responses

Step 2: Execution Statistics

  • Track iteration count in agent loops
  • Track tool call statistics (total, failed)
  • Collect message counts

Step 3: Metadata Population

  • Populate task.Metadata before return (background tasks)
  • Include metadata in final streaming event
  • Apply configuration settings
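Step 3 could reduce to one helper called at both exit points (task return and final streaming event). A sketch assuming a trimmed `Task` stand-in; `Counters` and `AttachUsageMetadata` are hypothetical names:

```go
package main

import "fmt"

// Task is a trimmed stand-in for the A2A task type; only the
// metadata field matters for this sketch.
type Task struct {
	ID       string
	Metadata map[string]any
}

// Counters holds the per-execution totals collected during execution.
type Counters struct {
	PromptTokens, CompletionTokens               int
	Iterations, Messages, ToolCalls, FailedTools int
}

// AttachUsageMetadata populates task.Metadata just before the task is
// returned; for streaming, the same map would be attached to the
// final TaskUpdateEvent instead. enabled corresponds to
// ENABLE_USAGE_METADATA.
func AttachUsageMetadata(task *Task, c Counters, enabled bool) {
	if !enabled {
		return
	}
	if task.Metadata == nil {
		task.Metadata = make(map[string]any)
	}
	task.Metadata["usage"] = map[string]any{
		"prompt_tokens":     c.PromptTokens,
		"completion_tokens": c.CompletionTokens,
		"total_tokens":      c.PromptTokens + c.CompletionTokens,
	}
	task.Metadata["execution_stats"] = map[string]any{
		"iterations":   c.Iterations,
		"messages":     c.Messages,
		"tool_calls":   c.ToolCalls,
		"failed_tools": c.FailedTools,
	}
}

func main() {
	task := &Task{ID: "task-123"}
	c := Counters{PromptTokens: 156, CompletionTokens: 89, Iterations: 2, Messages: 4, ToolCalls: 1}
	AttachUsageMetadata(task, c, true)
	fmt.Println(task.Metadata["usage"])
}
```

Centralizing population in one place keeps the streaming and non-streaming paths consistent, which the acceptance criteria require.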

Step 4: Testing & Documentation

  • Unit tests for UsageTracker
  • Integration tests for metadata in responses
  • Update examples (ai-powered, streaming)
  • Document metadata structure in README
  • Add troubleshooting guide

Step 5: Configuration

  • Add environment variables
  • Update config struct
  • Add config validation
  • Document configuration options

Example Usage

Client Request:

{
  "jsonrpc": "2.0",
  "method": "message/send",
  "params": {
    "message": {
      "role": "user",
      "parts": [{"kind": "text", "text": "What's the weather in Paris?"}]
    }
  },
  "id": "1"
}

Server Response (with metadata):

{
  "jsonrpc": "2.0",
  "result": {
    "task": {
      "id": "task-123",
      "status": {
        "state": "completed",
        "message": {...}
      },
      "metadata": {
        "usage": {
          "prompt_tokens": 156,
          "completion_tokens": 89,
          "total_tokens": 245
        },
        "execution_stats": {
          "iterations": 2,
          "messages": 4,
          "tool_calls": 1,
          "failed_tools": 0
        }
      }
    }
  },
  "id": "1"
}

Open Questions

  1. Should cached responses show 0 tokens or the original token count?
  2. How should multi-agent delegation scenarios aggregate usage?
  3. Should partial usage be reported during streaming or only at the end?
  4. Is there a performance impact concern for tracking these metrics?

Out of Scope

  • Billing/payment integration (separate concern)
  • Cost calculations (client responsibility)
  • Usage limits enforcement (separate feature)
  • Historical usage analytics (client-side concern)
