[FEATURE] Expose Token Usage and Execution Metrics in A2A Responses #126
Summary
Add token usage and execution metrics to A2A task responses via the metadata field, enabling clients to track resource consumption, costs, and performance characteristics.
Related Spike: #124
Motivation
Currently, the ADK tracks token usage internally through OpenTelemetry metrics, but this valuable information is not accessible to A2A clients. Clients need this data for:
- Cost tracking: Understanding and budgeting for API costs
- Performance monitoring: Identifying inefficient interactions
- Analytics: Building dashboards and reports
- Debugging: Understanding resource consumption patterns
- Rate limiting: Managing usage against quotas
Proposed Solution
Phase 1: Basic Metadata Implementation
Populate the existing Task.Metadata field with usage statistics:
```go
task.Metadata = map[string]any{
    "usage": map[string]any{
        "prompt_tokens":     1234,
        "completion_tokens": 567,
        "total_tokens":      1801,
    },
    "execution_stats": map[string]any{
        "iterations":   3,
        "messages":     7,
        "tool_calls":   5,
        "failed_tools": 1,
    },
}
```
Implementation Details
- Create usage aggregator to track metrics per task execution
- Modify agent execution to collect and aggregate:
- Token counts from LLM responses
- Iteration counts (agent loops)
- Tool call statistics
- Message counts
- Populate metadata before returning task
- Support streaming by including metadata in the final TaskUpdateEvent
- Add configuration to enable/disable the feature
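The aggregator described above might be sketched as follows. `UsageTracker` and its methods are illustrative names, not the ADK's actual API; the `Metadata()` shape matches the proposal.

```go
package main

import "fmt"

// UsageTracker aggregates per-task metrics. All names here are
// illustrative stand-ins, not existing ADK types.
type UsageTracker struct {
	PromptTokens, CompletionTokens int
	Iterations, Messages           int
	ToolCalls, FailedTools         int
}

// AddLLMUsage records token counts from one LLM response.
func (t *UsageTracker) AddLLMUsage(prompt, completion int) {
	t.PromptTokens += prompt
	t.CompletionTokens += completion
}

// AddToolCall records one tool invocation and whether it failed.
func (t *UsageTracker) AddToolCall(failed bool) {
	t.ToolCalls++
	if failed {
		t.FailedTools++
	}
}

// Metadata renders the aggregated metrics in the proposed shape.
func (t *UsageTracker) Metadata() map[string]any {
	return map[string]any{
		"usage": map[string]any{
			"prompt_tokens":     t.PromptTokens,
			"completion_tokens": t.CompletionTokens,
			"total_tokens":      t.PromptTokens + t.CompletionTokens,
		},
		"execution_stats": map[string]any{
			"iterations":   t.Iterations,
			"messages":     t.Messages,
			"tool_calls":   t.ToolCalls,
			"failed_tools": t.FailedTools,
		},
	}
}

func main() {
	var t UsageTracker
	t.AddLLMUsage(1234, 567)
	t.AddToolCall(false)
	usage := t.Metadata()["usage"].(map[string]any)
	fmt.Println(usage["total_tokens"]) // prints 1801
}
```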
Metrics to Track
Token Usage (when LLM is used):
- prompt_tokens: Tokens sent to the LLM
- completion_tokens: Tokens generated by the LLM
- total_tokens: Sum of prompt and completion tokens
Execution Statistics:
- iterations: Number of agent execution loops
- messages: Total messages in the conversation
- tool_calls: Total tool invocations
- failed_tools: Number of failed tool calls
Optional (verbose mode):
- processing_time_ms: Total processing time
- llm_time_ms: Time spent in LLM calls
- tool_execution_time_ms: Time spent in tool execution
Configuration
Add environment variables:
```sh
# Enable usage metadata in task responses
ENABLE_USAGE_METADATA=true

# Include detailed timing information
USAGE_METADATA_VERBOSE=false
```
Alternative Approaches Considered
- Protocol Extension: Define a formal A2A extension (more complex, but better long-term)
- Schema Change: Modify the A2A protocol to add a `usage` field (requires governance approval)
See the spike research in issue #124 for detailed analysis.
Acceptance Criteria
- Task metadata includes token usage when LLM is used
- Execution statistics (iterations, tool calls) are tracked
- Configuration option to enable/disable feature
- Metadata structure is documented
- Unit tests verify metadata population
- Integration tests verify metadata in responses
- Examples updated to demonstrate usage
- Both streaming and non-streaming responses include metadata
- Backward compatibility is not important - feel free to break if needed - all in favor of a better design
Implementation Plan
Step 1: Core Infrastructure
- Create a `UsageTracker` type to aggregate metrics
- Add the tracker to the agent execution context
- Implement token usage collection from LLM responses
Step 2: Execution Statistics
- Track iteration count in agent loops
- Track tool call statistics (total, failed)
- Collect message counts
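Schematically, the Step 2 counters would be incremented inside the agent loop, roughly as below. `runAgentLoop` and `toolResult` are stand-ins for the real execution loop and tool response type, not ADK code.

```go
package main

import "fmt"

// toolResult is a stand-in for the ADK's tool response type.
type toolResult struct{ err error }

// runAgentLoop is a schematic agent loop showing where Step 2's
// counters are incremented. callTools stands in for one iteration of
// LLM + tool execution; an empty result ends the loop.
func runAgentLoop(maxIters int, callTools func(iter int) []toolResult) (iterations, toolCalls, failedTools int) {
	for iter := 0; iter < maxIters; iter++ {
		iterations++ // one agent execution loop
		results := callTools(iter)
		for _, r := range results {
			toolCalls++ // every tool invocation counts
			if r.err != nil {
				failedTools++ // failed calls are tracked separately
			}
		}
		if len(results) == 0 { // no more tool calls: loop terminates
			break
		}
	}
	return
}

func main() {
	iters, calls, failed := runAgentLoop(10, func(iter int) []toolResult {
		if iter == 0 {
			return []toolResult{{nil}, {fmt.Errorf("boom")}}
		}
		return nil
	})
	fmt.Println(iters, calls, failed) // prints 2 2 1
}
```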
Step 3: Metadata Population
- Populate `task.Metadata` before returning (background tasks)
- Include metadata in the final streaming event
- Apply configuration settings
Step 4: Testing & Documentation
- Unit tests for `UsageTracker`
- Integration tests for metadata in responses
- Update examples (ai-powered, streaming)
- Document metadata structure in README
- Add troubleshooting guide
Step 5: Configuration
- Add environment variables
- Update config struct
- Add config validation
- Document configuration options
Example Usage
Client Request:
```json
{
  "jsonrpc": "2.0",
  "method": "message/send",
  "params": {
    "message": {
      "role": "user",
      "parts": [{"kind": "text", "text": "What's the weather in Paris?"}]
    }
  },
  "id": "1"
}
```
Server Response (with metadata):
```json
{
  "jsonrpc": "2.0",
  "result": {
    "task": {
      "id": "task-123",
      "status": {
        "state": "completed",
        "message": {...}
      },
      "metadata": {
        "usage": {
          "prompt_tokens": 156,
          "completion_tokens": 89,
          "total_tokens": 245
        },
        "execution_stats": {
          "iterations": 2,
          "messages": 4,
          "tool_calls": 1,
          "failed_tools": 0
        }
      }
    }
  },
  "id": "1"
}
```
Open Questions
- Should cached responses show 0 tokens or the original token count?
- How should multi-agent delegation scenarios aggregate usage?
- Should partial usage be reported during streaming or only at the end?
- Is there a performance impact concern for tracking these metrics?
Out of Scope
- Billing/payment integration (separate concern)
- Cost calculations (client responsibility)
- Usage limits enforcement (separate feature)
- Historical usage analytics (client-side concern)