# feat: daily token usage analysis workflow #1551
## Summary
Create a daily agentic workflow that mines token usage data from recent workflow runs, identifies trends and inefficiencies, and creates a summary issue with findings and optimization recommendations.
## Background
PR #1539 introduced token usage tracking in the api-proxy sidecar. Each workflow run with --enable-api-proxy now produces a token-usage.jsonl artifact containing per-request records with:
```json
{"timestamp":"...","request_id":"...","provider":"anthropic","model":"claude-sonnet-4-6","path":"/v1/messages?beta=true","status":200,"streaming":true,"input_tokens":3,"output_tokens":418,"cache_read_tokens":14044,"cache_write_tokens":26042,"duration_ms":5858,"response_bytes":2800}
```

This data is captured in the `agent-artifacts` upload (under `sandbox/firewall/logs/api-proxy-logs/token-usage.jsonl`) but is not yet analyzed systematically.
## Proposed Workflow
### Trigger
- Daily schedule (e.g., 08:00 UTC)
- `workflow_dispatch` for manual runs
### Behavior
1. **Discover runs since last analysis**
   - Use `gh run list` to find completed agentic workflow runs since the last analysis issue
   - Cover all smoke tests, secret-digger, and other workflows that use `--enable-api-proxy`
2. **Download and aggregate `token-usage.jsonl`**
   - For each run, download the `agent-artifacts` artifact
   - Extract `token-usage.jsonl` from the artifact
   - Gracefully handle runs where token logs are not available (this is a new feature — older runs and runs without `--enable-api-proxy` won't have logs)
3. **Compute per-workflow statistics**
   - Total tokens (input + output + cache read + cache write)
   - Input/output ratio
   - Cache hit rate (`cache_read / (cache_read + input)`)
   - Average request count per run
   - Average duration per request
   - Model mix (which models are used, in what proportions)
4. **Identify trends**
   - Compare the current period vs the previous period (if historical data is available)
   - Flag workflows with increasing token consumption
   - Flag workflows with low cache hit rates (an opportunity for prompt caching)
   - Flag workflows with high input/output ratios (may indicate bloated prompts or tool schemas)
5. **Generate optimization recommendations**
   - Workflows with zero cache hits → suggest enabling prompt caching
   - Workflows with a >100:1 input/output ratio → suggest reducing the system prompt or tool surface
   - Workflows with many small requests → suggest batching
   - Workflows with unusually high total tokens → flag for review
6. **Create a summary issue**
   - Title: `Token Usage Report: YYYY-MM-DD`
   - Body includes:
     - Overview table (workflow name, total tokens, cost estimate, cache rate)
     - Per-workflow detail sections (collapsible)
     - Historical trend chart (if prior reports exist)
     - Top optimization opportunities
     - Link to the previous report issue (if any)
   - Label: `token-usage-report`
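As a sketch of steps 2 and 3, assuming downloaded logs land in a `runs/<run-id>/` layout (an illustrative assumption, not part of the proposal), the per-run aggregation could be done with plain `awk`:

```shell
# Sketch only: the runs/<id>/ directory layout is assumed for illustration.

# summarize FILE: print request count, total tokens, and cache hit rate
# (cache_read / (cache_read + input), as defined in the proposal) for one
# token-usage.jsonl file.
summarize() {
  awk '
    # num(key): extract a numeric field from a flat JSON line by key name.
    function num(key,    i, s) {
      i = index($0, "\"" key "\":")
      if (i == 0) return 0
      s = substr($0, i + length(key) + 3)  # skip past "key":
      sub(/[,}].*/, "", s)                 # cut at the next , or }
      return s + 0
    }
    {
      in_t  += num("input_tokens");      out_t  += num("output_tokens")
      cread += num("cache_read_tokens"); cwrite += num("cache_write_tokens")
      reqs++
    }
    END {
      total = in_t + out_t + cread + cwrite
      hit = (cread + in_t) > 0 ? cread * 100 / (cread + in_t) : 0
      printf "requests=%d total=%d cache_hit=%.1f%%\n", reqs, total, hit
    }
  ' "$1"
}

# Illustrative driver: one log per downloaded run; missing/empty logs are
# skipped, per the graceful-degradation rules below.
for f in runs/*/token-usage.jsonl; do
  [ -s "$f" ] || continue
  printf '%s ' "$f"
  summarize "$f"
done
```

A real implementation might prefer `jq`, but a field extractor like this avoids any dependency beyond POSIX `awk`.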
## Graceful Degradation
Since token tracking is new:
- Skip runs that have no `agent-artifacts` artifact
- Skip runs where `token-usage.jsonl` is missing or empty
- Report which workflows are not yet instrumented
- Don't fail the workflow if no data is available — create a minimal report noting the gap
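A minimal sketch of that skip logic around the artifact download (`gh run download` with `--name`; the `runs/<id>` target directory is an illustrative assumption):

```shell
# fetch_logs RUN_ID...: try to download the agent-artifacts artifact for
# each run. gh exits non-zero when the artifact does not exist (older runs,
# or runs without --enable-api-proxy); those runs are skipped, not fatal.
fetch_logs() {
  for run_id in "$@"; do
    if gh run download "$run_id" --name agent-artifacts --dir "runs/$run_id" 2>/dev/null; then
      echo "fetched $run_id"
    else
      echo "skip $run_id: no agent-artifacts artifact"
    fi
  done
}
```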
## Tools Needed

- `github` MCP server — for listing runs, downloading artifacts, creating issues
- `bash` — for processing JSONL files, computing aggregates
- `web-fetch` — not needed (all data is in GitHub artifacts)
## Permissions

- `actions: read` — to list workflow runs and download artifacts
- `issues: write` — to create summary issues
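With `issues: write` in place, the final step could open the report via the `gh` CLI. A hedged sketch, reusing the title and label formats proposed above (the body file name is illustrative):

```shell
# create_report BODY_FILE: open the dated summary issue. Requires
# issues: write; title and label follow the proposal above.
create_report() {
  gh issue create \
    --title "Token Usage Report: $(date -u +%Y-%m-%d)" \
    --label token-usage-report \
    --body-file "$1"
}
```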
## Token Usage Record Schema

| Field | Type | Description |
|---|---|---|
| `timestamp` | string | ISO 8601 timestamp |
| `request_id` | string | Unique request ID |
| `provider` | string | `anthropic`, `openai`, `copilot`, `opencode` |
| `model` | string | Model name (e.g., `claude-sonnet-4-6`) |
| `path` | string | API endpoint path |
| `status` | number | HTTP status code |
| `streaming` | boolean | Whether response was SSE streamed |
| `input_tokens` | number | Input/prompt tokens |
| `output_tokens` | number | Output/completion tokens |
| `cache_read_tokens` | number | Prompt cache read tokens |
| `cache_write_tokens` | number | Prompt cache write tokens |
| `duration_ms` | number | Request duration in milliseconds |
| `response_bytes` | number | Response body size |
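Given this schema, the >100:1 input/output check from the recommendations above could be sketched as follows (field names are from the table; the extraction helper assumes the flat, single-line JSON records shown earlier):

```shell
# io_ratio FILE: input-to-output token ratio across all records in one
# token-usage.jsonl. Ratios above ~100:1 suggest bloated prompts or tool
# schemas, per the recommendations above.
io_ratio() {
  awk '
    # num(key): extract a numeric field from a flat JSON line by key name.
    function num(key,    i, s) {
      i = index($0, "\"" key "\":")
      if (i == 0) return 0
      s = substr($0, i + length(key) + 3)  # skip past "key":
      sub(/[,}].*/, "", s)                 # cut at the next , or }
      return s + 0
    }
    { in_t += num("input_tokens"); out_t += num("output_tokens") }
    END { if (out_t > 0) printf "%.2f\n", in_t / out_t; else print "n/a" }
  ' "$1"
}
```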
## Example Analysis (from a smoke-claude run)
From a recent smoke-claude run (6 requests):
- Total tokens: 197,558 (775 input, 1,227 output, 152K cache read, 43K cache write)
- Cache hit rate: 99.5%
- Models: claude-haiku-4.5 (routing), claude-sonnet-4.6 (main agent)
- Estimated cost: ~$0.23
- Key finding: Anthropic prompt caching is working well (63% cost savings)
## Related

- PR #1539 — feat: add token usage tracking to api-proxy sidecar (token tracking infrastructure)
- PR #1549 — feat: include api-proxy token logs in firewall audit artifact
- PR #1550 — fix: decompress gzip responses for Anthropic token extraction