
feat: daily token usage analysis workflow #1551

@lpcox


Summary

Create a daily agentic workflow that mines token usage data from recent workflow runs, identifies trends and inefficiencies, and creates a summary issue with findings and optimization recommendations.

Background

PR #1539 introduced token usage tracking in the api-proxy sidecar. Each workflow run with --enable-api-proxy now produces a token-usage.jsonl artifact containing per-request records with:

{"timestamp":"...","request_id":"...","provider":"anthropic","model":"claude-sonnet-4-6","path":"/v1/messages?beta=true","status":200,"streaming":true,"input_tokens":3,"output_tokens":418,"cache_read_tokens":14044,"cache_write_tokens":26042,"duration_ms":5858,"response_bytes":2800}

This data is captured in the agent-artifacts upload (under sandbox/firewall/logs/api-proxy-logs/token-usage.jsonl) but is not yet analyzed systematically.
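A record like the one above can be pulled out of the artifact with a few lines of Python. This is a sketch, not part of the proposal: the field names follow the schema from PR #1539, and the parser tolerates blank or truncated lines since the JSONL may be cut off mid-write:

```python
import json

def parse_token_usage(jsonl_text: str) -> list[dict]:
    """Parse token-usage.jsonl content into a list of per-request records,
    skipping blank or malformed lines."""
    records = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # tolerate a truncated trailing line
    return records

# The example record from above:
sample = ('{"timestamp":"...","request_id":"...","provider":"anthropic",'
          '"model":"claude-sonnet-4-6","path":"/v1/messages?beta=true",'
          '"status":200,"streaming":true,"input_tokens":3,"output_tokens":418,'
          '"cache_read_tokens":14044,"cache_write_tokens":26042,'
          '"duration_ms":5858,"response_bytes":2800}')
records = parse_token_usage(sample)
```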

Proposed Workflow

Trigger

  • Daily schedule (e.g., 08:00 UTC)
  • workflow_dispatch for manual runs

Behavior

  1. Discover runs since last analysis

    • Use gh run list to find completed agentic workflow runs since the last analysis issue
    • Cover all smoke tests, secret-digger, and other workflows that use --enable-api-proxy
  2. Download and aggregate token-usage.jsonl

    • For each run, download the agent-artifacts artifact
    • Extract token-usage.jsonl from the artifact
    • Gracefully handle runs where token logs are not available (this is a new feature — older runs and runs without --enable-api-proxy won't have logs)
  3. Compute per-workflow statistics

    • Total tokens (input + output + cache read + cache write)
    • Input/output ratio
    • Cache hit rate (cache_read / (cache_read + input))
    • Average request count per run
    • Average duration per request
    • Model mix (which models are used, in what proportions)
  4. Identify trends

    • Compare current period vs previous period (if historical data available)
    • Flag workflows with increasing token consumption
    • Flag workflows with low cache hit rates (opportunity for prompt caching)
    • Flag workflows with high input/output ratios (may indicate bloated prompts or tool schemas)
  5. Generate optimization recommendations

    • Workflows with zero cache hits → suggest enabling prompt caching
    • Workflows with >100:1 input/output ratio → suggest reducing system prompt or tool surface
    • Workflows with many small requests → suggest batching
    • Workflows with unusually high total tokens → flag for review
  6. Create a summary issue

    • Title: Token Usage Report: YYYY-MM-DD
    • Body includes:
      • Overview table (workflow name, total tokens, cost estimate, cache rate)
      • Per-workflow detail sections (collapsible)
      • Historical trend chart (if prior reports exist)
      • Top optimization opportunities
      • Link to previous report issue (if any)
    • Label: token-usage-report
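Steps 3 and 5 above could be sketched roughly as follows. This is a hypothetical helper, not a committed design: the statistic names mirror the bullets in step 3, the thresholds mirror step 5, and whether cached tokens count toward the input/output ratio is left as a design choice (here only input_tokens is used):

```python
from collections import Counter

def analyze(records: list[dict]) -> dict:
    """Compute per-run statistics (step 3) and optimization flags (step 5)
    from token-usage.jsonl records."""
    inp = sum(r.get("input_tokens", 0) for r in records)
    out = sum(r.get("output_tokens", 0) for r in records)
    cache_read = sum(r.get("cache_read_tokens", 0) for r in records)
    cache_write = sum(r.get("cache_write_tokens", 0) for r in records)
    n = len(records)
    stats = {
        "total_tokens": inp + out + cache_read + cache_write,
        # Design choice: ratio over raw input tokens only, not cached ones.
        "input_output_ratio": inp / out if out else None,
        "cache_hit_rate": cache_read / (cache_read + inp) if (cache_read + inp) else 0.0,
        "avg_duration_ms": sum(r.get("duration_ms", 0) for r in records) / n if n else 0.0,
        "model_mix": dict(Counter(r.get("model", "unknown") for r in records)),
    }
    flags = []
    if n and stats["cache_hit_rate"] == 0.0:
        flags.append("no cache hits: consider enabling prompt caching")
    ratio = stats["input_output_ratio"]
    if ratio is not None and ratio > 100:
        flags.append(">100:1 input/output ratio: consider trimming prompts or tool schemas")
    stats["flags"] = flags
    return stats
```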

Graceful Degradation

Since token tracking is new:

  • Skip runs that have no agent-artifacts artifact
  • Skip runs where token-usage.jsonl is missing or empty
  • Report which workflows are not yet instrumented
  • Don't fail the workflow if no data is available — create a minimal report noting the gap

Tools Needed

  • github MCP server — for listing runs, downloading artifacts, creating issues
  • bash — for processing JSONL files, computing aggregates
  • web-fetch — not needed (all data is in GitHub artifacts)

Permissions

  • actions: read — to list workflow runs and download artifacts
  • issues: write — to create summary issues

Token Usage Record Schema

| Field | Type | Description |
|-------|------|-------------|
| timestamp | string | ISO 8601 timestamp |
| request_id | string | Unique request ID |
| provider | string | anthropic, openai, copilot, opencode |
| model | string | Model name (e.g., claude-sonnet-4-6) |
| path | string | API endpoint path |
| status | number | HTTP status code |
| streaming | boolean | Whether response was SSE streamed |
| input_tokens | number | Input/prompt tokens |
| output_tokens | number | Output/completion tokens |
| cache_read_tokens | number | Prompt cache read tokens |
| cache_write_tokens | number | Prompt cache write tokens |
| duration_ms | number | Request duration in milliseconds |
| response_bytes | number | Response body size |
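In Python, the schema maps naturally onto a dataclass (a sketch; TokenUsageRecord and the total_tokens property are hypothetical names, not part of the api-proxy code):

```python
from dataclasses import dataclass

@dataclass
class TokenUsageRecord:
    """One line of token-usage.jsonl, following the schema above."""
    timestamp: str
    request_id: str
    provider: str
    model: str
    path: str
    status: int
    streaming: bool
    input_tokens: int
    output_tokens: int
    cache_read_tokens: int
    cache_write_tokens: int
    duration_ms: int
    response_bytes: int

    @property
    def total_tokens(self) -> int:
        """Total as defined in step 3: input + output + cache read + cache write."""
        return (self.input_tokens + self.output_tokens
                + self.cache_read_tokens + self.cache_write_tokens)
```

A record can then be built directly from a parsed line with `TokenUsageRecord(**json.loads(line))`, which also surfaces unexpected schema changes as errors.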

Example Analysis (from smoke-claude run)

From a recent smoke-claude run (6 requests):

  • Total tokens: 197,558 (775 input, 1,227 output, 152K cache read, 43K cache write)
  • Cache hit rate: 99.5%
  • Models: claude-haiku-4.5 (routing), claude-sonnet-4.6 (main agent)
  • Estimated cost: ~$0.23
  • Key finding: Anthropic prompt caching is working well (63% cost savings)
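The cache hit rate above follows directly from the formula in step 3. Using the rounded figures from this run (152K cache-read tokens, 775 input tokens), so the result is approximate:

```python
input_tokens = 775
cache_read_tokens = 152_000  # "152K" from the run summary, rounded

# cache_read / (cache_read + input), per step 3
cache_hit_rate = cache_read_tokens / (cache_read_tokens + input_tokens)
print(f"{cache_hit_rate:.1%}")  # prints "99.5%"
```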

Related

Labels

enhancement (New feature or request)
