---
description: Daily Claude token usage analysis across agentic workflow runs — identifies trends, inefficiencies, and optimization opportunities
on:
  schedule: daily
  workflow_dispatch:
permissions:
  contents: read
  actions: read
  issues: read
  pull-requests: read
imports:
  - shared/mcp-pagination.md
  - shared/reporting.md
network:
  allowed:
    - github
    - "*.blob.core.windows.net"
tools:
  github:
    toolsets: [default, actions]
  bash: true
safe-outputs:
  create-issue:
    title-prefix: "📊 Claude Token Usage Report"
    labels: [claude-token-usage-report]
    close-older-issues: true
timeout-minutes: 15
---
# Daily Claude Token Usage Analyzer

You are an AI agent that analyzes Claude token usage across agentic workflow runs in this repository. Your goal is to identify trends, highlight inefficiencies, and recommend optimizations to reduce AI inference costs for Claude-engine workflows.

## Background

This repository uses the **Agent Workflow Firewall (AWF)** with an api-proxy sidecar that tracks token usage for LLM API calls. Each workflow run with `--enable-api-proxy` produces a `token-usage.jsonl` file captured in the `agent-artifacts` upload artifact.

**Token usage tracking is a new feature** — many older runs won't have this data. Handle missing data gracefully.
### Token Usage Record Format

Each line in `token-usage.jsonl` is a JSON object:

```json
{
  "timestamp": "2026-04-01T17:38:12.486Z",
  "request_id": "uuid",
  "provider": "anthropic",
  "model": "claude-sonnet-4-6",
  "path": "/v1/messages?beta=true",
  "status": 200,
  "streaming": true,
  "input_tokens": 3,
  "output_tokens": 418,
  "cache_read_tokens": 14044,
  "cache_write_tokens": 26042,
  "duration_ms": 5858,
  "response_bytes": 2800
}
```
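As an illustration, records in this format can be parsed and their four token fields summed with a few lines of Python (a sketch only; the helper names are not part of the proxy's tooling, and missing fields are treated as zero):

```python
import json

def load_records(lines):
    """Parse token-usage.jsonl lines, skipping blanks."""
    return [json.loads(ln) for ln in lines if ln.strip()]

def total_tokens(rec):
    """Sum all four token fields; absent fields count as zero."""
    return (rec.get("input_tokens", 0) + rec.get("output_tokens", 0)
            + rec.get("cache_read_tokens", 0) + rec.get("cache_write_tokens", 0))

# The numbers from the example record above:
sample = ('{"provider": "anthropic", "input_tokens": 3, "output_tokens": 418, '
          '"cache_read_tokens": 14044, "cache_write_tokens": 26042}')
recs = load_records(["", sample])  # blank lines are tolerated
print(total_tokens(recs[0]))  # 3 + 418 + 14044 + 26042 = 40507
```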
## Your Mission

### Step 1: Discover Recent Workflow Runs

Use `gh run list` via bash to find completed agentic workflow runs from the past 24 hours (or since the last token usage report issue). Focus on **Claude-engine workflows** that use the api-proxy:

- `smoke-claude`
- `secret-digger-claude`
- `security-review`, `security-guard`
- Any other Claude-engine workflow with `agent-artifacts`

**Note:** Copilot-engine and Codex-engine workflows (e.g., `smoke-copilot`, `smoke-chroot`, `smoke-services`, `build-test`, `smoke-codex`, `secret-digger-copilot`) are excluded from this analysis — they are covered by the separate Copilot Token Usage Analyzer.
Use bash to run:

```bash
# Find runs from the last 24 hours
CUTOFF="$(date -u -Iseconds -d '24 hours ago')"
gh run list --repo "$GITHUB_REPOSITORY" --limit 50 \
  --created ">=$CUTOFF" \
  --json databaseId,name,status,conclusion,createdAt,workflowName \
  --jq '[.[] | select(.conclusion == "success" or .conclusion == "failure")]'
```
### Step 2: Download and Parse Token Usage Data

For each discovered run, attempt to download the `agent-artifacts` artifact and extract `token-usage.jsonl`.

**IMPORTANT:** Always use `gh run download` via bash — this is much faster than using MCP `get_job_logs`, and the network is configured to allow artifact blob storage access.

```bash
# Create temp directory
TMPDIR=$(mktemp -d)

# Try to download artifacts for a run
gh run download <RUN_ID> --repo "$GITHUB_REPOSITORY" --name agent-artifacts --dir "$TMPDIR/run-<RUN_ID>" 2>/dev/null

# Look for token-usage.jsonl (may be nested under sandbox/firewall/logs/api-proxy-logs/)
find "$TMPDIR/run-<RUN_ID>" -name "token-usage.jsonl" 2>/dev/null
```

**Filter for Claude/Anthropic requests:** When parsing `token-usage.jsonl`, only include records where `provider` is `"anthropic"`. This ensures the analysis covers Claude-specific usage even if a workflow makes calls to multiple providers.

**Graceful degradation:**

- If artifact download fails → skip the run, note it as "no artifacts"
- If `token-usage.jsonl` is missing → skip the run, note it as "no token logs"
- If the file is empty → skip the run, note it as "empty token logs"
- Track which workflows have instrumentation vs which don't
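A possible shape for the provider filter and per-line tolerance described above (the function name is illustrative; malformed lines are skipped rather than failing the whole run):

```python
import json

def anthropic_records(jsonl_text):
    """Keep only provider == "anthropic" records; skip blank or malformed lines."""
    out = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # degrade gracefully on corrupt lines
        if rec.get("provider") == "anthropic":
            out.append(rec)
    return out

sample = "\n".join([
    '{"provider": "anthropic", "output_tokens": 10}',
    '{"provider": "openai", "output_tokens": 99}',
    "not json at all",
])
print(len(anthropic_records(sample)))  # 1
```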
### Step 3: Compute Per-Workflow Statistics

For each workflow that has token data, calculate:

1. **Total tokens**: `input_tokens + output_tokens + cache_read_tokens + cache_write_tokens`
2. **Full-price tokens**: `input_tokens + output_tokens + cache_write_tokens` (excludes discounted cache reads; use **Estimated cost** below for the true billed amount)
3. **Input/output ratio**: `(input_tokens + cache_read_tokens) / output_tokens` (if `output_tokens == 0`, treat the ratio as `∞`/`N/A` and exclude that request from ratio averages to avoid division by zero)
4. **Cache hit rate**: `cache_read_tokens / (cache_read_tokens + input_tokens) * 100`
5. **Cache write rate**: `cache_write_tokens / (cache_read_tokens + input_tokens + cache_write_tokens) * 100`
6. **Request count**: Number of records in the JSONL
7. **Average latency**: Mean `duration_ms` per request
8. **Model distribution**: Count of requests per model
9. **Estimated cost** (sum all four token types at their respective rates — cache reads are discounted, not free):
   - Anthropic Sonnet: input $3/M, output $15/M, cache_read $0.30/M, cache_write $3.75/M
   - Anthropic Haiku: input $0.80/M, output $4/M, cache_read $0.08/M, cache_write $1/M
   - Anthropic Opus: input $15/M, output $75/M, cache_read $1.50/M, cache_write $18.75/M

Use bash with a Python or jq script to process the JSONL files efficiently.
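The metrics above could be aggregated roughly as follows. This is a sketch that hard-codes the Sonnet rates from the list; a real script would dispatch on each record's `model` field to pick the right row of the rate table:

```python
# Sonnet per-million-token rates from the list above.
RATES = {"input": 3.00, "output": 15.00, "cache_read": 0.30, "cache_write": 3.75}

def workflow_stats(records):
    """Aggregate the Step 3 metrics for one workflow's Anthropic records."""
    inp = sum(r.get("input_tokens", 0) for r in records)
    out = sum(r.get("output_tokens", 0) for r in records)
    cr = sum(r.get("cache_read_tokens", 0) for r in records)
    cw = sum(r.get("cache_write_tokens", 0) for r in records)
    cost = (inp * RATES["input"] + out * RATES["output"]
            + cr * RATES["cache_read"] + cw * RATES["cache_write"]) / 1_000_000
    return {
        "total_tokens": inp + out + cr + cw,
        "full_price_tokens": inp + out + cw,
        "io_ratio": (inp + cr) / out if out else None,  # N/A when no output tokens
        "cache_hit_rate": 100 * cr / (cr + inp) if cr + inp else 0.0,
        "cache_write_rate": 100 * cw / (cr + inp + cw) if cr + inp + cw else 0.0,
        "requests": len(records),
        "avg_latency_ms": sum(r.get("duration_ms", 0) for r in records) / len(records) if records else 0.0,
        "estimated_cost_usd": round(cost, 2),
    }

# The example record from the format section:
rec = {"input_tokens": 3, "output_tokens": 418,
       "cache_read_tokens": 14044, "cache_write_tokens": 26042, "duration_ms": 5858}
print(workflow_stats([rec])["total_tokens"])  # 40507
```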
### Step 4: Identify Optimization Opportunities

Flag workflows with these patterns:

| Pattern | Threshold | Recommendation |
|---------|-----------|----------------|
| Zero cache hits | cache_hit_rate = 0% | Enable prompt caching |
| Low cache hits | cache_hit_rate < 50% | Review cache breakpoints |
| High cache write vs read | cache_write > cache_read | Cache is being written but not reused — check if conversation turns are too short |
| High input/output ratio | ratio > 100:1 | Reduce system prompt or MCP tool surface |
| Many small requests | >10 requests, <50 output tokens each | Batch requests or combine tool calls |
| High total cost | >$1.00 per run | Review if workflow is doing too much |
| Increasing trend | >20% increase vs last report | Investigate what changed |
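Most of these thresholds can be encoded directly. A sketch, assuming an aggregated per-workflow `stats` dict whose keys are illustrative rather than a defined schema (the per-request patterns, such as "many small requests", would need the raw records instead):

```python
def flag_optimizations(stats):
    """Apply the per-workflow Step 4 thresholds to an aggregated stats dict."""
    flags = []
    if stats["cache_hit_rate"] == 0:
        flags.append("Enable prompt caching")
    elif stats["cache_hit_rate"] < 50:
        flags.append("Review cache breakpoints")
    if stats["cache_write_tokens"] > stats["cache_read_tokens"]:
        flags.append("Cache written but not reused")
    if stats["io_ratio"] is not None and stats["io_ratio"] > 100:
        flags.append("Reduce system prompt or MCP tool surface")
    if stats["cost_usd"] > 1.00:
        flags.append("Review if workflow is doing too much")
    return flags

demo = {"cache_hit_rate": 45.0, "cache_write_tokens": 90_000,
        "cache_read_tokens": 60_000, "io_ratio": 12.0, "cost_usd": 0.40}
print(flag_optimizations(demo))  # ['Review cache breakpoints', 'Cache written but not reused']
```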
### Step 5: Check for Historical Trends

Search for previous token usage report issues:

```bash
gh issue list --repo "$GITHUB_REPOSITORY" --label claude-token-usage-report --state all --limit 5 --json number,title,createdAt,url
```

If previous reports exist, compare current metrics to identify:

- Workflows with increasing token consumption
- Workflows that gained or lost prompt caching
- New workflows that started using the api-proxy
- Cost trend over time
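The ">20% increase vs last report" check from Step 4 reduces to a percent-change comparison against the previous report's totals, when they exist (a sketch; the example numbers are made up):

```python
def pct_change(current, previous):
    """Percent change vs the last report; None when there is no baseline."""
    if not previous:
        return None
    return 100.0 * (current - previous) / previous

# Hypothetical totals: 480K tokens this period vs 395K in the last report.
change = pct_change(480_000, 395_000)
print(change is not None and change > 20)  # True: flag as an increasing trend
```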
### Step 6: Create the Summary Issue

Create an issue with the following structure:

#### Title: `YYYY-MM-DD` (safe-outputs will automatically prefix this with "📊 Claude Token Usage Report")

#### Body structure:

```markdown
### Overview

**Period**: [start date] to [end date]
**Runs analyzed**: X of Y (Z had token data)
**Total tokens**: N across all workflows
**Estimated total cost**: $X.XX

### Workflow Summary

| Workflow | Runs | Total Tokens | Cost | Cache Rate | I/O Ratio | Top Model |
|----------|------|--------------|------|------------|-----------|-----------|
| smoke-claude | 2 | 395K | $0.46 | 99.5% | 0.6:1 | sonnet-4.6 |
| security-review | 1 | 120K | $0.22 | 85% | 2.3:1 | sonnet-4.6 |
| ... | | | | | | |

### 🔍 Optimization Opportunities

1. **secret-digger-claude** — 45% cache hit rate, high cache writes
   - Cache is being created but not fully reused across turns
   - Consider restructuring the prompt to maximize cache prefix reuse
2. ...

<details>
<summary><b>Per-Workflow Details</b></summary>

#### smoke-claude

- **Runs**: 2 (run 123, run 456)
- **Requests**: 12 total (avg 6/run)
- **Models**: claude-haiku-4.5 (4 reqs), claude-sonnet-4.6 (8 reqs)
- **Tokens**: 395K total (1.5K input, 2.5K output, 304K cache_read, 87K cache_write)
- **Cache hit rate**: 99.5%
- **Cache write rate**: 22.3%
- **Avg latency**: 3,800ms/request
- **Estimated cost**: $0.46

#### security-review

...

</details>

<details>
<summary><b>Workflows Without Token Data</b></summary>

The following workflows either don't use `--enable-api-proxy` or ran before token tracking was enabled:

- secret-digger-claude (1 run — no agent-artifacts)

</details>

### Historical Trend

[If previous reports exist, show comparison. Otherwise note: "This is the first Claude token usage report. Historical trends will be available in future reports."]

### Previous Report

[Link to previous report issue if one exists, otherwise omit this section]
```
## Important Guidelines

- **Time budget** — You have 15 minutes total. Spend at most 8 minutes on data collection (steps 1-2) and reserve the rest for analysis and issue creation. If artifact downloads are slow, limit to the 5 most recent runs.
- **Prefer bash over MCP** for data collection — `gh run download` via bash is much faster than MCP `get_job_logs` for retrieving artifacts.
- **Do NOT fail** if no token data is available. Create a minimal report explaining that token tracking is new and which workflows need instrumentation.
- **Clean up** temporary directories after processing.
- **Respect rate limits** — download artifacts one at a time, not in parallel.
- **Use `--perPage` parameters** when listing runs to avoid token limits on MCP responses.
- **Wrap verbose output** in `<details>` blocks for progressive disclosure.
- **Round costs** to 2 decimal places and token counts to the nearest thousand for readability.
- **Sort workflows** by estimated cost (highest first) in the summary table.
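The rounding and sorting guidelines might look like this in the report-building step (helper names are illustrative):

```python
def fmt_tokens(n):
    """Token counts rounded to the nearest thousand, e.g. 395123 -> '395K'."""
    return f"{round(n / 1000)}K" if n >= 1000 else str(n)

def fmt_cost(c):
    """Costs rounded to 2 decimal places."""
    return f"${c:.2f}"

# (workflow, total_tokens, estimated_cost_usd) rows for the summary table.
rows = [("security-review", 120_000, 0.22), ("smoke-claude", 395_000, 0.46)]
rows.sort(key=lambda r: r[2], reverse=True)  # highest estimated cost first
print([(w, fmt_tokens(t), fmt_cost(c)) for w, t, c in rows])
```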