
feat: daily token usage analysis workflow #1551

@lpcox


Summary

Create a daily agentic workflow that mines token usage data from recent workflow runs, identifies trends and inefficiencies, and creates a summary issue with findings and optimization recommendations.

Background

PR #1539 introduced token usage tracking in the api-proxy sidecar. Each workflow run with --enable-api-proxy now produces a token-usage.jsonl artifact containing per-request records with:

{"timestamp":"...","request_id":"...","provider":"anthropic","model":"claude-sonnet-4-6","path":"/v1/messages?beta=true","status":200,"streaming":true,"input_tokens":3,"output_tokens":418,"cache_read_tokens":14044,"cache_write_tokens":26042,"duration_ms":5858,"response_bytes":2800}

This data is captured in the agent-artifacts upload (under sandbox/firewall/logs/api-proxy-logs/token-usage.jsonl) but is not yet analyzed systematically.
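A record like the one above can be pulled out of the artifact with a few lines of Python. This is a sketch, not part of the proposal: the field names follow the schema from PR #1539, and the parser tolerates blank or truncated lines since the JSONL may be cut off mid-write:

```python
import json

def parse_token_usage(jsonl_text: str) -> list[dict]:
    """Parse token-usage.jsonl content into a list of per-request records,
    skipping blank or malformed lines."""
    records = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # tolerate a truncated trailing line
    return records

# The example record from above:
sample = ('{"timestamp":"...","request_id":"...","provider":"anthropic",'
          '"model":"claude-sonnet-4-6","path":"/v1/messages?beta=true",'
          '"status":200,"streaming":true,"input_tokens":3,"output_tokens":418,'
          '"cache_read_tokens":14044,"cache_write_tokens":26042,'
          '"duration_ms":5858,"response_bytes":2800}')
records = parse_token_usage(sample)
```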

Proposed Workflow

Trigger

  • Daily schedule (e.g., 08:00 UTC)
  • workflow_dispatch for manual runs

Behavior

  1. Discover runs since last analysis

    • Use gh run list to find completed agentic workflow runs since the last analysis issue
    • Cover all smoke tests, secret-digger, and other workflows that use --enable-api-proxy
  2. Download and aggregate token-usage.jsonl

    • For each run, download the agent-artifacts artifact
    • Extract token-usage.jsonl from the artifact
    • Gracefully handle runs where token logs are not available (this is a new feature — older runs and runs without --enable-api-proxy won't have logs)
  3. Compute per-workflow statistics

    • Total tokens (input + output + cache read + cache write)
    • Input/output ratio
    • Cache hit rate (cache_read / (cache_read + input))
    • Average request count per run
    • Average duration per request
    • Model mix (which models are used, in what proportions)
  4. Identify trends

    • Compare current period vs previous period (if historical data available)
    • Flag workflows with increasing token consumption
    • Flag workflows with low cache hit rates (opportunity for prompt caching)
    • Flag workflows with high input/output ratios (may indicate bloated prompts or tool schemas)
  5. Generate optimization recommendations

    • Workflows with zero cache hits → suggest enabling prompt caching
    • Workflows with >100:1 input/output ratio → suggest reducing system prompt or tool surface
    • Workflows with many small requests → suggest batching
    • Workflows with unusually high total tokens → flag for review
  6. Create a summary issue

    • Title: Token Usage Report: YYYY-MM-DD
    • Body includes:
      • Overview table (workflow name, total tokens, cost estimate, cache rate)
      • Per-workflow detail sections (collapsible)
      • Historical trend chart (if prior reports exist)
      • Top optimization opportunities
      • Link to previous report issue (if any)
    • Label: token-usage-report
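Steps 3 and 5 above could be sketched roughly as follows. This is a hypothetical helper, not a committed design: the statistic names mirror the bullets in step 3, the thresholds mirror step 5, and whether cached tokens count toward the input/output ratio is left as a design choice (here only input_tokens is used):

```python
from collections import Counter

def analyze(records: list[dict]) -> dict:
    """Compute per-run statistics (step 3) and optimization flags (step 5)
    from token-usage.jsonl records."""
    inp = sum(r.get("input_tokens", 0) for r in records)
    out = sum(r.get("output_tokens", 0) for r in records)
    cache_read = sum(r.get("cache_read_tokens", 0) for r in records)
    cache_write = sum(r.get("cache_write_tokens", 0) for r in records)
    n = len(records)
    stats = {
        "total_tokens": inp + out + cache_read + cache_write,
        # Design choice: ratio over raw input tokens only, not cached ones.
        "input_output_ratio": inp / out if out else None,
        "cache_hit_rate": cache_read / (cache_read + inp) if (cache_read + inp) else 0.0,
        "avg_duration_ms": sum(r.get("duration_ms", 0) for r in records) / n if n else 0.0,
        "model_mix": dict(Counter(r.get("model", "unknown") for r in records)),
    }
    flags = []
    if n and stats["cache_hit_rate"] == 0.0:
        flags.append("no cache hits: consider enabling prompt caching")
    ratio = stats["input_output_ratio"]
    if ratio is not None and ratio > 100:
        flags.append(">100:1 input/output ratio: consider trimming prompts or tool schemas")
    stats["flags"] = flags
    return stats
```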

Graceful Degradation

Since token tracking is new:

  • Skip runs that have no agent-artifacts artifact
  • Skip runs where token-usage.jsonl is missing or empty
  • Report which workflows are not yet instrumented
  • Don't fail the workflow if no data is available — create a minimal report noting the gap

Tools Needed

  • github MCP server — for listing runs, downloading artifacts, creating issues
  • bash — for processing JSONL files, computing aggregates
  • web-fetch — not needed (all data is in GitHub artifacts)

Permissions

  • actions: read — to list workflow runs and download artifacts
  • issues: write — to create summary issues

Token Usage Record Schema

| Field | Type | Description |
|-------|------|-------------|
| timestamp | string | ISO 8601 timestamp |
| request_id | string | Unique request ID |
| provider | string | anthropic, openai, copilot, opencode |
| model | string | Model name (e.g., claude-sonnet-4-6) |
| path | string | API endpoint path |
| status | number | HTTP status code |
| streaming | boolean | Whether response was SSE streamed |
| input_tokens | number | Input/prompt tokens |
| output_tokens | number | Output/completion tokens |
| cache_read_tokens | number | Prompt cache read tokens |
| cache_write_tokens | number | Prompt cache write tokens |
| duration_ms | number | Request duration in milliseconds |
| response_bytes | number | Response body size |
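In Python, the schema maps naturally onto a dataclass (a sketch; TokenUsageRecord and the total_tokens property are hypothetical names, not part of the api-proxy code):

```python
from dataclasses import dataclass

@dataclass
class TokenUsageRecord:
    """One line of token-usage.jsonl, following the schema above."""
    timestamp: str
    request_id: str
    provider: str
    model: str
    path: str
    status: int
    streaming: bool
    input_tokens: int
    output_tokens: int
    cache_read_tokens: int
    cache_write_tokens: int
    duration_ms: int
    response_bytes: int

    @property
    def total_tokens(self) -> int:
        """Total as defined in step 3: input + output + cache read + cache write."""
        return (self.input_tokens + self.output_tokens
                + self.cache_read_tokens + self.cache_write_tokens)
```

A record can then be built directly from a parsed line with `TokenUsageRecord(**json.loads(line))`, which also surfaces unexpected schema changes as errors.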

Example Analysis (from smoke-claude run)

From a recent smoke-claude run (6 requests):

  • Total tokens: 197,558 (775 input, 1,227 output, 152K cache read, 43K cache write)
  • Cache hit rate: 99.5%
  • Models: claude-haiku-4.5 (routing), claude-sonnet-4.6 (main agent)
  • Estimated cost: ~$0.23
  • Key finding: Anthropic prompt caching is working well (63% cost savings)
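The cache hit rate above follows directly from the formula in step 3. Using the rounded figures from this run (152K cache-read tokens, 775 input tokens), so the result is approximate:

```python
input_tokens = 775
cache_read_tokens = 152_000  # "152K" from the run summary, rounded

# cache_read / (cache_read + input), per step 3
cache_hit_rate = cache_read_tokens / (cache_read_tokens + input_tokens)
print(f"{cache_hit_rate:.1%}")  # prints "99.5%"
```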

Related

Labels

enhancement (New feature or request)
