Overview
Period: 2026-04-02T15:43Z to 2026-04-02T16:21Z (push-triggered runs on `fix/openai-cache-token-tracking`)
Runs analyzed: 4 Copilot-engine runs (3 had token data)
Total tokens: 786,861 across all instrumented workflows (449,616 billable)
Estimated total cost: $1.48
✅ Major improvement vs. previous report: cache hit rates jumped from 0% to 38–45% across all instrumented workflows. The `fix/openai-cache-token-tracking` branch appears to have fixed cache token reporting — caching is now correctly tracked and billing reflects actual cache discounts. Per-request billable tokens are down ~1–2% from the previous report, consistent with the cache already being active then but unreported.
Workflow Summary
| Workflow | Run | Requests | Total Tokens | Billable Tokens | Est. Cost | Cache Rate | I/O Ratio | Top Model |
|---|---|---|---|---|---|---|---|---|
| Smoke Copilot | §23908898227 | 5 | 374K | 211K | $0.70 | 43.7% | 253:1 | sonnet-4.6 |
| Build Test Suite | §23908898271 | 4 | 256K | 142K | $0.47 | 44.8% | 233:1 | sonnet-4.6 |
| Smoke Chroot | §23908898275 | 3 | 157K | 97K | $0.31 | 38.5% | 305:1 | sonnet-4.6 |
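The summary columns can be reproduced from the per-request rows in the tables below. A minimal sketch of the arithmetic — the formulas (billable = input + output with cache reads discounted, cache rate = cache reads over full prompt size, I/O ratio = prompt over output) are inferred from the report's numbers, not taken from the analyzer's actual source:

```python
def summarize(requests):
    """Derive summary metrics from per-request token records.

    Assumed conventions (inferred from this report's tables): the Input
    column counts non-cached input tokens, so the full prompt size is
    input + cache_read, and cache reads are fully discounted from billing.
    """
    inp = sum(r["input"] for r in requests)
    out = sum(r["output"] for r in requests)
    cached = sum(r["cache_read"] for r in requests)
    prompt = inp + cached  # full prompt size including cached prefix
    return {
        "total_tokens": inp + out + cached,
        "billable_tokens": inp + out,  # cache reads discounted
        "cache_rate_pct": round(100 * cached / prompt, 1),
        "io_ratio": round(prompt / out),
    }

# Smoke Chroot per-request rows from this report
rows = [
    {"input": 31_636, "output": 221, "cache_read": 0},
    {"input": 32_151, "output": 226, "cache_read": 28_170},
    {"input": 32_394, "output": 66, "cache_read": 32_150},
]
print(summarize(rows))
# → {'total_tokens': 157014, 'billable_tokens': 96694,
#    'cache_rate_pct': 38.5, 'io_ratio': 305}
```

These reproduce the Smoke Chroot summary row (157K total, 97K billable, 38.5% cache rate, 305:1 I/O), which is how the formulas above were cross-checked.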
🔍 Optimization Opportunities
- Cold cache on the first request of each run
  - Request 1 shows 0 `cache_read_tokens` in Smoke Copilot and Smoke Chroot (and only 13.5K in Build Test Suite) — the cache is cold or nearly cold on each fresh run
  - Subsequent requests within the same run benefit from the warm cache (38–45% hit rate)
  - Recommendation: If runs share a consistent system prompt, consider pre-warming the cache or using persistent cache keys across runs. Enabling `cache_write_tokens` tracking at the Copilot endpoint would also improve cost attribution.
- No `cache_write_tokens` reported in any workflow
  - All workflows show `cache_write_tokens = 0`, meaning we can only see cache reads, not writes
  - The Copilot provider may not return write counts separately from input tokens
  - Recommendation: Investigate whether the api-proxy can track cache writes via response headers or usage metadata for the Copilot inference endpoint; this would make cost attribution more accurate.
- High I/O ratios persist (233–305:1)
  - All workflows show very high input-to-output ratios, driven by large system prompts and tool schemas (31–43K input tokens per request vs. 29–494 output tokens)
  - Caching partially mitigates this, but the underlying context size remains large
  - Recommendation: Review the MCP tool surface in the Smoke workflows — trimming tool schemas would lower input tokens for every request, including the cold-cache first request
Per-Workflow Details
Smoke Copilot
- Run: §23908898227 (push to `fix/openai-cache-token-tracking`)
- Requests: 5 (all `claude-sonnet-4.6` via the copilot provider)
- Tokens: 374K total — 210K input, 1.5K output, 163K cache_read, 0 cache_write
- Cache hit rate: 43.7%
- Avg latency: 5,564ms/request
- Estimated cost: $0.70
| # | Timestamp | Input | Output | Cache Read | Duration |
|---|---|---|---|---|---|
| 1 | 2026-04-02 15:46:49 | 40,291 | 353 | 0 | 7,066ms |
| 2 | 2026-04-02 15:46:57 | 41,351 | 331 | 36,518 | 5,297ms |
| 3 | 2026-04-02 15:47:05 | 42,180 | 494 | 41,287 | 6,928ms |
| 4 | 2026-04-02 15:47:11 | 42,816 | 266 | 42,179 | 5,723ms |
| 5 | 2026-04-02 15:47:14 | 43,167 | 29 | 42,815 | 2,806ms |
Build Test Suite
- Run: §23908898271 (push to `fix/openai-cache-token-tracking`)
- Requests: 4 (all `claude-sonnet-4.6` via the copilot provider)
- Tokens: 256K total — 141K input, 1.1K output, 114K cache_read, 0 cache_write
- Cache hit rate: 44.8%
- Avg latency: 5,350ms/request
- Estimated cost: $0.47
| # | Timestamp | Input | Output | Cache Read | Duration |
|---|---|---|---|---|---|
| 1 | 2026-04-02 15:46:24 | 34,039 | 258 | 13,574 | 5,309ms |
| 2 | 2026-04-02 15:46:39 | 34,559 | 520 | 30,138 | 6,758ms |
| 3 | 2026-04-02 15:46:46 | 35,857 | 224 | 34,558 | 5,784ms |
| 4 | 2026-04-02 15:46:49 | 36,098 | 89 | 35,856 | 3,548ms |
Smoke Chroot
- Run: §23908898275 (push to `fix/openai-cache-token-tracking`)
- Requests: 3 (all `claude-sonnet-4.6` via the copilot provider)
- Tokens: 157K total — 96K input, 513 output, 60K cache_read, 0 cache_write
- Cache hit rate: 38.5%
- Avg latency: 5,724ms/request
- Estimated cost: $0.31
| # | Timestamp | Input | Output | Cache Read | Duration |
|---|---|---|---|---|---|
| 1 | 2026-04-02 15:48:12 | 31,636 | 221 | 0 | 5,187ms |
| 2 | 2026-04-02 15:48:17 | 32,151 | 226 | 28,170 | 4,716ms |
| 3 | 2026-04-02 15:48:25 | 32,394 | 66 | 32,150 | 7,270ms |
Workflows Without Token Data
The following Copilot-engine workflows ran in the past 24 hours but had no `agent-artifacts` with token data:

| Workflow | Run ID | Reason |
|---|---|---|
| Smoke Services | §23908898245 | Has an `agent` artifact but no `agent-artifacts` — likely doesn't use `--enable-api-proxy` |
| Daily Copilot Token Usage Analyzer | §23908784491 | Previous run of this analyzer workflow |
Historical Trend
Comparison to previous report #1591 (2026-04-02, 01:06 UTC window):
| Workflow | Prev Cache Rate | Curr Cache Rate | Prev Billable/Req | Curr Billable/Req | Change |
|---|---|---|---|---|---|
| Smoke Copilot | 0% | 43.7% | 42,800 | 42,256 | -1% |
| Build Test Suite | 0% | 44.8% | 36,250 | 35,411 | -2% |
| Secret Digger (Copilot) | 0% | — | — | — | (no run this period) |
The cache hit rate improvement from 0% to ~40–45% is the direct result of the `fix/openai-cache-token-tracking` fix correctly parsing `prompt_tokens_details.cached_tokens` from OpenAI/Copilot API responses. Per-request billable tokens are essentially unchanged (within 1–2%), confirming the cache was not newly enabled but was already active and is now being correctly measured.
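The parsing fix described above amounts to reading the nested cached-token counter out of the usage payload. A hedged sketch, assuming an OpenAI-style `usage` object — the `extract_usage` helper and its None-for-untracked convention are illustrative, not the analyzer's actual implementation:

```python
def extract_usage(usage: dict) -> dict:
    """Pull token counters from an OpenAI-style `usage` payload.

    `prompt_tokens_details.cached_tokens` is the real OpenAI field for
    cache reads; a separate cache-write counter is hypothetical here,
    since no such field was observed in these responses.
    """
    details = usage.get("prompt_tokens_details") or {}
    return {
        "input_tokens": usage.get("prompt_tokens", 0),
        "output_tokens": usage.get("completion_tokens", 0),
        "cache_read_tokens": details.get("cached_tokens", 0),
        # Record None ("untracked") rather than 0 so downstream reports
        # can distinguish "no writes" from "writes not reported".
        "cache_write_tokens": details.get("cache_write_tokens"),  # hypothetical field
    }

# Example payload shaped like Smoke Copilot request 2
usage = {
    "prompt_tokens": 41_351,
    "completion_tokens": 331,
    "prompt_tokens_details": {"cached_tokens": 36_518},
}
print(extract_usage(usage))
```

Before the fix, a parser that ignored `prompt_tokens_details` would have reported `cache_read_tokens = 0` for every request — exactly the 0% cache rate seen in the previous report.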
Previous Report
#1591 — 📊 Copilot Token Usage Report 2026-04-02
Generated by Daily Copilot Token Usage Analyzer