- **Target Workflow:** security-review.md
- **Source report:** #1679
- **Estimated cost per run:** $8.91 (at Anthropic list rates; billed via Copilot API)
- **Total tokens per run:** ~5,140K
- **Cache read rate:** 48.8% (cache_read / total context processed)
- **Cache write rate:** 0% — Copilot API session caching only; no explicit Anthropic cache breakpoints
- **LLM turns:** 33
## Current Configuration

| Setting | Value |
| --- | --- |
| Tools loaded | 5: agentic-workflows, github [default+actions+code_security], bash, web-fetch, cache-memory |
| Tools actually used | 3: agentic-workflows (Phase 1), bash (Phases 2–4), safe-outputs (output) |
| GitHub toolsets | default + actions + code_security (~30+ tool schemas) |
| Pre-agent steps | No — all work done inside agent |
| Post-agent steps | No |
| Prompt size | 7,392 chars (269 lines) + system context (~40K tokens estimated first turn) |
| Bash blocks in prompt | 7 blocks containing 20 individual commands |
| Phases | 5 phases + output = ~33 natural turn boundaries |
## Root Cause: High Turn Count
At 33 turns × 79K avg input = 2,620K input tokens, the primary cost driver is turn count, not prompt size. Each turn carries accumulated conversation history. The prompt's sequential phase structure — with individual bash commands scattered across 7 blocks — forces the agent to execute one command at a time, creating ~20+ turns just for evidence gathering.
The workflow decomposes as:
- Phase 1: 3–4 turns (agentic-workflows status + logs + audit + analysis)
- Phase 2 (6 sub-sections × 2–3 turns): ~15 turns (one bash → analyze → next bash)
- Phase 3 (threat model synthesis): 3–4 turns
- Phase 4 (attack surface mapping): 3–4 turns
- Phase 5 (best practices): 2 turns
- Output (create discussion): 3–4 turns
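The turn-count arithmetic can be sanity-checked directly. This is a back-of-envelope sketch using the report's own numbers; the flat-average model is a simplification, since real per-turn input grows as history accumulates:

```shell
# Every turn re-sends the accumulated context, so total input tokens
# scale with (turn count x average input per turn).
TURNS=33        # LLM turns per run (from this report)
AVG_INPUT_K=79  # average input per turn, in K tokens (from this report)

echo "total input ≈ $((TURNS * AVG_INPUT_K))K tokens"
# → total input ≈ 2607K tokens (matches the ~2,620K reported)
```

Halving the turn count therefore roughly halves input tokens, which is why the recommendations below target turns first.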
## Recommendations

### 1. Batch Evidence Gathering into a Single Bash Block

**Estimated savings:** ~1,440K tokens/run (~55% of input, ~$4.32/run)
Currently Phase 2 presents commands one sub-section at a time. The agent reads one file, analyzes it, reads the next. Replace with a single comprehensive bash block that collects all evidence upfront, then asks for analysis in 2–3 synthesis turns.
**Current pattern (creates ~15 turns):**

````markdown
### 2.1 Network Security Architecture
```bash
cat src/host-iptables.ts
cat containers/agent/setup-iptables.sh
```
Analyze...

### 2.2 Container Security Hardening
```bash
grep -rn "cap_drop\|capabilities" src/ containers/
cat containers/agent/seccomp-profile.json
```
Analyze...
````
**Recommended pattern (2 turns: gather + analyze):**
Replace all of Phase 2 with a single evidence-gathering turn followed by a single comprehensive analysis request:
````markdown
## Phase 2: Codebase Security Analysis

Run all evidence-gathering commands in one bash block:

```bash
echo "=== NETWORK SECURITY ===" && cat src/host-iptables.ts && echo "---" && cat containers/agent/setup-iptables.sh && echo "---" && cat src/squid-config.ts
echo "=== CONTAINER SECURITY ===" && grep -rn "cap_drop\|capabilities\|NET_ADMIN\|NET_RAW" src/ containers/ && cat containers/agent/seccomp-profile.json
echo "=== DOMAIN PATTERNS ===" && cat src/domain-patterns.ts
echo "=== INJECTION RISKS ===" && grep -rn "exec\|spawn\|shell\|command" src/ --include="*.ts" -l
echo "=== DOCKER WRAPPER ===" && cat containers/agent/docker-wrapper.sh
echo "=== DEPENDENCIES ===" && cat package.json && npm audit --json 2>/dev/null | head -100
```

Then analyze all findings holistically against the STRIDE threat model and generate the full report in one consolidated output.
````

This collapses 15 evidence-gathering turns into 2. Target: **15 total turns** (down from 33).
---
### 2. Move Phase 1 (Escape Test Context) to a Pre-Step
**Estimated savings:** ~237K tokens/run (~9%, ~$0.71/run)
Phase 1 uses the `agentic-workflows` tool to check recent firewall-escape-test runs — this takes 3–4 turns. Instead, fetch the logs in a GitHub Actions pre-step and inject them into the prompt.
Add a `steps:` block before the agent runs:
```yaml
steps:
  - name: Fetch latest escape test run
    id: escape-test
    run: |
      # Use gh CLI (authenticated in the workflow) to get the latest run
      RUN_ID=$(gh run list --workflow "firewall-escape-test.lock.yml" \
        --status success --limit 1 --json databaseId --jq '.[0].databaseId')
      echo "run_id=$RUN_ID" >> $GITHUB_OUTPUT
      gh run view "$RUN_ID" --log 2>/dev/null | tail -200 > /tmp/escape-test-summary.txt || echo "No recent run" > /tmp/escape-test-summary.txt
      echo "summary<<EOF" >> $GITHUB_OUTPUT
      cat /tmp/escape-test-summary.txt >> $GITHUB_OUTPUT
      echo "EOF" >> $GITHUB_OUTPUT
```

Then update the Phase 1 prompt section to use the injected context:
```markdown
## Phase 1: Previous Firewall Escape Test Results

The most recent firewall escape test results are provided below. Use this as complementary context for your security review — do NOT re-fetch using agentic-workflows.

<escape-test-results>
{{ steps.escape-test.outputs.summary }}
</escape-test-results>
```
Remove the `agentic-workflows:` tool entirely once Phase 1 is replaced.
### 3. Restrict GitHub Toolsets from `[default, actions, code_security]` to `[repos, code_security]`

**Estimated savings:** ~300K tokens/run (~12%, ~$0.89/run)
The `default` toolset loads ~22 tools (PR management, issue CRUD, discussion tools, etc.) and `actions` loads ~8–10 more. For a read-only security review that outputs via safe-outputs, only a handful of GitHub MCP tools are actually needed.
```yaml
# Before:
tools:
  github:
    toolsets: [default, actions, code_security]

# After:
tools:
  github:
    toolsets: [repos, code_security]
```
If Phase 1 is moved to a pre-step (Recommendation 2), `agentic-workflows` can also be removed, eliminating the need for the `actions` toolset entirely.

Approximate tool schema savings: ~15 tools × 600 tokens × 33 turns = ~297K tokens ($0.89)
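That estimate is easy to reproduce (the ~600 tokens-per-schema figure is the report's assumption, and the dollar figure assumes roughly $3 per million input tokens):

```shell
# Tool schemas are re-sent as input on every turn, so removing a schema
# saves (schema size x turn count) tokens per run.
awk 'BEGIN {
  tools_removed = 15      # schemas dropped by narrowing toolsets (report estimate)
  tokens_per_schema = 600 # approximate tokens per tool schema (report assumption)
  turns = 33              # LLM turns per run

  saved = tools_removed * tokens_per_schema * turns
  printf "tokens saved/run: %d (~$%.2f at $3/M input)\n", saved, saved / 1e6 * 3
}'
# → tokens saved/run: 297000 (~$0.89 at $3/M input)
```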
### 4. Remove `web-fetch` Tool

**Estimated savings:** ~20K tokens/run (~0.4%, ~$0.06/run)

`web-fetch` is listed in `tools:` but is never referenced in the prompt body. A codebase security review reads local files via bash — no URL fetching is needed.
```yaml
# Before:
tools:
  web-fetch:

# After:
# (remove web-fetch entirely)
```
### 5. Consolidate Phases 3–5 into One Synthesis Phase

**Estimated savings:** ~240K tokens/run (~9%, ~$0.72/run)
Phases 3 (threat model), 4 (attack surface), and 5 (best practices comparison) are synthesis steps that don't require additional tool calls. They're currently written as separate phases, prompting 3 separate synthesis turns where one would suffice.
Replace with a single synthesis prompt:
```markdown
## Phase 3: Security Analysis Synthesis

Based on the evidence collected above, produce a unified security analysis covering:

1. **STRIDE Threat Model** — for each category (Spoofing/Tampering/Repudiation/Info Disclosure/DoS/Elevation), identify threats with evidence citations and likelihood/impact rating
2. **Attack Surface Map** — enumerate each attack surface (network, container, domain parsing, input validation, Docker wrapper) with current protections and weaknesses
3. **CIS/NIST Comparison** — note any gaps vs Docker CIS Benchmark or NIST network filtering guidelines

Produce the full discussion output in one response covering all three analyses.
```
## Cache Analysis (Copilot API)

*Note: No artifact download was possible from this environment. Estimates are derived from the aggregate token report.*

| Metric | Value |
| --- | --- |
| Total input tokens | 2,620K |
| Cache read tokens | 2,499K |
| Cache write tokens | 0 |
| Total context processed | 5,119K |
| Cache hit rate | 48.8% (cache_read / total context) |
| Avg new input/turn | ~79K |
| Avg cached/turn | ~75.7K |
**Cache mechanism:** Copilot API uses implicit session context caching, not explicit Anthropic cache breakpoints. The first turn is cold (~40K input, 0 cache). Cache hits grow progressively as the session context (system prompt + tool schemas) warms up. By later turns, ~46K tokens/turn are served from cache.

**Cache write amortization:** N/A — no explicit cache writes. The implicit session caching provides a 48.8% cache hit rate organically. If this workflow were migrated to the direct Anthropic API with explicit `cache_control` breakpoints on the system prompt, the cache hit rate could reach 85–90%+.

**Recommendation:** If the workflow transitions to `provider: anthropic` (direct API), add `cache_control: ephemeral` on the system prompt to maximize cache reuse. At 40K system prompt tokens, the cache write pays for itself within 2 turns: writes cost 1.25× the base input rate and reads 0.1×, so the 0.25× write premium is recovered by the first cached read.
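A minimal sketch of that breakeven arithmetic, assuming Anthropic list-rate multipliers of 1.25× for cache writes and 0.1× for cache reads applied to the ~40K-token system prompt from this report:

```shell
# Cumulative input cost in base-token equivalents, cached vs uncached.
# Cached: pay the 1.25x write once, then 0.1x reads per turn.
# Uncached: re-send the full prompt at 1.0x every turn.
awk 'BEGIN {
  prompt_k = 40               # system prompt size in K tokens (from this report)
  write = prompt_k * 1.25     # one-time cache write cost
  read  = prompt_k * 0.10     # per-turn cached read cost
  for (n = 1; n <= 3; n++)
    printf "turn %d: cached=%.0fK uncached=%.0fK\n", n, write + (n - 1) * read, n * prompt_k
}'
# → turn 1: cached=50K uncached=40K
# → turn 2: cached=54K uncached=80K   (cached already cheaper: breakeven at 2 turns)
# → turn 3: cached=58K uncached=120K
```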
## Expected Impact

| Metric | Current | Projected | Savings |
| --- | --- | --- | --- |
| Total tokens/run | 5,140K | ~2,100K | -59% |
| Input tokens/run | 2,620K | ~1,100K | -58% |
| Cache reads/run | 2,499K | ~1,000K | -60% |
| Cost/run (list rates) | $8.91 | ~$2.90 | -67% |
| LLM turns | 33 | ~15 | -55% |
| Session time | ~6 min | ~3 min (est.) | -50% |
These projections assume implementing Recommendations 1–4 together.
## Implementation Checklist

- [ ] Add a `steps:` pre-step to fetch escape test logs and inject via template variable (Rec. 2)
- [ ] Remove the `agentic-workflows:` tool from the `tools:` section after the pre-step is added (Rec. 2)
- [ ] Change GitHub toolsets from `[default, actions, code_security]` to `[repos, code_security]` (Rec. 3)
- [ ] Remove the `web-fetch:` tool from the `tools:` section (Rec. 4)
- [ ] Recompile: `gh aw compile .github/workflows/security-review.md`
- [ ] Run `npx tsx scripts/ci/postprocess-smoke-workflows.ts` (if lock file needs post-processing)
- [ ] Trigger via `workflow_dispatch` and compare token usage vs the $8.91 baseline

*Generated by Daily Claude Token Optimization Advisor*