# ⚡ Claude Token Optimization 2026-04-06 — Daily Security Review and Threat Modeling #1715

@github-actions

Description

Target Workflow: security-review.md

Source report: #1679
Estimated cost per run: $8.91 (at Anthropic list rates; billed via Copilot API)
Total tokens per run: ~5,140K
Cache read rate: 48.8% (cache_read / total context processed)
Cache write rate: 0% — Copilot API session caching only; no explicit Anthropic cache breakpoints
LLM turns: 33


## Current Configuration

| Setting | Value |
|---|---|
| Tools loaded | 5: agentic-workflows, github [default+actions+code_security], bash, web-fetch, cache-memory |
| Tools actually used | 3: agentic-workflows (Phase 1), bash (Phases 2–4), safe-outputs (output) |
| GitHub toolsets | default + actions + code_security (~30+ tool schemas) |
| Pre-agent steps | None — all work done inside agent |
| Post-agent steps | None |
| Prompt size | 7,392 chars (269 lines) + system context (~40K tokens estimated first turn) |
| Bash blocks in prompt | 7 blocks containing 20 individual commands |
| Phases | 5 phases + output = ~33 natural turn boundaries |

## Root Cause: High Turn Count

At 33 turns × ~79K avg input ≈ 2,620K input tokens, the primary cost driver is turn count, not prompt size. Each turn carries the accumulated conversation history. The prompt's sequential phase structure — with individual bash commands scattered across 7 blocks — forces the agent to execute one command at a time, creating ~20+ turns just for evidence gathering.

The workflow decomposes as:

- Phase 1: 3–4 turns (agentic-workflows status + logs + audit + analysis)
- Phase 2 (6 sub-sections × 2–3 turns): ~15 turns (one bash → analyze → next bash)
- Phase 3 (threat model synthesis): 3–4 turns
- Phase 4 (attack surface mapping): 3–4 turns
- Phase 5 (best practices): 2 turns
- Output (create discussion): 3–4 turns
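Taking the upper bound of each range, the decomposition reproduces the report's totals (a back-of-envelope check; the ~79K/turn figure is the report's own average):

```shell
# Upper-bound turn count from the phase breakdown above
TURNS=$((4 + 15 + 4 + 4 + 2 + 4))   # phases 1-5 + output
AVG_INPUT_K=79                       # avg new input tokens per turn, in K
echo "turns=$TURNS input=$((TURNS * AVG_INPUT_K))K"
```

This lands at 33 turns and ~2.6M input tokens, consistent with the figures above.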

## Recommendations

### 1. Batch Evidence Gathering into a Single Bash Block

**Estimated savings:** ~1,440K tokens/run (~55% of input, ~$4.32/run)

Currently Phase 2 presents commands one sub-section at a time. The agent reads one file, analyzes it, reads the next. Replace with a single comprehensive bash block that collects all evidence upfront, then asks for analysis in 2–3 synthesis turns.

**Current pattern (creates ~15 turns):**

````markdown
### 2.1 Network Security Architecture
```bash
cat src/host-iptables.ts
cat containers/agent/setup-iptables.sh
```

Analyze...

### 2.2 Container Security Hardening
```bash
grep -rn "cap_drop|capabilities" src/ containers/
cat containers/agent/seccomp-profile.json
```

Analyze...
````

**Recommended pattern (2 turns: gather + analyze):**

Replace all of Phase 2 with a single evidence-gathering turn followed by a single comprehensive analysis request:

````markdown
## Phase 2: Codebase Security Analysis

Run all evidence-gathering commands in one bash block:

```bash
echo "=== NETWORK SECURITY ===" && cat src/host-iptables.ts && echo "---" && cat containers/agent/setup-iptables.sh && echo "---" && cat src/squid-config.ts
echo "=== CONTAINER SECURITY ===" && grep -rn "cap_drop\|capabilities\|NET_ADMIN\|NET_RAW" src/ containers/ && cat containers/agent/seccomp-profile.json
echo "=== DOMAIN PATTERNS ===" && cat src/domain-patterns.ts
echo "=== INJECTION RISKS ===" && grep -rn "exec\|spawn\|shell\|command" src/ --include="*.ts" -l
echo "=== DOCKER WRAPPER ===" && cat containers/agent/docker-wrapper.sh
echo "=== DEPENDENCIES ===" && cat package.json && npm audit --json 2>/dev/null | head -100
```

Then analyze all findings holistically against the STRIDE threat model and generate the full report in one consolidated output.
````

This collapses 15 evidence-gathering turns into 2. Target: **15 total turns** (down from 33).

---

### 2. Move Phase 1 (Escape Test Context) to a Pre-Step

**Estimated savings:** ~237K tokens/run (~9%, ~$0.71/run)

Phase 1 uses the `agentic-workflows` tool to check recent firewall-escape-test runs — this takes 3–4 turns. Instead, fetch the logs in a GitHub Actions pre-step and inject them into the prompt.

Add a `steps:` block before the agent runs:

```yaml
steps:
  - name: Fetch latest escape test run
    id: escape-test
    run: |
      # Use gh CLI (authenticated in the workflow) to get the latest run
      RUN_ID=$(gh run list --workflow "firewall-escape-test.lock.yml" \
        --status success --limit 1 --json databaseId --jq '.[0].databaseId')
      echo "run_id=$RUN_ID" >> $GITHUB_OUTPUT
      gh run view "$RUN_ID" --log 2>/dev/null | tail -200 > /tmp/escape-test-summary.txt || echo "No recent run" > /tmp/escape-test-summary.txt
      echo "summary<<EOF" >> $GITHUB_OUTPUT
      cat /tmp/escape-test-summary.txt >> $GITHUB_OUTPUT
      echo "EOF" >> $GITHUB_OUTPUT
```

Then update the Phase 1 prompt section to use the injected context:

```markdown
## Phase 1: Previous Firewall Escape Test Results

The most recent firewall escape test results are provided below. Use this as complementary context for your security review — do NOT re-fetch using agentic-workflows.

<escape-test-results>
{{ steps.escape-test.outputs.summary }}
</escape-test-results>
```

Remove the `agentic-workflows:` tool entirely once Phase 1 is replaced.


### 3. Restrict GitHub Toolsets from `[default, actions, code_security]` to `[repos, code_security]`

**Estimated savings:** ~300K tokens/run (~12%, ~$0.89/run)

The `default` toolset loads ~22 tools (PR management, issue CRUD, discussion tools, etc.) and `actions` loads ~8–10 more. For a read-only security review that outputs via safe-outputs, only a minimal set of GitHub MCP tools is actually needed.

```yaml
# Before:
tools:
  github:
    toolsets: [default, actions, code_security]

# After:
tools:
  github:
    toolsets: [repos, code_security]
```

If Phase 1 is moved to a pre-step (Recommendation 2), `agentic-workflows` can also be removed, eliminating the need for the `actions` toolset entirely.

Approximate tool schema savings: ~15 tools × 600 tokens × 33 turns = ~297K tokens ($0.89)
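The schema-overhead arithmetic can be reproduced directly:

```shell
# ~15 removed tool schemas x ~600 tokens each, re-sent on all 33 turns
echo $((15 * 600 * 33))   # tokens saved per run
```

(If Recommendation 1 also lands and turns drop to ~15, the per-run saving from this change shrinks proportionally.)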


### 4. Remove the web-fetch Tool

**Estimated savings:** ~20K tokens/run (~0.4%, ~$0.06/run)

`web-fetch` is listed in `tools:` but is never referenced in the prompt body. A codebase security review reads local files via bash — no URL fetching is needed.

```yaml
# Before:
tools:
  web-fetch:

# After:
# (remove web-fetch entirely)
```

### 5. Consolidate Phases 3–5 into One Synthesis Phase

**Estimated savings:** ~240K tokens/run (~9%, ~$0.72/run)

Phases 3 (threat model), 4 (attack surface), and 5 (best practices comparison) are synthesis steps that don't require additional tool calls. They're currently written as separate phases, prompting 3 separate synthesis turns where one would suffice.

Replace with a single synthesis prompt:

```markdown
## Phase 3: Security Analysis Synthesis

Based on the evidence collected above, produce a unified security analysis covering:

1. **STRIDE Threat Model** — for each category (Spoofing/Tampering/Repudiation/Information Disclosure/DoS/Elevation of Privilege), identify threats with evidence citations and a likelihood/impact rating
2. **Attack Surface Map** — enumerate each attack surface (network, container, domain parsing, input validation, Docker wrapper) with current protections and weaknesses
3. **CIS/NIST Comparison** — note any gaps vs the Docker CIS Benchmark or NIST network filtering guidelines

Produce the full discussion output in one response covering all three analyses.
```

## Cache Analysis (Copilot API)

Note: No artifact download was possible from this environment. Estimates are derived from the aggregate token report.

| Metric | Value |
|---|---|
| Total input tokens | 2,620K |
| Cache read tokens | 2,499K |
| Cache write tokens | 0 |
| Total context processed | 5,119K |
| Cache hit rate | 48.8% (cache_read / total context) |
| Avg new input/turn | ~79K |
| Avg cached/turn | ~75.7K |
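The hit rate follows directly from the figures above (cache reads over total context processed, in K tokens):

```shell
# cache_read / (input + cache_read) = 2,499K / 5,119K
awk 'BEGIN { printf "%.1f%%\n", 2499 / 5119 * 100 }'
```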

Cache mechanism: Copilot API uses implicit session context caching, not explicit Anthropic cache breakpoints. The first turn is cold (~40K input, 0 cache). Cache hits grow progressively as the session context (system prompt + tool schemas) warms up. By later turns, ~46K tokens/turn are served from cache.

Cache write amortization: N/A — no explicit cache writes. The implicit session caching provides 48.8% cache hit rate organically. If this workflow were migrated to direct Anthropic API with explicit cache_control breakpoints on the system prompt, the cache hit rate could reach 85–90%+.

Recommendation: If the workflow transitions to `provider: anthropic` (direct API), add `cache_control: ephemeral` on the system prompt to maximize cache reuse. At 40K system prompt tokens, the cache write pays for itself by the second turn (1.25× write cost vs 0.1× read cost, so the 0.25× write premium is recovered on the first cache read).
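A sketch of that breakeven, assuming Anthropic's published multipliers (cache write ≈ 1.25× the base input rate, cache read ≈ 0.1×):

```shell
# Cumulative cost of the 40K-token system prompt, in base-token units:
#   cached:   one 1.25x write on turn 1, then 0.1x reads on later turns
#   uncached: full 1.0x resend every turn
for N in 1 2 3; do
  awk -v n="$N" 'BEGIN {
    s = 40000
    cached   = 1.25 * s + 0.1 * s * (n - 1)
    uncached = 1.0 * s * n
    printf "turns=%d cached=%d uncached=%d\n", n, cached, uncached
  }'
done
```

Caching costs more on the first turn (50K vs 40K units) but is already cheaper by turn 2 (54K vs 80K), consistent with the breakeven stated above.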


## Expected Impact

| Metric | Current | Projected | Savings |
|---|---|---|---|
| Total tokens/run | 5,140K | ~2,100K | −59% |
| Input tokens/run | 2,620K | ~1,100K | −58% |
| Cache reads/run | 2,499K | ~1,000K | −60% |
| Cost/run (list rates) | $8.91 | ~$2.90 | −67% |
| LLM turns | 33 | ~15 | −55% |
| Session time | ~6 min | ~3 min (est.) | −50% |
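The savings column can be reproduced from the Current and Projected figures (rounded to the nearest percent; the projected values are themselves estimates):

```shell
# percent saved = (current - projected) / current * 100
awk 'BEGIN {
  printf "tokens: -%.0f%%\n", (5140 - 2100) / 5140 * 100
  printf "input:  -%.0f%%\n", (2620 - 1100) / 2620 * 100
  printf "cost:   -%.0f%%\n", (8.91 - 2.90) / 8.91 * 100
  printf "turns:  -%.0f%%\n", (33 - 15) / 33.0 * 100
}'
```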

These projections assume implementing Recommendations 1–5 together.


## Implementation Checklist

- [ ] Batch Phase 2 bash commands — merge 7 separate bash blocks into 1 comprehensive evidence-gathering block (Rec. 1)
- [ ] Consolidate Phases 3–5 into one synthesis phase (Rec. 5)
- [ ] Add a `steps:` pre-step to fetch escape test logs and inject them via a template variable (Rec. 2)
- [ ] Remove the `agentic-workflows:` tool from the `tools:` section after the pre-step is added (Rec. 2)
- [ ] Change GitHub toolsets from `[default, actions, code_security]` to `[repos, code_security]` (Rec. 3)
- [ ] Remove the `web-fetch:` tool from the `tools:` section (Rec. 4)
- [ ] Recompile: `gh aw compile .github/workflows/security-review.md`
- [ ] Post-process: `npx tsx scripts/ci/postprocess-smoke-workflows.ts` (if the lock file needs post-processing)
- [ ] Trigger a manual run via `workflow_dispatch` and compare token usage vs the $8.91 baseline
- [ ] Update the token usage baseline in the next Claude Token Usage Report

Generated by Daily Claude Token Optimization Advisor · ● 1.2M
