
⚡ Copilot Token Optimization 2026-04-05 — Pelis Agent Factory Advisor #1696

@github-actions

Description


Target Workflow: pelis-agent-factory-advisor

Source report: #1676
Estimated cost per run: $2.02
Total tokens per run: ~1,399K (759K input · 13K output · 627K cache_read)
Cache hit rate: 45.2% (cold on turn 1; warms progressively within the run)
LLM turns: 11 requests
Model: claude-sonnet-4.6 via Copilot endpoint
Session duration: ~4 minutes

Current Configuration

| Setting | Value |
|---|---|
| Tools loaded | 4 — agentic-workflows, bash (wildcard `"*"`), web-fetch, cache-memory |
| Network allowed | github.github.io only |
| Pre-agent steps | Yes — fetches 4 doc pages from github.github.io/gh-aw |
| Prompt size | 8,576 bytes (~1,165 words) |
| Input growth | 39,950 tokens (turn 1) → 83,324 tokens (turn 11) |
| Turns with heavy output | Turns 8–10 (1,794 / 2,227 / 4,959 output tokens; 33–88 s latency) |

Recommendations

1. Fix or remove the blocked githubnext/agentics fetch (Step 1.2)

Estimated savings: ~100–150K tokens/run (~8–11%)

Step 1.2 instructs the agent to call web-fetch on https://github.com/githubnext/agentics — but the network allowlist only permits github.github.io. Every attempt is blocked by the AWF firewall, causing the agent to spend 1–2 extra turns handling the 403/connection error and falling back.

Option A (recommended) — pre-fetch the content in a steps: entry:

steps:
  - name: Fetch Pelis Agent Factory Docs
    id: fetch-docs
    run: |
      set -o pipefail
      BASE="(github.github.io/redacted)"
      OUTFILE="${GITHUB_WORKSPACE}/.pelis-agent-factory-docs.txt"
      : > "$OUTFILE"
      for PATH_SUFFIX in \
        "/blog/2026-01-12-welcome-to-pelis-agent-factory/" \
        "/introduction/overview/" \
        "/guides/workflow-patterns/" \
        "/guides/best-practices/"; do
        echo "### ${BASE}${PATH_SUFFIX}" >> "$OUTFILE"
        curl -sf "${BASE}${PATH_SUFFIX}" \
          | python3 -c "import sys,html,re;t=sys.stdin.read();print(html.unescape(re.sub('<[^>]+>','',t))[:8000])" \
          >> "$OUTFILE" 2>/dev/null \
          || echo "(not found)" >> "$OUTFILE"
        echo "" >> "$OUTFILE"
      done

  - name: Fetch Agentics Patterns              # NEW
    id: fetch-agentics
    run: |
      curl -sf "https://raw.githubusercontent.com/githubnext/agentics/main/README.md" \
        | head -c 8000 > "${GITHUB_WORKSPACE}/.agentics-patterns.txt" \
        || echo "(not available)" > "${GITHUB_WORKSPACE}/.agentics-patterns.txt"

Then add raw.githubusercontent.com to network.allowed and update the prompt to read .agentics-patterns.txt instead of using web-fetch. This also lets you remove the web-fetch tool entirely (see Recommendation 3).
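Assuming gh-aw exposes the allowlist as a `network.allowed` list in the workflow frontmatter (consistent with the "Network allowed" setting shown above — verify against your gh-aw version), the change might look like:

```yaml
network:
  allowed:
    - "github.github.io"
    - "raw.githubusercontent.com"   # NEW: for the pre-fetched agentics README
```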

Option B (simpler) — remove Step 1.2: The agentics repo content is supplementary. Deleting the two web-fetch calls for it from the prompt removes the blocked-fetch retry loop with no loss of core functionality, since the cache-memory already persists discovered patterns.


2. Pre-compute repository structure in steps:

Estimated savings: ~100–130K tokens/run (~8–9%)

Phase 2.2 of the prompt instructs the agent to run a series of fully deterministic shell commands (ls -la, find .github/workflows -name "*.md", ls tests/, ls scripts/). These produce the same output every time and add 1–2 LLM turns of bash tool-call overhead (request + response context accumulation).

Move these to a pre-agent step and inject via $GITHUB_ENV or a file:

  - name: Collect Repo Structure
    id: repo-structure
    run: |
      {
        echo "=== Root files ==="
        ls -la
        echo ""
        echo "=== Agentic workflows ==="
        find .github/workflows -name "*.md" -type f | sort
        echo ""
        echo "=== Tests ==="
        ls -la tests/ 2>/dev/null || echo "(no tests/)"
        echo ""
        echo "=== Scripts ==="
        ls -la scripts/ 2>/dev/null || echo "(no scripts/)"
      } > "${GITHUB_WORKSPACE}/.repo-structure.txt"

Update the prompt to read .repo-structure.txt instead of running bash commands, and remove or simplify Phase 2.2's bash instructions. This shifts ~2 agent tool-call round-trips into a pre-step, avoiding ~60–70K tokens of input context growth for those turns and their downstream history accumulation.


3. Remove the web-fetch tool

Estimated savings: ~6,000–8,000 tokens/run (~0.5%)

Once the Step 1.1 and 1.2 documentation is pre-fetched (Recommendation 1), the web-fetch tool is no longer needed. Removing it eliminates the tool schema from every turn's context (~600–700 tokens × 11 turns).

In the workflow frontmatter, delete:

# Remove this line:
  web-fetch:

4. Reduce prompt verbosity (~40% reduction)

Estimated savings: ~30–40K tokens/run (~3%)

The prompt is 8,576 bytes of highly detailed multi-phase instructions. Turn 1 costs 39,950 input tokens — of which a meaningful chunk is the verbose prompt itself. Much of it can be condensed:

Phase 1 cache check (currently ~600 bytes) can become:

## Phase 1: Learn Pelis Agent Factory Patterns

Check cache-memory for `pelis_docs_hash`. Hash `.pelis-agent-factory-docs.txt`
and `.agentics-patterns.txt`. If unchanged, skip to Phase 2 using cached knowledge.
Otherwise read both files and update the hash in cache-memory.
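For instance, the single fingerprint the condensed prompt asks for could be computed as follows (a sketch: file names come from the prompt above, and `sha256sum` is assumed to be available in the runner image):

```shell
# Fingerprint both pre-fetched doc files to compare against the
# `pelis_docs_hash` value stored in cache-memory. Missing files are
# treated as empty, so the hash is still well-defined on a cold start.
cat .pelis-agent-factory-docs.txt .agentics-patterns.txt 2>/dev/null \
  | sha256sum | awk '{print $1}'
```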

Phase 2.2 repository analysis (currently ~400 bytes of redundant bash listings) can reference the pre-fetched file:

## Phase 2: Analyze Repository

Pre-computed structure is in `.repo-structure.txt`. Agentic workflow definitions
are in `.github/workflows/*.md`. Review them to understand current automation coverage.

Phase 3 opportunity categories (currently ~600 bytes of bullet lists) can be trimmed to the top 5 most relevant categories for this repo (security automation, test coverage, release, documentation, monitoring).

Estimated prompt reduction: ~3,500–4,000 bytes (~875–1,000 tokens); carried through all 11 turns of context plus the history each turn re-accumulates, that works out to ≈ 30–40K tokens net.


5. Restrict bash to specific commands

Estimated savings: ~2,000–4,000 tokens/run (tool schema reduction)

bash: ["*"] loads schema entries for every bash pattern. Restrict to what the workflow actually uses:

tools:
  agentic-workflows:
  bash:
    - "cat"
    - "find"
    - "ls"
    - "grep"
  cache-memory: true

This is also a defence-in-depth improvement (reduces agent's command surface).


Expected Impact

| Metric | Current | Projected | Savings |
|---|---|---|---|
| Total tokens/run | 1,399K | ~1,100K | ~-21% |
| Cost/run | $2.02 | ~$1.60 | ~-$0.42 |
| LLM turns | 11 | ~8–9 | -2 to -3 |
| Turn 1 input tokens | 39,950 | ~28,000 | ~-30% |
| Cache hit rate | 45.2% | ~50–55% | +5–10pp |

Savings are conservative estimates. The biggest wins come from Recommendations 1 and 2 (removing wasted turns from blocked fetches and pre-computing deterministic work).
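As a sanity check, the per-recommendation estimates do sum to roughly the projected totals (all ranges copied from this report):

```python
# Per-recommendation savings ranges (K tokens) quoted above.
rec_savings = {
    "1 blocked fetch":    (100, 150),
    "2 repo structure":   (100, 130),
    "3 web-fetch tool":   (6, 8),
    "4 prompt verbosity": (30, 40),
    "5 bash allowlist":   (2, 4),
}
current_k = 1399  # current total tokens/run, in K

low = sum(lo for lo, hi in rec_savings.values())
high = sum(hi for lo, hi in rec_savings.values())
print(f"savings: {low}-{high}K")                               # 238-332K
print(f"projected: {current_k - high}-{current_k - low}K")     # 1067-1161K, i.e. ~1,100K
print(f"reduction: {low / current_k:.0%}-{high / current_k:.0%}")  # 17%-24%, midpoint ~21%
```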

Implementation Checklist

  • Rec 1: Add fetch-agentics pre-step; add raw.githubusercontent.com to network.allowed; remove Step 1.2 web-fetch instructions from prompt
  • Rec 2: Add collect-repo-structure pre-step; replace Phase 2.2 bash commands with file-read instruction
  • Rec 3: Remove web-fetch: from tools: in frontmatter (after Recs 1 & 2 land)
  • Rec 4: Condense Phase 1 cache-check instructions and Phase 2.2 bash listings in prompt body
  • Rec 5: Replace bash: ["*"] with specific command allowlist
  • Recompile: gh-aw compile .github/workflows/pelis-agent-factory-advisor.md
  • Post-process: npx tsx scripts/ci/postprocess-smoke-workflows.ts
  • Verify CI passes on PR
  • Compare token-usage.jsonl on new run vs baseline (target: ≤ 1,100K tokens, ≤ $1.60)
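A hypothetical sketch for that last comparison step: sum the token counts in a `token-usage.jsonl` run log and check the result against the target. The record field names (`input_tokens`, `output_tokens`) are assumptions about the log schema, not confirmed by this report.

```python
import json

def total_tokens(path: str) -> int:
    """Sum input + output tokens across all records in a JSONL usage log."""
    total = 0
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            rec = json.loads(line)
            # Field names are hypothetical; adjust to the actual log schema.
            total += rec.get("input_tokens", 0) + rec.get("output_tokens", 0)
    return total

# Usage (paths illustrative):
# baseline = total_tokens("baseline/token-usage.jsonl")
# new = total_tokens("new/token-usage.jsonl")
# assert new <= 1_100_000, f"run used {new} tokens, above the 1,100K target"
```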

Generated by Daily Copilot Token Optimization Advisor
