- **Target Workflow:** `pelis-agent-factory-advisor`
- **Source report:** #1676 (2026-04-05 — most recent report with token data)
- **Estimated cost per run:** $2.02
- **Total tokens per run:** ~1,399K
- **Cache hit rate:** 45.2% (0% on turn 1, ~48% on turns 6–11)
- **LLM turns:** 11
- **Model:** claude-sonnet-4.6 (via Copilot endpoint)
## Current Configuration

| Setting | Value |
| --- | --- |
| Tools loaded | 36 total |
| `agenticworkflows` tools | 8 (add, audit, compile, fix, logs, mcp-inspect, status, update) |
| `github` tools (auto-included) | 25 (actions_get, actions_list, get_commit, get_file_contents, get_job_logs, get_label, get_latest_release, get_me, get_release_by_tag, get_tag, get_team_members, get_teams, issue_read, list_branches, list_commits, list_issue_types, list_issues, list_pull_requests, list_releases, list_tags, pull_request_read, search_code, search_issues, search_pull_requests, search_repositories) |
| `github` toolsets loaded | context, repos, issues, pull_requests |
| `safeoutputs` tools | 4 (create_discussion, missing_tool, noop, missing_data) |
| Tools actually used by prompt | `agenticworkflows:status`, `agenticworkflows:audit`, `bash:cat`, `cache-memory`, `safeoutputs:create_discussion` |
| Pre-agent steps | ✅ Yes — 4 steps (fetch Pelis docs, fetch agentics patterns, compute hash, collect repo structure) |
| Prompt size | 8,897 bytes (~2,220 tokens of user instructions) |
| Network groups | `github.github.io` only |
Context growth pattern (from run #23993514169):

| Turn | Input Tokens | Cache Read | Delta |
| --- | --- | --- | --- |
| 1 | 39,950 | 0 | baseline |
| 2 | 59,297 | 33,976 | +19,347 (large tool result — docs read) |
| 3 | 61,343 | 46,636 | +2,046 |
| 4 | 69,222 | 53,989 | +7,879 (another large read) |
| 5–11 | 71K→83K | 61K→78K | +2–5K/turn |
Turn 2's +19K jump indicates sequential file reads (one per turn) driving early context bloat.
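The per-turn deltas above can be recomputed directly from the run's token log. A minimal sketch, assuming a JSONL format with `turn`, `input_tokens`, and `cache_read_tokens` fields (the actual gh-aw log schema may differ):

```python
import json

# Hypothetical token-usage.jsonl records (field names assumed; values are
# the figures from the table above).
log_lines = [
    '{"turn": 1, "input_tokens": 39950, "cache_read_tokens": 0}',
    '{"turn": 2, "input_tokens": 59297, "cache_read_tokens": 33976}',
    '{"turn": 3, "input_tokens": 61343, "cache_read_tokens": 46636}',
    '{"turn": 4, "input_tokens": 69222, "cache_read_tokens": 53989}',
]

turns = [json.loads(line) for line in log_lines]
for prev, cur in zip(turns, turns[1:]):
    delta = cur["input_tokens"] - prev["input_tokens"]
    flag = "  <-- large tool result" if delta > 10_000 else ""
    print(f"turn {cur['turn']}: +{delta:,}{flag}")
```

Only turn 2 crosses the 10K threshold, which is the signature of a single oversized tool result rather than steady history growth.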
## Recommendations

### 1. Restrict GitHub MCP toolset to `context` only

**Estimated savings:** ~77K tokens/run (~5.5%), ~$0.11/run
The workflow's prompt uses zero GitHub MCP tools: all analysis goes through `agentic-workflows` and pre-computed files. Yet the GitHub MCP server auto-loads 25 tools (the repos, issues, and pull_requests toolsets) because those permissions are declared.

Add an explicit `github:` block to the `tools:` section to cap the toolset:
```yaml
# .github/workflows/pelis-agent-factory-advisor.md — tools section
tools:
  agentic-workflows:
  bash:
    - "cat"
    - "find"
    - "ls"
    - "grep"
  cache-memory: true
  github:
    toolsets: [context]  # ← add this line
```
This changes `GITHUB_TOOLSETS` in the compiled lock file from `context,repos,issues,pull_requests` (25 tools) to just `context` (~2 tools), removing ~23 unnecessary tool schemas.

**Savings breakdown:** ~23 tools × ~350 tokens/schema = ~8,050 tokens/turn × 11 turns = ~88,550 tokens saved. At the current I/O ratio, this maps to ~$0.12/run.
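As a sanity check on that arithmetic (every constant here is this report's estimate, not a measured value):

```python
# Back-of-envelope for Recommendation 1.
removed_tools = 25 - 2      # context-only keeps ~2 of the 25 loaded tools
tokens_per_schema = 350     # rough average size of one tool schema
turns = 11

per_turn = removed_tools * tokens_per_schema
per_run = per_turn * turns
print(f"~{per_turn:,} tokens/turn, ~{per_run:,} tokens/run")
# ~8,050 tokens/turn, ~88,550 tokens/run
```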
### 2. Add a parallel tool-call instruction to reduce sequential file reads

**Estimated savings:** ~60K tokens/run (~4.3%), ~$0.09/run

Turn 2 adds +19K tokens — the biggest single-turn jump — from reading pre-fetched doc files one at a time. The agent makes sequential `cat` calls rather than batching them. An explicit batching instruction collapses turns 2–4 (file reads) into turn 1, eliminating ~2–3 early API calls.
Add the following to the top of Phase 1 in the prompt body:
> **Efficiency note:** Read all required files in a **single parallel batch** —
> call `bash:cat` for `.content-hash.txt`, `.pelis-agent-factory-docs.txt`,
> `.agentics-patterns.txt`, and `.repo-structure.txt` simultaneously in your
> first turn. Do not read them one at a time.
And add a similar note at the start of the prompt:
> **Parallel tool calls:** Always batch independent operations into a single
> turn. Read multiple files simultaneously. Call `agentic-workflows status`
> and `agentic-workflows audit` in the same turn.
**Savings breakdown:** Each eliminated early turn avoids re-sending the full prompt context (~40K tokens). Eliminating 2–3 turns ≈ 80–120K input tokens; net estimate after accounting for batch overhead: ~60K tokens saved.
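The gross range behind that net figure (again using the report's own estimates rather than measurements):

```python
# Recommendation 2: each eliminated early turn avoids re-sending
# roughly 40K tokens of prompt context (this report's estimate).
resend_per_turn = 40_000
gross = [n * resend_per_turn for n in (2, 3)]
net = 60_000  # report's net after batch overhead
print(f"gross: {gross[0]:,}-{gross[1]:,} tokens; net estimate: ~{net:,}")
# gross: 80,000-120,000 tokens; net estimate: ~60,000
```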
### 3. Condense the Phase 4 output template

**Estimated savings:** ~11K tokens/run (~0.8%), ~$0.02/run

Phase 4 spans 103 lines and contains a detailed discussion output template with repeated placeholder blocks (`(List P0 items)`, `(List P1 items)`, etc.). This template is re-sent in the context on all 11 turns.
Replace the current Phase 4 section with a compact version (~50 lines):

```markdown
## Output Format

Create a discussion with these sections (use `create_discussion`):

1. **📊 Executive Summary** — 2–3 sentences on maturity and top opportunities
2. **🎓 Patterns Learned** — Key patterns from Pelis docs vs current repo
3. **📋 Workflow Inventory** — Table: `| Workflow | Purpose | Trigger | Assessment |`
4. **🚀 Recommendations** — Grouped by priority (P0–P3), each with: What / Why / How / Effort / Example
5. **📈 Maturity Assessment** — Current/Target level (1–5), gap analysis
6. **🔄 Best Practice Comparison** — What it does well, what to improve
7. **📝 Notes** — Update cache-memory with observations

Priority levels: P0=High impact+Low effort, P1=High impact+Medium effort, P2=Medium, P3=Nice-to-have.
```
**Savings breakdown:** Reducing the template from ~2,200 prompt tokens to ~900 saves ~1,300 tokens/turn × 11 turns = ~14,300 tokens.
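The same check for this recommendation (constants are the report's token estimates):

```python
# Recommendation 3: per-turn saving from the condensed template,
# accumulated over all 11 turns.
before, after, turns = 2_200, 900, 11
print((before - after) * turns)  # 14300
```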
## Expected Impact

| Metric | Current | Projected | Savings |
| --- | --- | --- | --- |
| Total tokens/run | 1,399K | ~1,248K | ~151K (−11%) |
| Cost/run | $2.02 | ~$1.80 | −$0.22 (−11%) |
| LLM turns | 11 | 8–9 | −2 to −3 |
| GitHub tools loaded | 25 | ~2 | −23 tools |
| Turn 1 input tokens | 39,950 | ~31,900 | −8,050 (−20%) |
Turn 1 savings are largest: no repos/issues/PR tool schemas in the cold-start API call.
## Implementation Checklist

- [ ] Add `github: toolsets: [context]` to the `tools:` section in `.github/workflows/pelis-agent-factory-advisor.md`
- [ ] Add the `agentic-workflows status` + `audit` parallel-call instruction to the prompt
- [ ] Recompile: `gh aw compile .github/workflows/pelis-agent-factory-advisor.md`
- [ ] Run `npx tsx scripts/ci/postprocess-smoke-workflows.ts`
- [ ] Trigger via `workflow_dispatch` and compare `token-usage.jsonl`
## Notes

- Cache hit rate (45.2%) cannot be improved significantly with the Copilot provider — `cache_write_tokens` is 0 (not reported separately by this endpoint). The cold start on turn 1 is inherent to the session-based architecture. No action needed.
- Context growth (40K→83K over 11 turns) is driven by tool results accumulating in history. Recommendations 1 and 2 address the largest sources: smaller tool schemas and fewer turns.
- The `agenticworkflows` server loads 8 tools but only `status` and `audit` are referenced in the prompt. There is no per-tool allowlist syntax for `agentic-workflows:` (unlike `bash:`), so the 6 unused tools cannot be individually removed without upstream changes to gh-aw.
*Generated by Daily Copilot Token Optimization Advisor*