⚡ Copilot Token Optimization 2026-04-06 — Pelis Agent Factory Advisor #1716

Target Workflow: pelis-agent-factory-advisor

Source report: #1676 (2026-04-05 — most recent report with token data)
Estimated cost per run: $2.02
Total tokens per run: ~1,399K
Cache hit rate: 45.2% (0% on turn 1, ~48% on turns 6–11)
LLM turns: 11
Model: claude-sonnet-4.6 (via Copilot endpoint)

Current Configuration

| Setting | Value |
|---|---|
| Tools loaded | 36 total |
| agenticworkflows tools | 8 (add, audit, compile, fix, logs, mcp-inspect, status, update) |
| github tools (auto-included) | 25 (actions_get, actions_list, get_commit, get_file_contents, get_job_logs, get_label, get_latest_release, get_me, get_release_by_tag, get_tag, get_team_members, get_teams, issue_read, list_branches, list_commits, list_issue_types, list_issues, list_pull_requests, list_releases, list_tags, pull_request_read, search_code, search_issues, search_pull_requests, search_repositories) |
| github toolsets loaded | context, repos, issues, pull_requests |
| safeoutputs tools | 4 (create_discussion, missing_tool, noop, missing_data) |
| Tools actually used by prompt | agenticworkflows:status, agenticworkflows:audit, bash:cat, cache-memory, safeoutputs:create_discussion |
| Pre-agent steps | ✅ Yes, 4 steps (fetch Pelis docs, fetch agentics patterns, compute hash, collect repo structure) |
| Prompt size | 8,897 bytes (~2,220 tokens of user instructions) |
| Network groups | github.github.io only |

Context growth pattern (from run #23993514169):

| Turn | Input Tokens | Cache Read | Delta |
|---|---|---|---|
| 1 | 39,950 | 0 | baseline |
| 2 | 59,297 | 33,976 | +19,347 (large tool result: docs read) |
| 3 | 61,343 | 46,636 | +2,046 |
| 4 | 69,222 | 53,989 | +7,879 (another large read) |
| 5–11 | 71K→83K | 61K→78K | +2–5K/turn |

Turn 2's +19K jump indicates sequential file reads (one per turn) driving early context bloat.
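
The deltas in the table can be re-derived from the input-token column. The following is a minimal sketch using only the figures for turns 1 to 4 (turns 5 to 11 are reported as ranges and are left out). The `cache_read_share` helper is an illustrative ratio of cache-read to input tokens; it is an assumption of this sketch and is not necessarily how the report computes its 45.2% cache hit rate.

```python
# Per-turn figures copied from the context-growth table (turns 1-4 only).
# Each tuple is (turn, input_tokens, cache_read_tokens).
TURNS = [
    (1, 39_950, 0),
    (2, 59_297, 33_976),
    (3, 61_343, 46_636),
    (4, 69_222, 53_989),
]


def input_deltas(turns):
    """Growth in input tokens between consecutive turns."""
    return [cur[1] - prev[1] for prev, cur in zip(turns, turns[1:])]


def cache_read_share(turns):
    """Cache-read tokens as a fraction of input tokens, per turn.

    Illustrative ratio only; the report may define cache hit rate differently.
    """
    return [cache / inp for _, inp, cache in turns]


for (turn, _, _), delta in zip(TURNS[1:], input_deltas(TURNS)):
    print(f"turn {turn}: +{delta:,} input tokens")
```

The computed deltas match the Delta column, which confirms that turn 2 accounts for the largest single jump among the itemized turns.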

Recommendations

1. Restrict GitHub MCP toolset to context only

Estimated savings: ~77K tokens/run (~5.5%), ~$0.11/run

The workflow's prompt uses zero GitHub MCP tools. All analysis goes through agentic-workflows and pre-computed files. Yet the GitHub MCP server auto-loads 25 tools (repos, issues, pull_requests toolsets) because those permissions are declared.

Add an explicit github: block to the tools: section to cap the toolset:

# .github/workflows/pelis-agent-factory-advisor.md — tools section
tools:
  agentic-workflows:
  bash:
    - "cat"
    - "find"
    - "ls"
    - "grep"
  cache-memory: true
  github:
    toolsets: [context]   # ← add this line

This changes GITHUB_TOOLSETS in the compiled lock file from context,repos,issues,pull_requests (25 tools) to just context (~2 tools), removing ~23 unnecessary tool schemas.

Savings breakdown: ~23 tools × ~350 tokens/schema = ~8,050 tokens/turn; × 11 turns = ~88,550 tokens saved, or ~$0.12/run at the current I/O ratio. This raw product is somewhat higher than the ~77K tokens (~$0.11/run) headline estimate above.
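
The breakdown arithmetic can be reproduced in a few lines. This is a sketch under the report's own approximations (about 23 schemas removed, about 350 tokens per schema, 11 turns); the function name `schema_savings` is hypothetical and exists only for illustration.

```python
def schema_savings(tools_removed, tokens_per_schema, turns):
    """Return (tokens saved per turn, tokens saved per run).

    Assumes every removed tool schema is re-sent on every turn, which is
    the simplification the breakdown itself makes.
    """
    per_turn = tools_removed * tokens_per_schema
    return per_turn, per_turn * turns


per_turn, per_run = schema_savings(tools_removed=23, tokens_per_schema=350, turns=11)
print(f"{per_turn:,} tokens/turn, {per_run:,} tokens/run")
```

If recommendation 2 also lands and the run drops to 8 or 9 turns, the same function gives a smaller per-run figure, since fewer turns re-send the schemas.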


2. Add parallel tool-call instruction to reduce sequential file reads

Estimated savings: ~60K tokens/run (~4.3%), ~$0.09/run

Turn 2 adds +19K tokens — the biggest single-turn jump — from reading pre-fetched doc files one at a time. The agent makes sequential cat calls rather than batching them. Adding an explicit batching instruction collapses turns 2–4 (file reads) into turn 1, eliminating ~2–3 early API calls.

Add the following to the top of Phase 1 in the prompt body:

> **Efficiency note:** Read all required files in a **single parallel batch**:
> call `bash:cat` for `.content-hash.txt`, `.pelis-agent-factory-docs.txt`,
> `.agentics-patterns.txt`, and `.repo-structure.txt` simultaneously in your
> first turn. Do not read them one at a time.

And add a similar note at the start of the prompt:

> **Parallel tool calls:** Always batch independent operations into a single
> turn. Read multiple files simultaneously. Call `agentic-workflows status`
> and `agentic-workflows audit` in the same turn.

Savings breakdown: Each eliminated early turn saves the system prompt being re-sent (~40K tokens). Eliminating 2–3 turns ≈ 80–120K input tokens. Net estimate after accounting for batch overhead: ~60K tokens saved.
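
The gap between the gross range and the net estimate can be made explicit. The sketch below uses only the figures stated in the breakdown (about 40K tokens re-sent per turn, 2 to 3 turns eliminated, about 60K net); the "implied overhead" is simply gross minus net and is an inference of this sketch, not a number the report states.

```python
SYSTEM_PROMPT_TOKENS = 40_000  # approximate tokens re-sent per turn, from the text
NET_ESTIMATE = 60_000          # the report's net savings figure


def gross_savings(turns_removed, prompt_tokens=SYSTEM_PROMPT_TOKENS):
    """Input tokens no longer re-sent when early turns are eliminated."""
    return turns_removed * prompt_tokens


def implied_overhead(turns_removed, net=NET_ESTIMATE):
    """Batch overhead implied by the net estimate (gross minus net)."""
    return gross_savings(turns_removed) - net


for n in (2, 3):
    print(f"{n} turns removed: gross {gross_savings(n):,}, "
          f"implied overhead {implied_overhead(n):,}")
```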


3. Condense the Phase 4 output template

Estimated savings: ~11K tokens/run (~0.8%), ~$0.02/run

Phase 4 spans 103 lines and contains a detailed discussion output template with repeated placeholder blocks ("(List P0 items)", "(List P1 items)", etc.). Because the template is part of the prompt, it is re-sent in the context on all 11 turns.

Replace the current Phase 4 section with a compact version (~50 lines):

## Output Format

Create a discussion with these sections (use `create_discussion`):
1. **📊 Executive Summary** — 2–3 sentences on maturity and top opportunities
2. **🎓 Patterns Learned** — Key patterns from Pelis docs vs current repo
3. **📋 Workflow Inventory** — Table: `| Workflow | Purpose | Trigger | Assessment |`
4. **🚀 Recommendations** — Grouped by priority (P0–P3), each with: What / Why / How / Effort / Example
5. **📈 Maturity Assessment** — Current/Target level (1–5), gap analysis
6. **🔄 Best Practice Comparison** — What it does well, what to improve
7. **📝 Notes** — Update cache-memory with observations

Priority levels: P0=High impact+Low effort, P1=High impact+Medium effort, P2=Medium, P3=Nice-to-have.

Savings breakdown: Reducing the prompt from ~2,200 tokens to ~900 tokens saves ~1,300 tokens/turn; × 11 turns = ~14,300 tokens saved. This raw product is somewhat higher than the ~11K headline estimate above.


Expected Impact

| Metric | Current | Projected | Savings |
|---|---|---|---|
| Total tokens/run | 1,399K | ~1,248K | ~151K (−11%) |
| Cost/run | $2.02 | ~$1.80 | −$0.22 (−11%) |
| LLM turns | 11 | 8–9 | −2 to −3 |
| GitHub tools loaded | 25 | ~2 | −23 tools |
| Turn 1 input tokens | 39,950 | ~31,900 | −8,050 (−20%) |

Turn 1 sees the largest relative savings: the cold-start API call no longer carries the repos/issues/PR tool schemas.
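
The projected column follows from the current values and the stated savings. As a quick consistency check, here is a sketch that recomputes the projections and percentages from the table's own numbers; nothing here is measured, and the constant names are illustrative.

```python
# Figures copied from the Expected Impact table.
CURRENT_TOKENS = 1_399_000
SAVED_TOKENS = 151_000
CURRENT_COST = 2.02
SAVED_COST = 0.22
TURN1_TOKENS = 39_950
TURN1_SAVED = 8_050


def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to an integer."""
    return round(100 * part / whole)


projected_tokens = CURRENT_TOKENS - SAVED_TOKENS
projected_cost = round(CURRENT_COST - SAVED_COST, 2)
projected_turn1 = TURN1_TOKENS - TURN1_SAVED

print(f"tokens/run: {projected_tokens:,} (-{percent(SAVED_TOKENS, CURRENT_TOKENS)}%)")
print(f"cost/run:   ${projected_cost:.2f} (-{percent(SAVED_COST, CURRENT_COST)}%)")
print(f"turn 1:     {projected_turn1:,} (-{percent(TURN1_SAVED, TURN1_TOKENS)}%)")
```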


Implementation Checklist

  • Add github: toolsets: [context] to the tools: section in .github/workflows/pelis-agent-factory-advisor.md
  • Add parallel-batch instruction block at the top of Phase 1 in the prompt body
  • Add agentic-workflows status + audit parallel-call instruction
  • Condense the Phase 4 output template from ~103 lines to ~50 lines
  • Recompile: gh aw compile .github/workflows/pelis-agent-factory-advisor.md
  • Post-process: npx tsx scripts/ci/postprocess-smoke-workflows.ts
  • Trigger a manual run via workflow_dispatch and compare token-usage.jsonl
  • Verify cost in the next token usage report (should drop from ~$2.02 toward ~$1.80)
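
For the comparison step, one possible approach is to sum the numeric fields of each `token-usage.jsonl` file and diff the totals. The sketch below assumes the file holds one JSON object per line with numeric counters; the field names `input_tokens` and `cache_read_tokens` in the sample data are hypothetical, since the report does not show the file's schema.

```python
import json


def summarize(jsonl_text):
    """Sum every numeric field across the records of a JSONL document."""
    totals = {}
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        for key, value in record.items():
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                totals[key] = totals.get(key, 0) + value
    return totals


def compare(before, after):
    """Per-field change (after minus before) over the union of fields."""
    keys = sorted(set(before) | set(after))
    return {key: after.get(key, 0) - before.get(key, 0) for key in keys}


# Hypothetical sample records; real field names may differ.
BEFORE = '{"input_tokens": 100, "cache_read_tokens": 0}\n{"input_tokens": 150, "cache_read_tokens": 80}\n'
AFTER = '{"input_tokens": 90, "cache_read_tokens": 0}\n'

print(compare(summarize(BEFORE), summarize(AFTER)))
```

Because the helper sums every numeric field, identifiers such as a turn index would be summed too; filter to the token counters once the actual schema is known.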

Notes

  • Cache hit rate (45.2%) cannot be improved significantly with the Copilot provider — cache_write_tokens is 0 (not reported separately by this endpoint). The cold-start on turn 1 is inherent to the session-based architecture. No action needed.
  • Context growth (40K→83K over 11 turns) is driven by tool results accumulating in history. Recommendations 1 and 2 address the largest sources: smaller tool schemas and fewer turns.
  • The agenticworkflows server loads 8 tools but only status and audit are referenced in the prompt. There is no per-tool allowlist syntax for agentic-workflows: (unlike bash:), so those 6 unused tools cannot be individually removed without upstream changes to gh-aw.

Generated by Daily Copilot Token Optimization Advisor · ● 1.9M ·
