Driver
Many framework tasks that don't need a strong model still route through Claude today: issue triage, conventional-commit type suggestion, AgDR slug generation, branch-name suggestion from issue title, inbox/PR-list summary paragraphs, glossary-entry stubs, language detection on ambiguous filenames. A small local LLM (Llama 3 8B / Mistral / Phi via Ollama) or a purpose-built classifier could plausibly handle these in <1s with zero token cost.
This ticket is a research spike — measurement first, recommendation second, implementation in follow-ups. Same shape as the LSP spike (#178 → PR #184).
Hypothesis
Routing specific bounded tasks (issue classify, commit-type suggest, inbox summary) through a local model instead of Claude reduces token cost ≥10× on those tasks, with quality "good enough" for the framework's purpose, at acceptable latency.
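One way to make the ≥10× claim measurable, offered as an assumption rather than a settled definition: count only Claude tokens (local-model tokens carry no token cost), and compare what a sub-task costs in Claude tokens today with what it still costs once routed, for example to dispatch the call and read back the result.

```math
R = \frac{T^{\mathrm{today}}_{\mathrm{in}} + T^{\mathrm{today}}_{\mathrm{out}}}{T^{\mathrm{routed}}_{\mathrm{in}} + T^{\mathrm{routed}}_{\mathrm{out}}} \ge 10
```

When the routed path spends no Claude tokens at all, R is unbounded, so the absolute saving per call is the more informative number to report alongside it.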
Phases
Phase 1 — measurement
Three representative tasks, three variants each:
- Issue classification — given an issue title + first paragraph, classify as bug / feature / spike / chore. 20 examples from the apexyard portfolio.
- Commit-type suggestion — given a diff, suggest a conventional-commit type (feat / fix / refactor / docs / chore / test). 20 diffs from recent merges.
- Inbox summary — given a list of 10 PR titles, produce a 3-line "what's pending" paragraph. 10 inbox snapshots.
Variants:
- A. Pure tool: regex / template / heuristic. No model.
- B. Local LLM: Ollama running Llama-3-8B-Instruct (or comparable). Same prompts as Variant C.
- C. Claude (today's path): same prompts. Use Claude Haiku for the cheapest comparison; Sonnet for quality reference.
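For Variant A, a minimal sketch of what a "pure tool" commit-type suggester could look like, assuming Python and standard `git diff` output as input. The path rules, the function names, and the choice to return None (no suggestion) for ambiguous diffs are hypothetical illustrations, not an existing apexyard helper; a real baseline would be tuned against the 20 Phase 1 diffs.

```python
import re
from typing import Optional

# Hypothetical rules: if EVERY changed path matches one pattern, suggest that type.
# Checked in order, so a file under tests/ counts as "test" even if it is Markdown.
_ALL_PATHS_RULES = [
    ("test", re.compile(r"(^|/)(tests?|__tests__)/|_test\.\w+$|\.test\.\w+$|(^|/)test_[^/]*\.py$")),
    ("docs", re.compile(r"\.(md|rst|txt)$|(^|/)docs/")),
    ("chore", re.compile(r"(^|/)\.github/|(^|/)(package-lock\.json|yarn\.lock|\.gitignore)$")),
]

# `git diff` file headers look like: diff --git a/path b/path (paths with spaces not handled)
_DIFF_HEADER = re.compile(r"^diff --git a/(\S+) b/\S+", re.MULTILINE)


def changed_paths(diff_text: str) -> list[str]:
    """Extract the changed file paths from `git diff` headers."""
    return _DIFF_HEADER.findall(diff_text)


def suggest_commit_type(diff_text: str) -> Optional[str]:
    """Suggest a conventional-commit type, or None when the heuristic can't tell.

    Only the easy cases get an answer: every changed path matches one rule,
    or the diff adds a new file (treated as "feat"). feat/fix/refactor are
    otherwise hard to separate without reading the code, so we abstain.
    """
    paths = changed_paths(diff_text)
    if not paths:
        return None
    for commit_type, pattern in _ALL_PATHS_RULES:
        if all(pattern.search(p) for p in paths):
            return commit_type
    if re.search(r"^new file mode ", diff_text, re.MULTILINE):
        return "feat"
    return None
```

Abstaining instead of guessing keeps the baseline honest: in Phase 1 scoring an abstention counts as a miss, which shows how much of the task a heuristic covers before a model is needed.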
For each (task × variant), measure:
- Token cost (input + output)
- Wall-clock latency (cold + warm)
- Output quality: score against a hand-graded reference. Some inter-rater spread is acceptable; aim for ≥85% agreement with the dominant (majority) label.
- Setup cost (one-time install + maintenance)
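A minimal scoring and timing harness for one (task × variant) cell might look like the sketch below, assuming Python and several hand-graded labels per example, with the majority taken as the dominant label. The function names are hypothetical, and the "cold" figure is only meaningful if the model was not already loaded in memory before the first call.

```python
import statistics
import time
from collections import Counter
from typing import Callable, Optional, Sequence

PASS_BAR = 0.85  # the ">=85% agreement with the dominant label" target


def dominant_label(grader_labels: Sequence[str]) -> str:
    """Majority label across graders; on a tie, the label listed first wins."""
    return Counter(grader_labels).most_common(1)[0][0]


def agreement(predictions: Sequence[str], graded: Sequence[Sequence[str]]) -> float:
    """Fraction of predictions equal to the dominant hand-graded label."""
    if not predictions or len(predictions) != len(graded):
        raise ValueError("need a non-empty prediction list with one label set per prediction")
    hits = sum(p == dominant_label(g) for p, g in zip(predictions, graded))
    return hits / len(predictions)


def time_calls(fn: Callable[[str], object], inputs: Sequence[str]) -> dict:
    """Wall-clock latency per call: the first call is reported as cold, the rest as warm."""
    if not inputs:
        raise ValueError("need at least one input")
    timings = []
    for item in inputs:
        start = time.perf_counter()
        fn(item)
        timings.append(time.perf_counter() - start)
    warm: Optional[float] = statistics.median(timings[1:]) if len(timings) > 1 else None
    return {"cold_s": timings[0], "warm_median_s": warm}
```

Token cost and setup cost are not computed here; token counts come from each variant's own usage reporting, and setup cost is recorded by hand.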
Document in docs/spikes/local-model-routing.md.
Phase 2 — integration mechanism
Investigate how a local model would plug into apexyard skills:
- Tool surface: a new OllamaCall tool? An MCP server wrapping Ollama? A bash helper invoked by skills?
- Skill opt-in: skills declare which sub-tasks they'd route to the local model, e.g. /feature opts into "local model for issue-class suggestion before showing the user".
- Fallback: if local model is unavailable (Ollama not installed, model not pulled), skill falls back to Claude — no new failure mode.
- Privacy story: prompts and outputs handled by the local model stay on-machine. State this privacy benefit clearly in adopter docs.
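A sketch of how the opt-in and fallback could fit together, in Python under stated assumptions: the local call goes to Ollama's HTTP API on its default port (POST /api/generate with "stream": false, reading the "response" field of the reply), the model tag "llama3" stands in for whatever Phase 1 picks, and LOCAL_OPT_IN, route, and the claude callable are hypothetical names, not existing apexyard APIs. Any failure on the local path yields None, so the caller falls back to Claude.

```python
import json
import urllib.request
from typing import Callable, Optional

# Hypothetical opt-in table: skill -> sub-tasks it allows to run on the local model.
LOCAL_OPT_IN = {"/feature": {"issue-class"}}

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
LOCAL_MODEL = "llama3"  # assumption: placeholder for the model chosen in Phase 1


def call_ollama(prompt: str, url: str = OLLAMA_URL, timeout_s: float = 5.0) -> Optional[str]:
    """Return the local model's reply, or None if Ollama is unavailable or misbehaves."""
    body = json.dumps({"model": LOCAL_MODEL, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(url, data=body, headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            reply = json.loads(resp.read()).get("response")
    except (OSError, ValueError, AttributeError):
        # OSError covers URLError/HTTPError (not running, model not pulled) and timeouts;
        # ValueError covers invalid JSON; AttributeError covers a non-object JSON body.
        return None
    return reply if isinstance(reply, str) else None


def route(skill: str, subtask: str, prompt: str,
          claude: Callable[[str], str],
          local: Callable[[str], Optional[str]] = call_ollama) -> str:
    """Use the local model only for opted-in sub-tasks; otherwise, or on failure, use Claude."""
    if subtask in LOCAL_OPT_IN.get(skill, set()):
        reply = local(prompt)
        if reply is not None and reply.strip():
            return reply.strip()
    return claude(prompt)
```

Keeping the opt-in check and the fallback in one place means a skill that never opted in behaves exactly as today, and an opted-in skill on a machine without Ollama degrades to today's behavior rather than erroring.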
Phase 3 — recommendation
Rank the three candidate sub-tasks (issue classify, commit-type suggest, inbox summary) by token-savings × adopter-frequency. Recommend which to migrate first. Sketch the migration shape (skill-side opt-in flag, fallback behavior, install steps).
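The ranking step can be sketched as a small calculation, assuming Python and that Phase 1 supplies, per sub-task: Claude tokens per call today, Claude tokens still spent per call once routed (zero if the local path fully replaces Claude), a weekly call-frequency estimate, and whether the quality bar was met. Candidate and rank are hypothetical names; no measured numbers are filled in here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    claude_tokens_per_call: float  # Variant C input + output tokens, from Phase 1
    routed_tokens_per_call: float  # Claude tokens still spent when routed locally
    calls_per_week: float          # adopter-frequency estimate
    quality_ok: bool               # met the Phase 1 quality bar

    @property
    def weekly_tokens_saved(self) -> float:
        saved_per_call = max(self.claude_tokens_per_call - self.routed_tokens_per_call, 0.0)
        return saved_per_call * self.calls_per_week


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Order sub-tasks that passed the quality bar by token-savings x frequency, highest first."""
    eligible = [c for c in candidates if c.quality_ok]
    return sorted(eligible, key=lambda c: c.weekly_tokens_saved, reverse=True)
```

Sub-tasks that missed the quality bar drop out entirely (they stay on Claude or move to a pure tool), and low-frequency tasks sink toward the bottom because savings are multiplied by calls per week.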
Phase 4 — AgDR sketch
Y-statement + option matrix. The AgDR proper lands as a follow-up if the spike says "go".
Acceptance Criteria
- docs/spikes/local-model-routing.md exists with methodology + raw numbers for all 3 tasks × 3 variants (A/B/C).
Risks / Dependencies
- Local install adoption: adopters who don't install Ollama see no benefit. Acceptable: opt-in, documented, fallback to Claude.
- Quality drop: Llama-3-8B is expected to be "good enough" for short bounded tasks but not for synthesis. Phase 1 measurement validates this per task before recommending migration.
- Prompt tuning: local models often need different prompt shapes. The spike includes one round of prompt tuning per task; if the bar isn't met after that, the recommendation for that task is "pure tool" or "stay on Claude".
- Cost-of-install ≥ cost-of-use: for low-frequency tasks (e.g. AgDR slug generation, about twice a week), local install overhead may not pay off. Phase 3 should weight by frequency.
Out of scope
- Implementing the migration. The spike outputs measurement + recommendation; implementation lands in follow-up tickets.
- Custom training / fine-tuning. Off-the-shelf Llama-3 / Mistral / Phi only.
- Replacing Claude framework-wide. The intent is to route specific sub-tasks; Claude stays the dispatcher and synthesis layer.
Refs
- parallel-work.md rule (we sanction fan-out; routing to local is similar: explicit framework support for cheaper alternatives).
- … _lib-detect-bash-write.sh, validate-pr-create.sh); that's the "pure tool" baseline this spike compares against.