Driver
ApexYard skills that touch real project code (/handover's deep-dive, /code-review, security review, post-handover discovery tickets) currently rely on Read + Grep + Glob for code navigation. For semantic queries — "where is this symbol defined", "what calls this function", "what's the return type of this method" — grep is dramatically token-expensive: it returns N hits across M files, the agent then reads each file to disambiguate, and the cost compounds for every navigation step.
LSP (Language Server Protocol) servers solve exactly this: a single semantic-index query returns the definition, references, or type information directly, with one round-trip and ~100 tokens regardless of codebase size.
A second trigger: today's /handover reads project metadata via gh api repos/.../contents/... (no clone). The follow-up deep-dive discovery tickets that the adopter files post-handover need to read code — at which point cloning is required anyway. If we're going to clone for the deep dive, the marginal cost of running an LSP on the cloned tree is small, and the per-query token savings can be substantial.
This issue is a research spike, not an implementation. The goal is measurement + an AgDR with a go/no-go recommendation.
Hypothesis
Replacing grep-based code navigation with LSP queries on cloned projects will reduce token cost for code-aware skills (/code-review, /threat-model, /handover deep-dive) by an order of magnitude on representative tasks.
Scope of the spike
Phase 1 — measurement (the required output)
Pick three representative tasks the framework already runs:
/code-review on a 200-line PR in a TypeScript project
/threat-model STRIDE pass on a Node + DynamoDB API
/handover deep-dive (one of the post-handover discovery tickets — pick one that explicitly needed code reading, not just metadata)
For each task, run two variants:
- A. Today's flow — Read + Grep + Glob, no LSP, no pre-clone (or
gh api contents reads only).
- B. Clone-first + LSP flow — clone into
workspace/<name>/, bring up the language server, replace symbol/definition/reference queries with LSP calls.
Measure:
- Total token usage (input + output)
- Wall-clock time
- Quality of the output (does the deeper code awareness produce a better review / threat model / discovery doc?)
Document the methodology + raw results in a spike report at docs/spikes/lsp-token-savings.md.
Phase 2 — Claude Code integration mechanism
Investigate how Claude Code exposes LSP to skills/agents. Specifically:
- Are there built-in LSP plugins (the user reports seeing this claim — verify against current Claude Code docs)?
- If yes: what's the invocation surface? Tool calls? Hook context?
- If no: what's the integration path? MCP server wrapping LSP? External LSP daemon + custom tool?
Document findings + the recommended integration shape.
Phase 3 — /handover clone-first decision
Separate but related: if Phase 1 shows clear savings for code-aware work, should /handover change its default to clone-first?
Trade-offs:
- Pro: enables LSP + grep + Read on local files (cheaper than
gh api); makes the deep-dive discovery work cheap to start.
- Con: ~tens of MB disk per project; bandwidth on first clone; some adopters may not want every managed project cloned by default.
Possible answers:
- Always clone-first (simplest)
- Clone-first IF a deep-dive is part of this
/handover invocation (e.g. an --deep flag)
- Offer to clone interactively at the end of
/handover (today's pattern, but make the offer explicit)
The right answer depends on Phase 1 + Phase 2 results.
Phase 4 — AgDR (the decision artifact)
Write docs/agdr/AgDR-XXXX-lsp-integration.md capturing:
- Y-statement
- Options (do nothing; LSP for some skills; LSP framework-wide; clone-first as standard; etc.)
- Decision + reasoning
- Rollout plan (which skills first, what languages first, fallback when LSP unavailable)
Acceptance Criteria
Risks / Dependencies
- LSP-per-language overhead. Each language needs its own server (typescript-language-server, pyright, gopls, rust-analyzer, etc.). Multi-language portfolios have higher integration cost. The spike should call out language-coverage ramp.
- LSP startup latency. Bringing up a language server costs seconds. Per-skill-invocation startup is too slow; needs a daemon model or warm cache. Spike to investigate.
- Token-cost measurement is noisy. Same query can return different sized results depending on prompt phrasing. Use a consistent prompt template across A/B variants.
- Claude Code LSP integration may not exist as a first-class feature. Verify before assuming the integration path. If it doesn't exist, the spike's recommendation may be "wait until Claude Code ships this" rather than building a custom integration.
- Cross-project queries. LSP is per-project — finding all uses of pattern X across a portfolio still needs grep. Spike should clarify what LSP DOESN'T solve.
Out of scope
- Implementing LSP integration. The spike outputs a measurement report + AgDR + (if go) follow-up tickets. Implementation lands in those follow-ups.
- Cross-language polyglot projects (e.g. a TS frontend + Go backend in one repo). Spike picks single-language repos for measurement clarity.
- Replacing
Grep entirely. Even with LSP, full-text search across docs/comments/strings still wants grep — the spike is about replacing semantic-code queries, not all reads.
Refs
- Trigger: adopter report that LSP integration could substantially reduce token usage for code-aware AI agents (general industry observation; specific Claude Code claim should be verified in Phase 2).
- Related:
/handover SKILL.md (today's metadata-only reads via gh api); /code-review (today's grep-based navigation); the 12+ post-handover deep-dive discovery tickets that explicitly need code reading.
Driver
ApexYard skills that touch real project code (
/handover's deep-dive,/code-review, security review, post-handover discovery tickets) currently rely onRead+Grep+Globfor code navigation. For semantic queries — "where is this symbol defined", "what calls this function", "what's the return type of this method" — grep is dramatically token-expensive: it returns N hits across M files, the agent then reads each file to disambiguate, and the cost compounds for every navigation step.LSP (Language Server Protocol) servers solve exactly this: a single semantic-index query returns the definition, references, or type information directly, with one round-trip and ~100 tokens regardless of codebase size.
A second trigger: today's
/handoverreads project metadata viagh api repos/.../contents/...(no clone). The follow-up deep-dive discovery tickets that the adopter files post-handover need to read code — at which point cloning is required anyway. If we're going to clone for the deep dive, the marginal cost of running an LSP on the cloned tree is small, and the per-query token savings can be substantial.This issue is a research spike, not an implementation. The goal is measurement + an AgDR with a go/no-go recommendation.
Hypothesis
Replacing grep-based code navigation with LSP queries on cloned projects will reduce token cost for code-aware skills (
/code-review,/threat-model,/handoverdeep-dive) by an order of magnitude on representative tasks.Scope of the spike
Phase 1 — measurement (the required output)
Pick three representative tasks the framework already runs:
/code-reviewon a 200-line PR in a TypeScript project/threat-modelSTRIDE pass on a Node + DynamoDB API/handoverdeep-dive (one of the post-handover discovery tickets — pick one that explicitly needed code reading, not just metadata)For each task, run two variants:
gh api contentsreads only).workspace/<name>/, bring up the language server, replace symbol/definition/reference queries with LSP calls.Measure:
Document the methodology + raw results in a spike report at
docs/spikes/lsp-token-savings.md.Phase 2 — Claude Code integration mechanism
Investigate how Claude Code exposes LSP to skills/agents. Specifically:
Document findings + the recommended integration shape.
Phase 3 —
/handoverclone-first decisionSeparate but related: if Phase 1 shows clear savings for code-aware work, should
/handoverchange its default to clone-first?Trade-offs:
gh api); makes the deep-dive discovery work cheap to start.Possible answers:
/handoverinvocation (e.g. an--deepflag)/handover(today's pattern, but make the offer explicit)The right answer depends on Phase 1 + Phase 2 results.
Phase 4 — AgDR (the decision artifact)
Write
docs/agdr/AgDR-XXXX-lsp-integration.mdcapturing:Acceptance Criteria
docs/spikes/lsp-token-savings.mdexists with methodology + raw numbers for all 3 task variants.docs/spikes/lsp-token-savings.mddocuments Phase 2 findings — how Claude Code exposes (or doesn't) LSP integration.docs/spikes/lsp-token-savings.mdincludes a Phase 3 recommendation on/handoverclone-first.Risks / Dependencies
Out of scope
Grepentirely. Even with LSP, full-text search across docs/comments/strings still wants grep — the spike is about replacing semantic-code queries, not all reads.Refs
/handoverSKILL.md (today's metadata-only reads viagh api);/code-review(today's grep-based navigation); the 12+ post-handover deep-dive discovery tickets that explicitly need code reading.