[Spike] Measure token savings of LSP-based code navigation + clone-first /handover

## Driver

ApexYard skills that touch real project code (`/handover`'s deep-dive, `/code-review`, security review, post-handover discovery tickets) currently rely on `Read` + `Grep` + `Glob` for code navigation. For semantic queries — *"where is this symbol defined", "what calls this function", "what's the return type of this method"* — grep is dramatically token-expensive: it returns N hits across M files, the agent then reads each file to disambiguate, and the cost compounds for every navigation step.

LSP (Language Server Protocol) servers solve exactly this: a single semantic-index query returns the definition, references, or type information directly, with one round-trip and ~100 tokens regardless of codebase size.

A second trigger: today's `/handover` reads project metadata via `gh api repos/.../contents/...` (no clone). The follow-up deep-dive discovery tickets that the adopter files post-handover need to read code — at which point cloning is required anyway. If we're going to clone for the deep dive, the marginal cost of running an LSP on the cloned tree is small, and the per-query token savings can be substantial.

This issue is a **research spike**, not an implementation. The goal is measurement + an AgDR with a go/no-go recommendation.

## Hypothesis

Replacing grep-based code navigation with LSP queries on cloned projects will reduce token cost for code-aware skills (`/code-review`, `/threat-model`, `/handover` deep-dive) by an order of magnitude on representative tasks.

## Scope of the spike

### Phase 1 — measurement (the required output)

Pick three representative tasks the framework already runs:

1. `/code-review` on a 200-line PR in a TypeScript project
2. `/threat-model` STRIDE pass on a Node + DynamoDB API
3. `/handover` deep-dive (one of the post-handover discovery tickets — pick one that explicitly needed code reading, not just metadata)

For each task, run two variants:

- **A. Today's flow** — Read + Grep + Glob, no LSP, no pre-clone (or `gh api contents` reads only).
- **B. Clone-first + LSP flow** — clone into `workspace/<name>/`, bring up the language server, replace symbol/definition/reference queries with LSP calls.

Measure:

- Total token usage (input + output)
- Wall-clock time
- Quality of the output (does the deeper code awareness produce a better review / threat model / discovery doc?)

Document the methodology + raw results in a spike report at `docs/spikes/lsp-token-savings.md`.

### Phase 2 — Claude Code integration mechanism

Investigate how Claude Code exposes LSP to skills/agents. Specifically:

- Are there built-in LSP plugins (the user reports seeing this claim — verify against current Claude Code docs)?
- If yes: what's the invocation surface? Tool calls? Hook context?
- If no: what's the integration path? MCP server wrapping LSP? External LSP daemon + custom tool?

Document findings + the recommended integration shape.

### Phase 3 — `/handover` clone-first decision

Separate but related: if Phase 1 shows clear savings for code-aware work, should `/handover` change its default to clone-first?

Trade-offs:

- **Pro**: enables LSP + grep + Read on local files (cheaper than `gh api`); makes the deep-dive discovery work cheap to start.
- **Con**: ~tens of MB disk per project; bandwidth on first clone; some adopters may not want every managed project cloned by default.

Possible answers:

- Always clone-first (simplest)
- Clone-first IF a deep-dive is part of this `/handover` invocation (e.g. an `--deep` flag)
- Offer to clone interactively at the end of `/handover` (today's pattern, but make the offer explicit)

The right answer depends on Phase 1 + Phase 2 results.

### Phase 4 — AgDR (the decision artifact)

Write `docs/agdr/AgDR-XXXX-lsp-integration.md` capturing:

- Y-statement
- Options (do nothing; LSP for some skills; LSP framework-wide; clone-first as standard; etc.)
- Decision + reasoning
- Rollout plan (which skills first, what languages first, fallback when LSP unavailable)

## Acceptance Criteria

- [ ] `docs/spikes/lsp-token-savings.md` exists with methodology + raw numbers for all 3 task variants.
- [ ] `docs/spikes/lsp-token-savings.md` documents Phase 2 findings — how Claude Code exposes (or doesn't) LSP integration.
- [ ] `docs/spikes/lsp-token-savings.md` includes a Phase 3 recommendation on `/handover` clone-first.
- [ ] AgDR-XXXX written with go/no-go decision + rollout plan if go.
- [ ] If go: a follow-up ticket per recommended skill change (out of scope of this spike — spike is measurement + decision only).
- [ ] If no-go: AgDR documents the reasoning so this isn't re-investigated in 6 months.

## Risks / Dependencies

- **LSP-per-language overhead.** Each language needs its own server (typescript-language-server, pyright, gopls, rust-analyzer, etc.). Multi-language portfolios have higher integration cost. The spike should call out language-coverage ramp.
- **LSP startup latency.** Bringing up a language server costs seconds. Per-skill-invocation startup is too slow; needs a daemon model or warm cache. Spike to investigate.
- **Token-cost measurement is noisy.** Same query can return different sized results depending on prompt phrasing. Use a consistent prompt template across A/B variants.
- **Claude Code LSP integration may not exist as a first-class feature.** Verify before assuming the integration path. If it doesn't exist, the spike's recommendation may be "wait until Claude Code ships this" rather than building a custom integration.
- **Cross-project queries.** LSP is per-project — finding all uses of pattern X across a portfolio still needs grep. Spike should clarify what LSP DOESN'T solve.

## Out of scope

- Implementing LSP integration. The spike outputs a measurement report + AgDR + (if go) follow-up tickets. Implementation lands in those follow-ups.
- Cross-language polyglot projects (e.g. a TS frontend + Go backend in one repo). Spike picks single-language repos for measurement clarity.
- Replacing `Grep` entirely. Even with LSP, full-text search across docs/comments/strings still wants grep — the spike is about replacing semantic-code queries, not all reads.

## Refs

- Trigger: adopter report that LSP integration could substantially reduce token usage for code-aware AI agents (general industry observation; specific Claude Code claim should be verified in Phase 2).
- Related: `/handover` SKILL.md (today's metadata-only reads via `gh api`); `/code-review` (today's grep-based navigation); the 12+ post-handover deep-dive discovery tickets that explicitly need code reading.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Spike] Measure token savings of LSP-based code navigation + clone-first /handover #178

Driver

Hypothesis

Scope of the spike

Phase 1 — measurement (the required output)

Phase 2 — Claude Code integration mechanism

Phase 3 — `/handover` clone-first decision

Phase 4 — AgDR (the decision artifact)

Acceptance Criteria

Risks / Dependencies

Out of scope

Refs

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

[Spike] Measure token savings of LSP-based code navigation + clone-first /handover #178

Description

Driver

Hypothesis

Scope of the spike

Phase 1 — measurement (the required output)

Phase 2 — Claude Code integration mechanism

Phase 3 — /handover clone-first decision

Phase 4 — AgDR (the decision artifact)

Acceptance Criteria

Risks / Dependencies

Out of scope

Refs

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

Phase 3 — `/handover` clone-first decision