feat: analyzer-embed crate + slop-fixes/slop-targets queries (semantic deslop targeting) #27

## Goal

Give deslop a structured input so it spends LLM tokens only where slop is likely. Two outputs:

1. **`query slop-fixes`** — pinpoint structured fix actions (Haiku-tier: confirm shape, apply). AST/graph signals only, no embedder needed.
2. **`query slop-targets`** — ranked file/module targets with `tier` field (sonnet=file-level, opus=cross-file/module). Uses embeddings if present, degrades gracefully to AST/graph signals.

## New crate: `analyzer-embed`

Separate Cargo crate in this workspace, separate binary `agent-analyzer-embed`. Heavy deps (`ort`, `tokenizers`) isolated here; main `agent-analyzer` binary stays small.

**Two model variants**, picked at install time (skill prompts user):

| Variant | Model | Size | Use case |
|---|---|---|---|
| small | BAAI/bge-small-en-v1.5 Q8 | ~30 MB | English-only, weaker on code, cheap |
| big | google/embeddinggemma-300m Q4 (ONNX) | ~195 MB | SOTA <500M params, code-aware, multilingual, Matryoshka |

Released as 5 platforms × 2 sizes = 10 release assets. Model file downloaded + cached next to binary, content-hashed.

**Granularity** picked at install time:
- compact: per-file × 128 dim
- balanced: per-function × 256 dim
- maximum: per-function × 768 dim
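
The reduced dimensions above presumably rely on Matryoshka-style truncation, which the table lists only for the big variant. A minimal sketch of that step, assuming the model's leading dimensions carry the most signal (`truncate_embedding` is a hypothetical name, not an existing function in this workspace):

```rust
/// Truncate an embedding to its first `dim` components and L2-normalize the result.
/// Assumption: the model was trained Matryoshka-style, so a prefix of the vector
/// is still a usable embedding.
pub fn truncate_embedding(full: &[f32], dim: usize) -> Vec<f32> {
    let kept = &full[..dim.min(full.len())];
    let norm = kept.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        // Avoid dividing by zero on an all-zero vector.
        return kept.to_vec();
    }
    kept.iter().map(|x| x / norm).collect()
}
```

Re-normalizing after truncation keeps cosine similarity and dot product interchangeable at every granularity.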

**Storage**: sidecar `.claude/repo-intel.embeddings.bin` (packed int8/fp16). Main `repo-intel.json` stays diffable.
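
One way the int8 packing could work is symmetric per-vector quantization: one `f32` scale plus one `i8` per dimension. This is a sketch under that assumption; the actual sidecar layout is not specified here, and `quantize_i8` / `dequantize_i8` are hypothetical names.

```rust
/// Symmetric int8 quantization: returns (scale, quantized values).
/// Assumption: one scale per vector; the real sidecar format may differ.
pub fn quantize_i8(v: &[f32]) -> (f32, Vec<i8>) {
    let max_abs = v.iter().fold(0.0f32, |m, x| m.max(x.abs()));
    if max_abs == 0.0 {
        // All-zero vector: any scale works, pick 1.0.
        return (1.0, vec![0; v.len()]);
    }
    let scale = max_abs / 127.0;
    // `as i8` saturates, so tiny floating-point overshoot cannot wrap around.
    let q = v.iter().map(|x| (x / scale).round() as i8).collect();
    (scale, q)
}

/// Inverse of `quantize_i8`, up to rounding error of at most half a step.
pub fn dequantize_i8(scale: f32, q: &[i8]) -> Vec<f32> {
    q.iter().map(|&b| b as f32 * scale).collect()
}
```

At 256 dimensions this stores roughly 260 bytes per function instead of 1 KiB for raw `f32`.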

**Subcommands on agent-analyzer-embed:**
- `scan` — full re-embed of all files, JSON to stdout
- `update` — delta-only: read existing sidecar, hash files, re-embed changed/added, drop removed
- `version`
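
The `update` flow described above reduces to a set comparison between the hashes recorded in the sidecar and the hashes of files on disk. A minimal sketch, assuming content hashes are available as `u64` values keyed by path (`DeltaPlan` and `plan_delta` are hypothetical names):

```rust
use std::collections::HashMap;

/// Work list for a delta update.
#[derive(Debug, Default, PartialEq)]
pub struct DeltaPlan {
    /// Paths that are new or whose content hash changed.
    pub reembed: Vec<String>,
    /// Paths present in the sidecar but no longer on disk.
    pub removed: Vec<String>,
}

/// Compare sidecar hashes with current hashes and decide what to re-embed or drop.
pub fn plan_delta(sidecar: &HashMap<String, u64>, current: &HashMap<String, u64>) -> DeltaPlan {
    let mut plan = DeltaPlan::default();
    for (path, hash) in current {
        // Missing entry (added file) and differing hash (changed file) both re-embed.
        if sidecar.get(path) != Some(hash) {
            plan.reembed.push(path.clone());
        }
    }
    for path in sidecar.keys() {
        if !current.contains_key(path) {
            plan.removed.push(path.clone());
        }
    }
    // Sort for deterministic output regardless of HashMap iteration order.
    plan.reembed.sort();
    plan.removed.sort();
    plan
}
```

Unchanged files never reach the embedder, which is what keeps `update` cheap on large repos.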

## New subcommands on main `agent-analyzer` binary

- `set-embeddings --input -` — accept JSON via stdin, merge into sidecar (mirrors existing `set-descriptors` / `set-summary` pattern)

## New queries

### `query slop-fixes`

Returns structured fix actions for Haiku to apply:

```json
{
  "fixes": [
    {"action": "delete-file", "path": "debug.log", "reason": "tracked log artifact"},
    {"action": "delete-lines", "path": "src/auth.ts", "lines": [142, 158], "reason": "orphan export legacyHash — 0 importers"},
    {"action": "delete-lines", "path": "src/api.ts", "lines": [88, 90], "reason": "empty catch block"},
    {"action": "remove-dep", "manifest": "package.json", "name": "lodash.merge", "reason": "no import resolves"},
    {"action": "replace-lines", "path": "tests/auth.test.ts", "lines": [12, 14], "with": "", "reason": "tautological assertion"}
  ]
}
```
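
To illustrate how a consumer might apply a `delete-lines` action, here is a small sketch. It assumes the `lines` pair is a 1-indexed, inclusive `[start, end]` range, which the example above suggests but does not state; `delete_lines` is a hypothetical helper.

```rust
/// Remove lines `start..=end` from `source`.
/// Assumption: the range is 1-indexed and inclusive.
/// Returns `None` when the range does not fit the file, so a stale fix is
/// rejected instead of deleting the wrong lines.
/// Note: the result is joined with '\n' and carries no trailing newline.
pub fn delete_lines(source: &str, start: usize, end: usize) -> Option<String> {
    let lines: Vec<&str> = source.lines().collect();
    if start == 0 || start > end || end > lines.len() {
        return None;
    }
    let kept: Vec<&str> = lines
        .iter()
        .enumerate()
        .filter(|(i, _)| *i + 1 < start || *i + 1 > end)
        .map(|(_, line)| *line)
        .collect();
    Some(kept.join("\n"))
}
```

Rejecting out-of-range edits matters because the file may have changed between the query and the apply step.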

Categories (all AST/graph, no embedder needed):
- Orphan exports (symbol exported, 0 importers in graph)
- Orphan files (not imported, not entry-point, not test)
- Tracked artifacts (`.log`, `.bak`, `.orig`, `.DS_Store`, `coverage/`, `dist/` in git)
- Unused deps (declared in manifest, no import resolves)
- Empty catch blocks (AST shape)
- Tautological tests (AST: `expect(x).toBe(x)`, `assert!(true)`)
- Orphan snapshots/fixtures (file exists, no test references it)
- Duplicate constants (same literal/string defined N+ places)
- Long-skipped tests (`it.skip` / `#[ignore]` + git age >180d)
- Old TODOs (`TODO/FIXME/XXX/HACK` regex + git blame age >180d)
- Old `@ts-ignore` / `#[allow]` / `eslint-disable` (regex + age)
- Stale CI configs (`.travis.yml` when `.github/workflows/` exists)
- Two-of-same tooling (eslint + biome, prettier + biome, multiple lockfiles)
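
As an example of how cheap these graph checks can be, the orphan-export category amounts to a set difference between exported symbols and imported symbols. A sketch under the assumption that the graph can be flattened into tuples (the tuple shapes and `orphan_exports` are hypothetical, not the `analyzer-graph` API):

```rust
use std::collections::HashSet;

/// `exports`: (defining file, symbol).
/// `imports`: (importing file, source file, symbol).
/// Returns every export that no import edge resolves to.
pub fn orphan_exports<'a>(
    exports: &[(&'a str, &'a str)],
    imports: &[(&'a str, &'a str, &'a str)],
) -> Vec<(&'a str, &'a str)> {
    // Collect every (source file, symbol) pair that has at least one importer.
    let used: HashSet<(&str, &str)> = imports.iter().map(|(_, src, sym)| (*src, *sym)).collect();
    exports
        .iter()
        .copied()
        .filter(|export| !used.contains(export))
        .collect()
}
```

A real implementation would also need to exempt entry points and public package exports, which have importers outside the repo.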

### `query slop-targets`

Returns ranked targets for Sonnet/Opus models:

```json
{
  "targets": [
    {"path": "src/worker.ts", "tier": "sonnet", "score": 8.7, "suspect": "defensive-cargo-cult", "why": "hotspot 2 + bugspot 5 + 1.6x comment density"},
    {"area": "src/auth/", "tier": "opus", "score": 9.1, "suspect": "over-abstraction", "why": "4-deep wrapper chain, single impl per layer"}
  ]
}
```

Sonnet tier uses file-level signals (combine hotspot + bugspot + size anomaly + comment density + bot authorship + recent big-drop).
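
How the six signals combine is not fixed by this issue. One plausible shape is a weighted sum followed by a descending sort; the sketch below takes the weights as a parameter because no values are specified. `FileSignals`, `score`, and `rank` are all hypothetical names.

```rust
/// Per-file signals named in the paragraph above, each already normalized by the caller.
#[derive(Debug, Clone, Copy)]
pub struct FileSignals {
    pub hotspot: f32,
    pub bugspot: f32,
    pub size_anomaly: f32,
    pub comment_density: f32,
    pub bot_authorship: f32,
    pub recent_big_drop: f32,
}

impl FileSignals {
    fn as_array(&self) -> [f32; 6] {
        [
            self.hotspot,
            self.bugspot,
            self.size_anomaly,
            self.comment_density,
            self.bot_authorship,
            self.recent_big_drop,
        ]
    }
}

/// Weighted sum of the signals. Assumption: a linear combination is good enough.
pub fn score(signals: &FileSignals, weights: &[f32; 6]) -> f32 {
    signals.as_array().iter().zip(weights).map(|(s, w)| s * w).sum()
}

/// Score every file and return (path, score) pairs, highest score first.
pub fn rank<'a>(files: &[(&'a str, FileSignals)], weights: &[f32; 6]) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&'a str, f32)> =
        files.iter().map(|(path, s)| (*path, score(s, weights))).collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored
}
```

Keeping the per-signal contributions around would also make it straightforward to generate the `why` string.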

Opus tier requires new graph traversals (see below).

## Opus-tier graph traversals (v1 scope)

New work in `analyzer-graph`:

- **Single-impl chains**: trait/interface with exactly one concrete implementor across the import graph
- **Wrapper towers**: chains where every node has fan-in 1 + fan-out 1 (A→B→C all pass-through)
- **Duplicate logic**: AST-shape isomorphism + (if embedder present) embedding similarity for semantic duplicates
- **Cliché-name clusters**: `helper` / `utility` / `manager` etc. flagged when clustered, not just present
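
The wrapper-tower check can be sketched directly from its definition: mark every node with fan-in 1 and fan-out 1 as pass-through, then collect maximal runs of consecutive pass-through nodes. This assumes the graph is available as a plain edge list; `wrapper_towers` and `min_len` are hypothetical, not the `analyzer-graph` API.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Find maximal chains of pass-through nodes (fan-in 1, fan-out 1)
/// that are at least `min_len` nodes long.
pub fn wrapper_towers<'a>(edges: &[(&'a str, &'a str)], min_len: usize) -> Vec<Vec<String>> {
    let mut succ: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    let mut pred: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    let mut nodes: BTreeSet<&str> = BTreeSet::new();
    for &(from, to) in edges {
        succ.entry(from).or_default().push(to);
        pred.entry(to).or_default().push(from);
        nodes.insert(from);
        nodes.insert(to);
    }
    let pass_through = |n: &str| {
        succ.get(n).map_or(0, Vec::len) == 1 && pred.get(n).map_or(0, Vec::len) == 1
    };
    let mut towers = Vec::new();
    for &start in &nodes {
        // A chain starts at a pass-through node whose single caller is not pass-through.
        // Cycles made only of pass-through nodes have no such start and are skipped.
        if !pass_through(start) || pass_through(pred[start][0]) {
            continue;
        }
        let mut chain = vec![start.to_string()];
        let mut cur = start;
        loop {
            let next = succ[cur][0];
            if !pass_through(next) {
                break;
            }
            chain.push(next.to_string());
            cur = next;
        }
        if chain.len() >= min_len {
            towers.push(chain);
        }
    }
    towers
}
```

The pass runs in time linear in the number of edges, so it should be cheap enough to run on every query.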

## NLP-enabled patterns (require embedder)

Layered on when embedder is installed; degrade silently when not:

- **Comment-restates-code**: embedding similarity between comment text and next N tokens of code > threshold
- **History-prose detection in docs**: classifier on past-tense + change-reference patterns
- **Stylometry-based AI authorship**: per-file embedding distance from repo's own human-authored baseline (replaces removed metadata-based `aiAttribution`)
- **Semantic duplicates**: function-level embedding similarity (catches what AST-shape match misses)
- **Doc-drift v2**: README prose section ↔ function semantics similarity
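
Several of these patterns share one primitive: cosine similarity between two embeddings, compared against a threshold. A minimal sketch for the comment-restates-code case, assuming both vectors come from the same model and granularity (`restates_code` and the threshold parameter are hypothetical; no threshold value is specified here):

```rust
/// Cosine similarity of two equal-length vectors.
/// Returns `None` for mismatched lengths, empty input, or a zero vector.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Flag a comment whose embedding is closer to the following code than `threshold`.
/// Missing or unusable embeddings never flag, matching the silent-degrade rule.
pub fn restates_code(comment: &[f32], code: &[f32], threshold: f32) -> bool {
    cosine_similarity(comment, code).map_or(false, |sim| sim > threshold)
}
```

Returning `false` on missing vectors keeps the no-embedder path free of special cases.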

## Schema additions (backward compat)

All new fields use `#[serde(skip_serializing_if = "Option::is_none", default)]`:

```rust
pub struct RepoIntelData {
    // ... existing fields
    #[serde(skip_serializing_if = "Option::is_none", default)]
    pub embeddings_meta: Option<EmbeddingsMeta>,  // model id, dim, granularity
    // raw vectors live in sidecar .embeddings.bin
}
```

## Why not bundle the embedder

The default install stays at ~10 MB. Power users opt into the ~30 MB or ~195 MB model explicitly. This mirrors the main binary's download flow: skill prompts → JS wrapper downloads → cached locally.

## Acceptance criteria

- [ ] `analyzer-embed` crate compiles, produces standalone binary
- [ ] Both model variants (small/big) load + embed successfully
- [ ] `agent-analyzer-embed scan` and `update` produce well-formed JSON
- [ ] `agent-analyzer set-embeddings` merges JSON into sidecar
- [ ] Sidecar persists across `init` / `update` runs
- [ ] `query slop-fixes` returns valid fix actions across all categories above
- [ ] `query slop-targets` returns ranked targets with `tier` and `why`
- [ ] Opus-tier graph traversals: single-impl chains, wrapper towers, duplicate logic
- [ ] All new fields backward-compatible (load existing v0.5.0 artifacts without error)
- [ ] Tests for: empty embeddings (no embedder), partial embeddings (delta update), full embeddings
- [ ] CI builds 10 release assets (5 platforms × 2 sizes)

## Companion work

Skill orchestration + UI prompts: agent-sh/repo-intel#TBD
