Enhancement: query-specific retrieval signals (call-graph distance to matched symbols; git co-change)

## Context

codedb's `search` is already multi-signal: `searchContentRanked` (src/explore.zig:3135) ranks with **BM25/BM25+** (lines ~3230-3260) × **call-graph centrality** (`centralityBoost`, line 3286; built in `ensureCallGraph`, ~2957) × **path/test/doc penalties** (`pathRelevanceMultiplier`, ~2906-2944), and `context` blends BM25 × symbol-definition boost (src/mcp.zig). So this is *not* "add structure to ranking". It proposes two signals that are genuinely absent (plus one minor refinement) and that target the hard case: **disambiguating the target file among many same-keyword hits.**

**Motivation:** in an ALMA-style retrieval experiment (engram re-ranking codedb features), retrieval quality on a large repo (openclaw, 13.6k files) plateaued at **~0.30 MRR**. That looks like a *signal* limit: when a query identifier matches many files, the current features (lexical, **global** centrality, name-match, degree) can't tell which file a change actually targets. Both missing signals below are **query-specific**, unlike codedb's current global signals.

### 1. Query-specific call-graph distance
codedb builds the call graph and uses it for **global** centrality (per-file importance) — but ranking has no **query-specific** graph signal: *how close (in call/import hops) is this candidate to the symbols the query matched?* `findCallPath` (src/explore.zig:3087) exists but only for navigation, not ranking. Folding "graph distance to matched symbols" into the score would prefer files structurally near the query's definitions over distant same-keyword files.
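
One way the distance signal could work is sketched below in Python for brevity (codedb itself is Zig). This is an illustration under assumptions, not codedb's implementation: the function names, the undirected treatment of call/import edges, and the `max_boost` / `decay` values are all hypothetical. It runs a multi-source BFS from the files that define the matched symbols, then turns hop count into a multiplicative boost that could sit alongside the existing multipliers.

```python
from collections import deque


def hops_to_matched(graph, matched):
    """Multi-source BFS: minimum hop count from any matched file to each reachable file.

    graph:   dict mapping file -> iterable of files it calls/imports (hypothetical shape).
    matched: files that define the symbols the query matched.
    """
    # Assumption: proximity is symmetric, so edges are treated as undirected.
    undirected = {}
    for src, dsts in graph.items():
        undirected.setdefault(src, set())
        for dst in dsts:
            undirected[src].add(dst)
            undirected.setdefault(dst, set()).add(src)

    dist = {f: 0 for f in matched}
    queue = deque(dist)
    while queue:
        cur = queue.popleft()
        for nxt in undirected.get(cur, ()):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return dist


def proximity_boost(hops, max_boost=2.0, decay=0.5):
    """Multiplier in (1.0, max_boost]; unreachable files (hops is None) get 1.0."""
    if hops is None:
        return 1.0
    return 1.0 + (max_boost - 1.0) * (decay ** hops)


def rerank(base_scores, graph, matched):
    """Multiply each candidate's existing score by its proximity boost, best first."""
    dist = hops_to_matched(graph, matched)
    scored = {f: s * proximity_boost(dist.get(f)) for f, s in base_scores.items()}
    return sorted(scored, key=scored.get, reverse=True)
```

With these example parameters a file two hops from a matched definition gets a 1.25x boost, which is enough to overtake an unconnected same-keyword file with a slightly higher lexical score. Leaving unreachable files at 1.0 keeps the signal purely additive to the current ranking rather than a new penalty.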

### 2. Git co-change
codedb is git-history-blind: it reads only the HEAD SHA (src/git.zig, for snapshot invalidation) and uses file **mtime** as its only temporal signal, with no commit-log or co-change data. "Files historically changed together" is a strong signal for *which file a task touches*. Parsing `git log` into a co-change graph adds a high-value, query-relevant signal. (This needs real history; shallow clones won't have it.)
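
A rough sketch of the `git log` parsing step, in Python for brevity rather than codedb's Zig. The `commit:` marker format, the function names, and the `max_files_per_commit` cutoff are assumptions made for illustration. The idea is to count, for every pair of files, how many commits touched both.

```python
from collections import Counter
from itertools import combinations


def cochange_counts(log_text, max_files_per_commit=50):
    """Count how often each unordered file pair appears in the same commit.

    Expects the text output of:
        git log --name-only --pretty=format:commit:%H
    Commits touching more than max_files_per_commit files are skipped, on the
    assumption that bulk renames and reformatting say little about coupling.
    """
    pairs = Counter()
    files = set()

    def flush():
        # Record all pairs for the commit collected so far, then reset.
        if 2 <= len(files) <= max_files_per_commit:
            for a, b in combinations(sorted(files), 2):
                pairs[(a, b)] += 1
        files.clear()

    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("commit:"):
            flush()
        elif line:
            files.add(line)
    flush()
    return pairs


def cochange_neighbors(pairs, path):
    """Files that changed together with `path`, most frequent first."""
    out = Counter()
    for (a, b), n in pairs.items():
        if a == path:
            out[b] += n
        elif b == path:
            out[a] += n
    return out.most_common()
```

At query time, the neighbors of the files a query matched could feed a boost in the same way as graph distance. The raw counts here are unnormalized; a real version would probably want to discount files that change in nearly every commit.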

### (minor) Richer file-role
Role handling today is binary doc/code (`isDocLanguage`, src/explore.zig:191) + a heuristic test penalty (0.6x, ~2921). A richer multi-class role (config / generated / test / impl) used in ranking could route queries (a "config" query → `.toml`/`.json`; a "runtime" query → impl).
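
To make the multi-class idea concrete, here is a small path-based classifier sketch in Python (not codedb's Zig). Every rule, directory name, suffix set, and weight below is a hypothetical placeholder, and how a query's intent ("config" vs "runtime") would be detected is left out.

```python
from pathlib import PurePosixPath

# Hypothetical heuristics; a real classifier would need per-ecosystem tuning.
CONFIG_SUFFIXES = {".toml", ".json", ".yaml", ".yml", ".ini"}
DOC_SUFFIXES = {".md", ".rst", ".txt"}
GENERATED_DIRS = {"generated", "vendor", "node_modules", "dist"}
TEST_DIRS = {"test", "tests", "__tests__", "spec"}


def classify_role(path):
    """Return one of: generated, test, config, doc, impl."""
    p = PurePosixPath(path)
    dirs = [part.lower() for part in p.parts[:-1]]
    name = p.name.lower()
    if any(d in GENERATED_DIRS for d in dirs) or ".generated." in name:
        return "generated"
    if (any(d in TEST_DIRS for d in dirs) or name.startswith("test_")
            or "_test." in name or ".test." in name):
        return "test"
    if p.suffix.lower() in CONFIG_SUFFIXES:
        return "config"
    if p.suffix.lower() in DOC_SUFFIXES:
        return "doc"
    return "impl"


# Hypothetical per-intent multipliers: query intent -> {role: multiplier}.
ROLE_WEIGHTS = {
    "config": {"config": 1.5, "test": 0.6, "generated": 0.3},
    "runtime": {"impl": 1.2, "test": 0.6, "generated": 0.3, "config": 0.8},
}


def role_multiplier(intent, role):
    """Multiplier for a candidate's role under a query intent; 1.0 when unknown."""
    return ROLE_WEIGHTS.get(intent, {}).get(role, 1.0)
```

The 0.6 test weight mirrors the existing test penalty mentioned above; the other numbers are arbitrary starting points.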

## Why now
All three extend machinery codedb already has (call graph, BM25, path heuristics) and attack the disambiguation ceiling that BM25 + *global* centrality can't break on its own.

_Surfaced via an ALMA-style retrieval experiment over codedb features._


Enhancement: query-specific retrieval signals (call-graph distance to matched symbols; git co-change) #550
