feat: graph-aware ranking via call-graph centrality (+15% MRR) #523

Merged: justrach merged 2 commits into release/0.2.5824 from feat/code-graph on Jun 1, 2026

@justrach (Owner) commented Jun 1, 2026

Summary

The graphify-informed precision work: a deterministic, LLM-free resolved call graph whose centrality feeds an additive ranking boost. Two commits — the reusable foundation + the index/ranking integration.

Foundation — src/codegraph.zig

  • extractCallees(body): walks a function body for call sites (an identifier followed by "("), filters out cross-language keywords and control-flow constructs, and deduplicates.
  • buildEdges(funcs, resolve): resolves callee names into weighted edges. A name that resolves to N candidate definitions contributes weight 1/N to each (graphify's confidence idea).
  • inDegreeCentrality(edges) — weighted "who's called most" (graphify's "god node" signal).
  • Unit-tested in isolation.
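
The three foundation functions above can be sketched roughly as follows. This is an illustrative Python sketch, not the Zig code in src/codegraph.zig: the regex, the keyword list, and the data shapes (`funcs` as a dict of bodies, `resolve` returning a list of candidate ids) are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical filter set; the real keyword list is not shown in this PR.
KEYWORDS = {"if", "for", "while", "switch", "return", "catch", "fn", "def", "sizeof"}

# An identifier immediately followed by "(" is treated as a call site.
CALL_RE = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*\(")


def extract_callees(body):
    """Return deduplicated callee names in first-seen order, keywords removed."""
    seen = []
    for name in CALL_RE.findall(body):
        if name not in KEYWORDS and name not in seen:
            seen.append(name)
    return seen


def build_edges(funcs, resolve):
    """funcs: {caller_id: body}. resolve(name) -> list of candidate definition ids.

    A name with N candidates yields N edges of weight 1/N each.
    Names that resolve to nothing yield no edge.
    """
    edges = []
    for caller, body in funcs.items():
        for name in extract_callees(body):
            targets = resolve(name)
            for target in targets:
                edges.append((caller, target, 1.0 / len(targets)))
    return edges


def in_degree_centrality(edges):
    """Sum incoming edge weights per callee."""
    degree = defaultdict(float)
    for _caller, callee, weight in edges:
        degree[callee] += weight
    return dict(degree)
```

In this sketch an ambiguous name such as `log` with two definitions adds 0.5 to each candidate, so a wrong guess costs at most a fraction of one call's weight.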

Integration — Explorer / searchContentRanked

On first ranked search, ensureCallCentrality builds a per-file centrality map once (mutex-guarded, idempotent): resolve each function's call sites through the function symbol table, accumulate weighted in-degree per callee, aggregate per file. searchContentRanked multiplies each candidate's score by 1 + 0.15·log(1+centrality).

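The boost only ever raises a score and never acts as a filter: centrality is non-negative, so the multiplier is always at least 1. A misresolved edge can only nudge a heavily-called file up; it never removes a real result from the candidate list. On by default; setting CODEDB_NO_CENTRALITY disables it.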
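
The build-once map and the score multiplier described above could look roughly like this. It is a Python sketch for illustration only, not the Explorer code: the class and function names are hypothetical, and the natural logarithm is an assumption because the PR does not state the log base.

```python
import math
import threading

BOOST_WEIGHT = 0.15  # coefficient quoted in the PR description


class CentralityCache:
    """Builds the per-file centrality map at most once, guarded by a lock."""

    def __init__(self, build_fn):
        self._build_fn = build_fn  # hypothetical: returns {path: centrality}
        self._lock = threading.Lock()
        self._map = None

    def ensure(self):
        with self._lock:
            if self._map is None:  # idempotent: later calls reuse the map
                self._map = self._build_fn()
            return self._map


def centrality_boost(centrality):
    # Assumes natural log. log1p(0) == 0, so files nobody calls keep their score.
    return 1.0 + BOOST_WEIGHT * math.log1p(centrality)


def rerank(candidates, cache):
    """candidates: list of (path, score). Every candidate is kept; only order changes."""
    cmap = cache.ensure()
    boosted = [(path, score * centrality_boost(cmap.get(path, 0.0)))
               for path, score in candidates]
    return sorted(boosted, key=lambda item: item[1], reverse=True)
```

Under the natural-log assumption, a file with centrality 10 gets a multiplier of about 1.36, while a file with centrality 0 is left at exactly 1.0.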

MRR-gated (kept only because it lifts)

Codedb repo, 18 labeled multi-word queries, A/B via the env toggle:

| | MRR | P@1 | recall@5 |
|---|---|---|---|
| centrality off | 0.819 | 12/18 | 18/18 |
| centrality on | 0.944 | 16/18 | 18/18 |

+0.125 MRR (+15%), +4 P@1, no recall loss — 4 queries' correct file jumped to rank 1, zero regressed. Reproduced on a clean cold index.
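
For readers unfamiliar with the metrics, here is a small Python sketch of how MRR, P@1, and recall@5 are conventionally computed from per-query ranks, plus the relative-lift arithmetic. The `toy_ranks` list is made up for illustration and is not the PR's 18-query data; the function names are hypothetical.

```python
def mrr(ranks):
    """Mean reciprocal rank; a query whose correct file was not found counts as 0."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


def precision_at_1(ranks):
    """Fraction of queries whose correct file is ranked first."""
    return sum(1 for r in ranks if r == 1) / len(ranks)


def recall_at_k(ranks, k=5):
    """Fraction of queries whose correct file appears in the top k."""
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)


def relative_lift(before, after):
    return (after - before) / before


# Toy data only: 1-based rank of the correct file per query, None if missing.
toy_ranks = [1, 2, 1, None, 4]
print(mrr(toy_ranks), precision_at_1(toy_ranks), recall_at_k(toy_ranks))

# Figures reported in this PR: MRR 0.819 with centrality off, 0.944 with it on.
print(relative_lift(0.819, 0.944))
```

With the reported figures, the relative lift is 0.125 / 0.819, or about 0.153, which matches the quoted +15%.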

zig build test: 673/673 pass (adds codegraph unit tests; existing ranked-search tests are unaffected because tiny corpora yield near-zero centrality).

Follow-up

Persist centrality in the snapshot to remove the one-time first-query build cost on large repos.

justrach and others added 2 commits June 2, 2026 00:34
…lity)

Phase 1 of the graphify-informed precision work: deterministic call-site
extraction, name resolution into weighted edges, in-degree centrality.
Isolated + tested (673/673); not yet wired into ranking. Paused for the
mcpsync incident; resume to wire centrality into searchContentRanked.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…entRanked

Wires the codegraph foundation into the real index + ranking (Phase 1b of the
graphify-informed precision work). On first ranked search, builds a per-file
"call centrality" map once: resolve each function's call sites (codegraph
extractCallees) through the function symbol table, accumulate weighted in-degree
per callee, aggregate per file. searchContentRanked multiplies each candidate's
score by 1 + 0.15*log(1+centrality) — an ADDITIVE boost, never a filter, so a
misresolved edge can only nudge a heavily-called (central) file up, never drop a
real result. Build is mutex-guarded + idempotent; reads under the existing
shared lock.

On by default; CODEDB_NO_CENTRALITY disables. MRR-gated on the codedb repo
(18 labeled multi-word queries, A/B via the env toggle):
  MRR 0.819 -> 0.944 (+0.125), P@1 12 -> 16, recall@5 18/18 unchanged;
  4 queries' correct file jumped to rank 1, zero regressed.
673/673 tests pass.

Follow-up: persist centrality in the snapshot to remove the one-time
first-query build cost on large repos.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@justrach justrach merged commit f41b7ec into release/0.2.5824 Jun 1, 2026
1 check passed

@chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 542590dcc5

Comment thread src/explore.zig
Comment on lines +2601 to +2605
```zig
// Aggregate weighted in-degree per file. Keys borrow stable outlines keys.
var cmap = std.StringHashMap(f32).init(self.allocator);
for (node_path.items, in_degree) |path, deg| {
    if (deg == 0) continue;
    const gop = cmap.getOrPut(path) catch continue;
```

P1: Invalidate centrality when indexed files change

Because this cache stores borrowed outlines keys and is built only once, an incremental update after the first ranked search leaves call_centrality stale; in the delete case removeFile frees the same stable path slice, so later centralityBoost lookups can probe a StringHashMap containing dangling keys in a long-running watcher/MCP process. Please clear/rebuild this map whenever commitParsedFileOwnedOutline, removeFile, or word-index replacement changes the indexed file set.
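
One way to address this kind of staleness is to drop the cached map whenever the indexed file set changes, so the next lookup rebuilds it. The Python sketch below shows the pattern only. The `Index` class, its methods, and the per-file callee lists are hypothetical stand-ins and do not mirror the Zig types in src/explore.zig.

```python
import threading
from collections import defaultdict


class Index:
    """Toy index: maps each file path to the paths of the files it calls into."""

    def __init__(self):
        self._files = {}
        self._lock = threading.Lock()
        self._centrality = None  # None means "not built or invalidated"

    def commit_file(self, path, callee_paths):
        with self._lock:
            self._files[path] = list(callee_paths)
            self._centrality = None  # file set changed: drop the stale map

    def remove_file(self, path):
        with self._lock:
            self._files.pop(path, None)
            self._centrality = None  # never keep entries for removed files

    def centrality(self):
        with self._lock:
            if self._centrality is None:
                degree = defaultdict(float)
                for callees in self._files.values():
                    for callee in callees:
                        if callee in self._files:  # skip files no longer indexed
                            degree[callee] += 1.0
                # The map owns its own key strings, so nothing dangles on removal.
                self._centrality = dict(degree)
            return self._centrality
```

Rebuilding from scratch on every change is the simplest option. Whether that cost is acceptable for a long-running watcher is a separate question that this sketch does not answer.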

@github-actions (Bot) commented Jun 1, 2026

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

| Tool | Base (ns) | Head (ns) | Delta | Abs Delta (ns) | Status |
|---|---|---|---|---|---|
| codedb_bundle | 88012 | 86820 | -1.35% | -1192 | OK |
| codedb_changes | 11128 | 12663 | +13.79% | +1535 | NOISE |
| codedb_context | 1041126 | 1065609 | +2.35% | +24483 | OK |
| codedb_deps | 231 | 263 | +13.85% | +32 | NOISE |
| codedb_edit | 50199 | 52621 | +4.82% | +2422 | OK |
| codedb_find | 9181 | 12474 | +35.87% | +3293 | NOISE |
| codedb_hot | 24995 | 24835 | -0.64% | -160 | OK |
| codedb_outline | 28919 | 29688 | +2.66% | +769 | OK |
| codedb_read | 14538 | 14357 | -1.25% | -181 | OK |
| codedb_search | 22137 | 20760 | -6.22% | -1377 | OK |
| codedb_snapshot | 58974 | 66742 | +13.17% | +7768 | NOISE |
| codedb_status | 9232 | 10638 | +15.23% | +1406 | NOISE |
| codedb_symbol | 18352 | 17866 | -2.65% | -486 | OK |
| codedb_tree | 38600 | 39022 | +1.09% | +422 | OK |
| codedb_word | 15420 | 11952 | -22.49% | -3468 | OK |
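
The Status values in the report are consistent with a two-threshold rule like the Python sketch below. This is an assumption inferred from the report text, not the CI script itself: the function name, the "FAIL" label, and the exact comparison operators are hypothetical.

```python
def classify(base_ns, head_ns, pct_threshold=10.0, abs_threshold_ns=50_000):
    """Classify one benchmark row using a percentage and an absolute threshold."""
    delta_ns = head_ns - base_ns
    pct = 100.0 * delta_ns / base_ns
    if pct <= pct_threshold:
        return "OK"  # includes speedups, where the delta is negative
    if delta_ns < abs_threshold_ns:
        return "NOISE"  # large percentage, but too small in absolute terms
    return "FAIL"  # hypothetical label for a regression that fails CI


print(classify(9181, 12474))       # codedb_find row from the report
print(classify(1041126, 1065609))  # codedb_context row from the report
```

Requiring both thresholds keeps very fast tools such as codedb_deps (231 ns) from failing CI over a swing of a few dozen nanoseconds.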
