feat: graph-aware ranking via call-graph centrality (+15% MRR) by justrach · Pull Request #523 · justrach/codedb

justrach · 2026-06-01T16:57:20Z

Summary

The graphify-informed precision work: a deterministic, LLM-free resolved call graph whose centrality feeds an additive ranking boost. Two commits — the reusable foundation + the index/ranking integration.

Foundation — `src/codegraph.zig`

extractCallees(body): walks a function body for call sites (`ident(`), filters out cross-language keywords and control-flow, and dedups.
buildEdges(funcs, resolve): resolves callee names into weighted edges (an ambiguous name splits its weight 1/N across the N candidates; graphify's confidence idea).
inDegreeCentrality(edges): weighted "who's called most" (graphify's "god node" signal).
Unit-tested in isolation.
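The 1/N edge split and the weighted in-degree described above can be sketched as follows. This is a minimal illustration, not the code in `src/codegraph.zig`: the `Edge` layout and the helper names `splitEdges` and `accumulateInDegree` are assumptions made for the example, and functions are identified by plain array indices.

```zig
const std = @import("std");

/// One weighted call edge between function indices (hypothetical layout).
const Edge = struct { from: usize, to: usize, weight: f32 };

/// Emits one edge per candidate definition of a callee name, each weighted
/// 1/N. `out` must have room for `candidates.len` edges. Returns the count.
fn splitEdges(from: usize, candidates: []const usize, out: []Edge) usize {
    if (candidates.len == 0) return 0;
    const w: f32 = 1.0 / @as(f32, @floatFromInt(candidates.len));
    for (candidates, 0..) |to, i| {
        out[i] = .{ .from = from, .to = to, .weight = w };
    }
    return candidates.len;
}

/// Weighted in-degree: sums incoming edge weights per callee index.
fn accumulateInDegree(in_degree: []f32, edges: []const Edge) void {
    for (edges) |e| in_degree[e.to] += e.weight;
}
```

With this weighting, a call whose name matches two definitions contributes 0.5 to each of them, so an ambiguous name never counts for more than one unambiguous call in total.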

Integration — `Explorer` / `searchContentRanked`

On first ranked search, ensureCallCentrality builds a per-file centrality map once (mutex-guarded, idempotent): resolve each function's call sites through the function symbol table, accumulate weighted in-degree per callee, aggregate per file. searchContentRanked multiplies each candidate's score by 1 + 0.15·log(1+centrality).

Always additive, never a filter: centrality is non-negative, so the factor is at least 1, and a misresolved edge can only nudge a heavily-called file up, never drop a real result. On by default; setting CODEDB_NO_CENTRALITY disables it.
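A small sketch of the boost factor may make the "never a filter" property concrete. It assumes `log` means the natural logarithm and that centrality is a non-negative `f32`; the function names `centralityFactor` and `boostedScore` are illustrative and not taken from the PR.

```zig
const std = @import("std");

/// Score multiplier from the PR text: 1 + 0.15 * log(1 + centrality).
/// Assumes natural log and centrality >= 0 (both are assumptions here).
fn centralityFactor(centrality: f32) f32 {
    return 1.0 + 0.15 * @log(1.0 + centrality);
}

/// Applies the factor to a candidate's base score.
fn boostedScore(score: f32, centrality: f32) f32 {
    return score * centralityFactor(centrality);
}
```

Under these assumptions a file with zero centrality keeps its score unchanged (the factor is exactly 1), and the logarithm keeps very heavily called files from dominating the ranking.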

MRR-gated (kept only because it lifts)

Codedb repo, 18 labeled multi-word queries, A/B via the env toggle:

Configuration	MRR	P@1	recall@5
centrality off	0.819	12/18	18/18
centrality on	0.944	16/18	18/18

+0.125 MRR (+15%), +4 P@1, no recall loss — 4 queries' correct file jumped to rank 1, zero regressed. Reproduced on a clean cold index.
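For readers checking the arithmetic: MRR is the mean of 1/rank of the correct file over all queries, and the +15% figure is the relative change (0.944 − 0.819) / 0.819 ≈ 0.153. The sketch below shows both computations; the helper names are illustrative, and the per-query ranks behind the reported numbers are not given in this PR, so none are reproduced here.

```zig
const std = @import("std");

/// Mean reciprocal rank. `ranks[i]` is the 1-based rank of the correct file
/// for query i, or 0 when it was not returned at all.
fn meanReciprocalRank(ranks: []const u32) f64 {
    if (ranks.len == 0) return 0.0;
    var sum: f64 = 0.0;
    for (ranks) |r| {
        if (r != 0) sum += 1.0 / @as(f64, @floatFromInt(r));
    }
    return sum / @as(f64, @floatFromInt(ranks.len));
}

/// Relative change of `head` over `base`, as a fraction.
fn relativeLift(base: f64, head: f64) f64 {
    return (head - base) / base;
}
```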

`zig build test` passes 673/673. This adds the codegraph unit tests; existing ranked-search tests are unaffected because their tiny corpora yield roughly zero centrality.

Follow-up

Persist centrality in the snapshot to remove the one-time first-query build cost on large repos.

…lity)

Phase 1 of the graphify-informed precision work: deterministic call-site extraction, name resolution into weighted edges, in-degree centrality. Isolated + tested (673/673); not yet wired into ranking. Paused for the mcpsync incident; resume to wire centrality into searchContentRanked.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…entRanked

Wires the codegraph foundation into the real index + ranking (Phase 1b of the graphify-informed precision work).

On first ranked search, builds a per-file "call centrality" map once: resolve each function's call sites (codegraph extractCallees) through the function symbol table, accumulate weighted in-degree per callee, aggregate per file. searchContentRanked multiplies each candidate's score by 1 + 0.15*log(1+centrality): an ADDITIVE boost, never a filter, so a misresolved edge can only nudge a heavily-called (central) file up, never drop a real result. Build is mutex-guarded + idempotent; reads under the existing shared lock. On by default; CODEDB_NO_CENTRALITY disables.

MRR-gated on the codedb repo (18 labeled multi-word queries, A/B via the env toggle): MRR 0.819 -> 0.944 (+0.125), P@1 12 -> 16, recall@5 18/18 unchanged; 4 queries' correct file jumped to rank 1, zero regressed. 673/673 tests pass.

Follow-up: persist centrality in the snapshot to remove the one-time first-query build cost on large repos.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 542590dcc5

chatgpt-codex-connector · 2026-06-01T16:59:43Z

+        // Aggregate weighted in-degree per file. Keys borrow stable outlines keys.
+        var cmap = std.StringHashMap(f32).init(self.allocator);
+        for (node_path.items, in_degree) |path, deg| {
+            if (deg == 0) continue;
+            const gop = cmap.getOrPut(path) catch continue;


Invalidate centrality when indexed files change

Because this cache stores borrowed outlines keys and is built only once, an incremental update after the first ranked search leaves call_centrality stale; in the delete case removeFile frees the same stable path slice, so later centralityBoost lookups can probe a StringHashMap containing dangling keys in a long-running watcher/MCP process. Please clear/rebuild this map whenever commitParsedFileOwnedOutline, removeFile, or word-index replacement changes the indexed file set.
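One way to act on this suggestion is to give the cache an explicit invalidation hook that every index-mutating path calls before any borrowed path slice is freed. The sketch below is an assumption-laden illustration: `CentralityCache`, its `built` flag, and `invalidate` are hypothetical names, and the locking the PR describes is omitted for brevity.

```zig
const std = @import("std");

/// Hypothetical wrapper around the per-file centrality map.
/// Keys are borrowed path slices owned by the index, so the map must be
/// cleared before any of those slices is freed.
const CentralityCache = struct {
    map: std.StringHashMap(f32),
    built: bool = false,

    fn init(allocator: std.mem.Allocator) CentralityCache {
        return .{ .map = std.StringHashMap(f32).init(allocator) };
    }

    fn deinit(self: *CentralityCache) void {
        self.map.deinit();
    }

    /// Drops all entries and marks the cache as needing a rebuild on the
    /// next ranked search. Does not free keys, since they are borrowed.
    fn invalidate(self: *CentralityCache) void {
        self.map.clearRetainingCapacity();
        self.built = false;
    }
};
```

Calling `invalidate` at the start of the file-removal and file-commit paths would trade a rebuild on the next ranked search for the guarantee that no lookup probes a freed key.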

github-actions · 2026-06-01T17:00:16Z

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

Tool	Base (ns)	Head (ns)	Delta	Abs Delta (ns)	Status
`codedb_bundle`	88012	86820	-1.35%	-1192	OK
`codedb_changes`	11128	12663	+13.79%	+1535	NOISE
`codedb_context`	1041126	1065609	+2.35%	+24483	OK
`codedb_deps`	231	263	+13.85%	+32	NOISE
`codedb_edit`	50199	52621	+4.82%	+2422	OK
`codedb_find`	9181	12474	+35.87%	+3293	NOISE
`codedb_hot`	24995	24835	-0.64%	-160	OK
`codedb_outline`	28919	29688	+2.66%	+769	OK
`codedb_read`	14538	14357	-1.25%	-181	OK
`codedb_search`	22137	20760	-6.22%	-1377	OK
`codedb_snapshot`	58974	66742	+13.17%	+7768	NOISE
`codedb_status`	9232	10638	+15.23%	+1406	NOISE
`codedb_symbol`	18352	17866	-2.65%	-486	OK
`codedb_tree`	38600	39022	+1.09%	+422	OK
`codedb_word`	15420	11952	-22.49%	-3468	OK
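The status column follows from the two thresholds stated above the table. A minimal sketch of that rule, assuming the comparison is "more than 10%" and "at least 50,000 ns", and assuming a failing status (called `regression` here, which does not appear in this report) when both are exceeded:

```zig
const std = @import("std");

/// `regression` is a hypothetical name for the failing case; the report
/// above only shows OK and NOISE.
const Status = enum { ok, noise, regression };

/// Classifies one benchmark row from its base and head timings.
fn classify(base_ns: u64, head_ns: u64) Status {
    if (head_ns <= base_ns) return .ok; // faster or unchanged
    const delta = head_ns - base_ns;
    const pct = @as(f64, @floatFromInt(delta)) /
        @as(f64, @floatFromInt(base_ns)) * 100.0;
    if (pct <= 10.0) return .ok; // within the percentage threshold
    // Percentage threshold exceeded: the absolute delta decides.
    return if (delta < 50_000) .noise else .regression;
}
```

For example, `codedb_find` slowed by 35.87% but only 3,293 ns, so it lands in NOISE rather than failing CI.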

justrach and others added 2 commits June 2, 2026 00:34

justrach merged commit f41b7ec into release/0.2.5824 Jun 1, 2026
1 check passed

justrach mentioned this pull request Jun 3, 2026

release: 0.2.5824 → main (perf overhaul + warm-CLI daemon + faster find + graph ranking + Windsurf/Devin) #527

Merged

justrach merged 2 commits into release/0.2.5824 from feat/code-graph
