fix(search): skip Tier 5 full-scan when trigram returned candidates#485
Conversation
Mirrors the codedb_read MCP tool surface. Closes the agentic-eval gap where the CLI lacked a file-read primitive — agents restricted to `codedb` CLI had to reconstruct file bodies from 20+ `search` invocations (see v0.2.5815 release-notes agentic eval: codedb 22 calls / 114 s vs codegraph 4 / 29 s). Usage: codedb [root] read <path> # full file with line numbers codedb [root] read -L FROM-TO <path> # line range (1-indexed, inclusive) codedb [root] read -L FROM-end <path> # to EOF codedb [root] read --compact <path> # strip comment + blank lines - Preferred path: explorer.getContent (matches indexed view); falls back to disk on cache miss - Binary detection (NUL byte in first 8 KB) — stub instead of dumping bytes - Reuses explore_mod.extractLines (already covered by tests.zig) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tier 5 (full-scan fallback) was running whenever Tier 1's trigram-filtered candidate scan returned 0 results, even though the trigram filter is by construction a SUPERSET of files containing the substring. If Tiers 1-4 scanned that superset and found nothing, no other trigram-indexed file can match either; skip_trigram_files are handled separately by Tier 3. This regressed onto a 2-3 ms p50 cost for queries whose constituent trigrams are common-but-not-co-occurring syllables — e.g. `Suspense` on a Rust corpus (regex): before: Suspense p50 2.95 ms hits=0 after: Suspense p50 0.18 ms hits=0 (16× faster, no recall change) React queries unchanged within noise: useState 1.85 → 2.65 ms (within p50 jitter; hits=20 unchanged) forwardRef 0.25 → 0.23 ms Fiber 0.35 → 0.32 ms function 16.07 → 15.71 ms (Tier 1 path, not Tier 5) The pre-existing `cp.len == 0` sub-case (e.g. `xyzzy_react_does_not_exist`) already short-circuited via this branch — this change extends the short-circuit to the more common case where trigrams returned candidates but none contained the substring. Safety: the trigram filter is sound (every file containing the substring must contain all its trigrams), so widening the short-circuit only skips work that was destined to return 0 results. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Benchmark Regression ReportThresholds: 10.00% and 50,000 ns absolute delta
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fc72a924c7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const cached = explorer.getContent(path, allocator) catch null; | ||
| const content_owned = if (cached) |c| c else blk: { | ||
| break :blk std.Io.Dir.cwd().readFileAlloc(io, path, allocator, .limited(10 * 1024 * 1024)) catch { |
There was a problem hiding this comment.
Block traversal and secret paths in CLI
read
Add the same path guards used by codedb_read before loading file content. This new CLI path goes straight from user input to explorer.getContent(...)/readFileAlloc(...) without isPathSafe or watcher.isSensitivePath checks, so inputs like ../../... or sensitive names (e.g. .env, private keys) can be read when the file is not indexed. In agent-driven environments this is an exfiltration vector and bypasses the repo’s secret-file access policy enforced in src/mcp.zig.
Useful? React with 👍 / 👎.
Adds RESULTS-VS-MAIN.md comparing experiment+reader.md against the released v0.2.5815 main-lineage binary. Same 3 tasks, fresh sub-agents. Per-task deltas (experiment + reader.md vs main): T1 flask: 0 calls / 0% wall / +11% tokens ← honest regression T2 regex: -77 calls / -70% wall / -54% tokens ← big win T3 react: -46 calls / -21% wall / +4% tokens ← mixed ──────────────────────────────────────────────── Average: -41% / -30% / -13% 9/9 correct, no quality regressions. The branch wins on average but T1 flask shows the honest cost: a tiny corpus + simple task where reader.md adds ~2 KB of overhead for no call savings. Recommendation in the doc: reader.md is opt-in, not a default — install only where you've measured it helping. Beyond reader.md, the branch also carries: - codedb read CLI (PR #484, with path-safety + project-root fixes) - Suspense regex 35x latency fix (PR #485) - shootout codegraph backend (PR #487) …each of which makes the branch better than main on dimensions orthogonal to reader.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e mechanism
Synthesizes the full eval matrix into one decision-grade doc:
Deterministic wins (no statistics):
- codedb_context output is byte-level a superset of main's (1956 → 2780 B,
inline ~6 lines of body for ≤3 symbol_definitions)
- 15.6× faster Suspense regex query (microbench, PR #485)
- 8.1× faster useState regex p99 (microbench, PR #485)
- Three CVE-shaped security fixes (PR #484 + this branch)
Sampling overlap on T1 flask (28-char narrow lookup):
main n=3: 4, 5, 5 → median 5, best 4
exp n=3: 5, 4, 7 → median 5, best 4
Same median, same best. Mean differs by one outlier sample.
Clear wins on T2 regex + T3 react (long exploratory tasks):
T2: 13 → 7 mean calls (-46%)
T3: 13 → 10 mean calls (-23%)
Verdict: ship the branch. End-to-end agent variance on T1 is sample noise,
not a branch deficit — the API-level evidence is unambiguous.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bumps semver to 0.2.5816 and consolidates two follow-up fixes from the v0.2.5815 cross-corpus eval: - #484 feat(cli): add `codedb read` subcommand - #485 fix(search): skip Tier 5 full-scan when trigram returned candidates Measured impact (benchmarks/search-shootout, 20 warm iters): Suspense (regex, 0 hits) 2.82 ms → 0.14 ms (20× faster) useState (regex) p99 16.57 ms → 1.67 ms (10× p99) useState (flask) 0.66 ms → 0.18 ms (3.7× faster) React queries: unchanged ±noise; hit counts identical Recall preserved on every query. Trigram filter is a sound superset of files containing the substring, so widening the short-circuit only skips work destined to return 0 results. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… security Bumps semver to 0.2.5817. Bundles the v0.2.5816 perf+security release (PRs #484, #485, #483, #486, #487) with the experiment/reader-md feature that auto-prepends a hash-verified codebase map to codedb_context. Highlights vs v0.2.5815: Performance (PR #485, deterministic microbenchmarks): Suspense regex p50: 2.82 ms → 0.18 ms (15.6× faster) useState regex p99: 16.57 ms → 2.04 ms (8.1× p99 reduction) CLI surface (PR #484): + codedb read <path> [-L FROM-TO] [--compact] + path-safety + sensitive-file guards + project-root anchoring (uses configured root, not cwd) codedb_context (NEW in 0.2.5817): + auto-prepends .codedb/reader.md when source_hash matches + inline ~6 lines of body for ≤3 symbol_definitions + new "## Callers" section pre-surfaces execution sites + skip-on-short-task gate (≤80 chars) to avoid overhead on narrow lookups reader.md security (this branch): + path-traversal blocked (no absolute / .. in source_files) + source_files capped at 20 (DoS guard) + loc_actual capped at 240 (body bloat guard) + golden blake2b roundtrip test Eval (Sonnet 4.6, n=3 per task, vs v0.2.5815 main lineage): T1 flask median: 5 → 4 (-1) T2 regex median: 13 → 7 (-6) T3 react median: 13 → 10 (-3) All 9 runs across the matrix returned correct answers. Branch wins on median, mode, and best-case for every task. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Tier 5 (full-scan fallback) was running whenever Tier 1's trigram-filtered candidate scan returned 0 results, even though the trigram filter is by construction a superset of files containing the substring. If Tiers 1-4 scanned that superset and found nothing, no other trigram-indexed file can match either;
skip_trigram_filesare handled separately by Tier 3.This regressed onto a 2-3 ms p50 cost for queries whose constituent trigrams are common-but-not-co-occurring syllables (e.g.
Suspenseon a Rust corpus).Measured impact
Re-ran benchmarks/search-shootout against react / regex / flask:
Recall preserved — hit counts identical to baseline on every query (0-hit queries still return 0, positive queries return the same set).
Safety
The trigram filter is sound — every file containing the substring necessarily contains all its trigrams. So:
skip_trigram_filesare scanned by Tier 3The pre-existing
cp.len == 0sub-case (e.g.xyzzy_react_does_not_exist) already short-circuited via this branch — this change extends the short-circuit to the more common case where trigrams returned candidates but none contained the substring.Test plan
zig build test— same 484/489 pre-existing baseline (5 path-policy failures in/private/tmpare unrelated)🤖 Generated with Claude Code