
release: v0.2.579: MCP deferred scan fix, joinable threads, root_policy hardening (#350)

Merged
justrach merged 59 commits into main from release/0.2.579
Apr 30, 2026

@justrach

Owner

v0.2.579 — MCP root resolution fix

Summary

Fixes a class of MCP crashes and misbehaviors when codedb mcp is launched from a process that sets cwd to / (Cursor, Windsurf, VS Code).

Closes #346: MCP crashes when spawned from cwd=/
Closes #347: roots/list handshake missing on deferred scan path
Closes #278: MCP server returns no files when launched without explicit root

What changed

Core fix (PR #348): Deferred scan path

  • MCP server now defers the filesystem scan until it receives a roots/list response from the client
  • root_policy.zig resolves a valid root from the client-provided URIs with fallbacks; rejects / and other invalid system roots
  • DeferredScan struct coordinates the handshake across threads
  • idleWatchdog keeps the stdio transport alive while waiting for roots
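
A minimal Python sketch of the root-resolution idea described above, not the actual root_policy.zig logic. The function name `resolve_root` and every deny-list entry other than `/` are assumptions made for illustration; the PR only states that `/` and other invalid system roots are rejected.

```python
import posixpath
from urllib.parse import unquote, urlparse

# Assumed deny-list. Only "/" is confirmed by the PR text; the rest are guesses.
INVALID_ROOTS = {"/", "/tmp", "/var/tmp"}


def resolve_root(uris):
    """Return the first usable absolute path among file:// URIs, else None."""
    for uri in uris:
        parsed = urlparse(uri)
        if parsed.scheme != "file":
            continue  # only local file roots can be scanned
        raw = unquote(parsed.path)
        if not raw.startswith("/"):
            continue
        # Collapse duplicate and trailing slashes so "//" is treated as "/".
        path = posixpath.normpath("/" + raw.lstrip("/"))
        if path in INVALID_ROOTS:
            continue  # fall through to the next candidate
        return path
    return None
```

Returning `None` instead of raising leaves the caller free to stay idle until a later roots reply supplies a usable directory.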

Thread safety (PR #349): Codex review fixes

  • P1: the scan thread is stored in DeferredScan.scan_thread and joined on shutdown (it was previously detached, which allowed a use-after-free)
  • P1: watcherDeferredLoop now runs watcher.incrementalLoop after scan_done fires, removing the second detached thread spawn
  • P1: triggered is reset to false if getDataDir fails, so the next valid roots reply can retry
  • P2: codedb . mcp (explicit . arg) now scans immediately instead of entering deferred mode
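
The joinable-thread and retry behavior above can be sketched in Python as follows. This is an illustration under assumptions, not a port of the Zig code: the field names `triggered`, `scan_thread`, and `scan_done` follow the PR text, while the class layout, the injected `get_data_dir` and `scan` callables, and the use of `OSError` for a failed lookup are hypothetical.

```python
import threading


class DeferredScan:
    """Python stand-in for the DeferredScan struct named in this PR."""

    def __init__(self, get_data_dir, scan):
        self._lock = threading.Lock()
        self._get_data_dir = get_data_dir  # hypothetical injected helper
        self._scan = scan                  # hypothetical injected scan function
        self.triggered = False
        self.scan_thread = None
        self.scan_done = threading.Event()

    def trigger_from_roots(self, root):
        """Start the scan once; return True only if a scan thread was started."""
        with self._lock:
            if self.triggered:
                return False
            self.triggered = True
        try:
            data_dir = self._get_data_dir(root)
        except OSError:
            with self._lock:
                self.triggered = False  # let the next valid roots reply retry
            return False
        self.scan_thread = threading.Thread(target=self._run, args=(root, data_dir))
        self.scan_thread.start()
        return True

    def _run(self, root, data_dir):
        try:
            self._scan(root, data_dir)
        finally:
            self.scan_done.set()  # a waiting watcher loop can proceed

    def shutdown(self):
        # Join instead of detaching so the scan cannot outlive shared state.
        if self.scan_thread is not None:
            self.scan_thread.join()
```

Joining in `shutdown` is what rules out the use-after-free: the owner of the shared state waits for the scan to finish before tearing anything down.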

Test coverage

  • New E2E harness: scripts/e2e_mcp_test.py — 17/17 pass
    • S1: issue-346 regression (spawn from /, full roots handshake)
    • S2: explicit --root mode (no roots/list, immediate scan)
    • S3: no-roots client (graceful idle, tools respond)
  • zig build test — all unit tests pass
  • CI benchmarks: no regressions

wiki.codes

codedb_remote now queries api.wiki.codes, a hosted index of public GitHub repos.

Test plan

  • E2E 17/17
  • Unit tests pass
  • Binary notarized and codesigned
  • Benchmark CI: all OK or NOISE

Generated with Devin

justrach and others added 30 commits April 22, 2026 10:41
Port the legacy HTTP endpoint (stubbed in 56ea465 / v0.2.578) to the new
`std.Io.net` surface: bind 127.0.0.1 with `IpAddress.parse`/`listen`,
accept in a loop, and hand each stream to a detached thread. The routes
and JSON response shapes match the pre-0.16 implementation so existing
clients don't need changes.

Refs #307, #285

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
For `codedb mcp`, stdout is reserved for JSON-RPC messages. The root-policy
failure path wrote `✗ refusing to index temporary root: …` (and the normal
`✓ indexed` startup line) to stdout, which hosts reject with
`invalid character 'â' looking for beginning of value` on the leading UTF-8
byte of the status glyph.

Switch `out.file` to stderr once `cmd == "mcp"` is resolved, so every `out.p`
call on that path goes to stderr while stdout stays clean for protocol
messages.
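
A small Python sketch of the same separation, assuming a hypothetical `make_status_printer` helper in place of the `out.file` switch. It only illustrates the rule that human-readable status goes to stderr while stdout carries nothing but JSON-RPC lines.

```python
import json
import sys


def make_status_printer(cmd, stdout=sys.stdout, stderr=sys.stderr):
    """Return a print function that targets stderr when running as an MCP server."""
    out = stderr if cmd == "mcp" else stdout

    def p(message):
        print(message, file=out)

    return p


def send_jsonrpc(message, stdout=sys.stdout):
    """Write exactly one JSON-RPC message per line on the protocol stream."""
    stdout.write(json.dumps(message) + "\n")
    stdout.flush()
```

With this split, a host that parses stdout line by line never sees a status glyph where it expects the start of a JSON value.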

Closes #304

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Read `CODEDB_PORT` from the environment; fall back to 7719 on absence
or parse failure. Unblocks running multiple instances on one host,
reverse-proxy setups, and integration tests that need an ephemeral port.

Refs #308

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
`Explorer.findSymbol` now looks up the name in `self.symbol_index` and
builds results from the cached locations. The full outline scan is kept
as a fallback for safety.

For the index to be authoritative, `rebuildSymbolIndexFor` no longer
skips `.import` / `.comment_block` kinds — those were being missed by the
O(1) path and forced callers into the slow scan. Indexing every kind
makes results match the scan-based path exactly.

Refs #309

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Drop the hardcoded 7719 fallback. If CODEDB_PORT is unset, `codedb serve`
exits with a clear message explaining how to enable it (suggested 47719,
since 7719 and 8080 tend to collide with other local processes). If set
but not parseable as u16, exit with an error.

Rationale: the HTTP server opens a network port; having it bind on a
predictable default when someone runs `codedb serve` accidentally is
worth avoiding. Treating the env var as the on/off switch keeps the
surface area minimal and makes the enabled case explicit in shell
history / process listings.

Refs #308

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Running `codedb serve` is itself the opt-in — codedb has no always-on
daemon, so gating the listener behind an additional CODEDB_PORT
requirement was belt-and-suspenders with no threat to block. Restore the
previous UX: `codedb serve` starts listening on a default port,
CODEDB_PORT stays as an optional override for collisions.

Default is now 6767 (picked off the beaten path — 7719 and 8080
collided with other local tooling).
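
A Python sketch of the port resolution described above: default 6767 with CODEDB_PORT as an optional override. The function name `resolve_port` is hypothetical, and raising on an unparseable or out-of-range value is an assumption carried over from the earlier commit; this message does not say how the final version treats invalid input.

```python
import os

DEFAULT_PORT = 6767  # default stated in the commit message above


def resolve_port(env=os.environ):
    """Return the port to listen on: CODEDB_PORT if set, else the default."""
    raw = env.get("CODEDB_PORT")
    if raw is None:
        return DEFAULT_PORT
    try:
        port = int(raw)
    except ValueError:
        # Assumed behavior: fail loudly rather than silently use the default.
        raise ValueError(f"CODEDB_PORT is not a number: {raw!r}") from None
    if not 0 <= port <= 65535:
        raise ValueError(f"CODEDB_PORT is outside the u16 range: {port}")
    return port
```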

Refs #308

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
`isPathSafe` only rejected absolute `/` paths and `..` segments split on
`/`, so inputs like `..\\..\\secret.txt` passed through — on platforms
where `\\` is a real separator this could reach files outside the
indexed tree through `/file/read` and `/edit`. Null bytes likewise could
truncate paths in downstream syscalls.

Mirror `mcp.isPathSafe`: reject null bytes and backslashes up front
before the `/`-split loop.
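
The checks named above (null bytes, backslashes, absolute paths, `..` segments) can be sketched in Python as follows. This is an illustration of the rules as the commit message states them, not a translation of the Zig `isPathSafe`; any additional rules in the real function are not covered.

```python
def is_path_safe(path):
    """Accept only relative, forward-slash paths with no parent-directory hops."""
    # Reject NUL and backslash up front, before any splitting on "/".
    if "\x00" in path or "\\" in path:
        return False
    # Absolute paths would escape the indexed root.
    if path.startswith("/"):
        return False
    # A ".." segment anywhere can climb out of the tree.
    return all(segment != ".." for segment in path.split("/"))
```

Only a whole `..` segment is rejected, so a file name such as `..b` still passes.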

Addresses Codex P1 on #310.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The HTTP handler was opening files via `std.Io.Dir.cwd()`, but `codedb
<root> serve` indexes paths relative to the provided root. Launched from
any other directory, valid indexed paths hit the wrong base and returned
false 404s (or worse — read the wrong file).

Open via `explorer.root_dir` instead. Respond 500 with a clear error if
the root was never configured (shouldn't happen on the normal serve
path, but guards against a bare explorer).

Addresses Codex P2 on #310.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The earlier change early-returned after the `symbol_index` lookup, but
that map can be incomplete after fast-snapshot restore — `outlines` is
populated before `rebuildSymbolIndexFor` runs on every file, and later
watcher/edit updates only touch files they saw change. Symbols present
in untouched files were silently dropped from results once the index
had any entry for the name.

Keep the O(1) path for the common case, but always fall through into
the outline scan and dedupe against a per-call `(path, line_start)` set
so the scan fills gaps without duplicating index hits.
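
A Python sketch of the merge described above: take index hits first, then always run the outline scan and dedupe on `(path, line_start)`. The `Hit` type and the dictionary shapes are assumptions for illustration; the real `Explorer` data structures are not shown in this PR.

```python
from typing import Dict, List, NamedTuple


class Hit(NamedTuple):
    path: str
    line_start: int
    name: str


def find_symbol(name: str,
                symbol_index: Dict[str, List[Hit]],
                outlines: Dict[str, List[Hit]]) -> List[Hit]:
    """Return index hits plus any scan hits the index is missing."""
    seen = set()
    results = []
    # Fast path: cached locations from the (possibly incomplete) index.
    for hit in symbol_index.get(name, []):
        key = (hit.path, hit.line_start)
        if key not in seen:
            seen.add(key)
            results.append(hit)
    # Safety scan: fill gaps left by files the index never rebuilt.
    for path, symbols in outlines.items():
        for sym in symbols:
            key = (path, sym.line_start)
            if sym.name == name and key not in seen:
                seen.add(key)
                results.append(sym)
    return results
```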

Addresses Codex P1 on #310.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Rolls up:
  - Restore `codedb serve --port` on Zig 0.16 (#307)
  - Route MCP status output to stderr (closes #304)
  - Default serve port 6767; CODEDB_PORT override (#308)
  - O(1) findAllSymbols with safety merge-scan (#309)
  - Harden server.isPathSafe against `\` and NUL
  - /file/read resolves against indexed root

See PR #310.
Release candidate for the local-server trial roll-up (PR #310). Covers
the Zig 0.16 HTTP server restore, MCP stderr routing (#304), default
port 6767 with CODEDB_PORT override (#308), O(1) findAllSymbols (#309),
and the two Codex P1 fixes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

codedb_remote today only hits codedb.codegraff.com (WASM-on-Workers
indexer). Its sibling project codedb-cloud / wiki.codes is a
Zig-native parquet router with a superset of actions and more repos
indexed, but agents using codedb have no path to it.

Adds an optional `backend` field to codedb_remote. Default stays
"codegraff" — every existing caller is unchanged on the wire.

Backends + supported actions:

  codegraff (default):  tree, outline, search, meta
  wiki:                 tree, outline, search, symbol, policy

`symbol` (exact-identifier definition lookup across an indexed repo)
and `policy` (hot-pin size class) are new capabilities from wiki that
codegraff doesn't expose; `meta` stays codegraff-only.

Wiki requests go through the Vercel `/api/query` proxy at
https://www.wiki.codes which server-side-auths to the Hetzner router.
No client secrets, no API key. Slug is derived from the repo arg by
replacing '/' with '-' (matches wiki's canonical naming: rust-lang/rust
→ rust-lang-rust).

Per-backend action allowlists reject cross-backend mismatches with a
clear error:
  action 'meta' not supported on backend 'wiki'
    (wiki supports: tree, outline, search, symbol, policy)
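
A Python sketch of the allowlist check and slug derivation described in this commit message. The function names are hypothetical, the error is rendered on a single line (the exact line layout is an assumption), and the handling of an unknown backend is invented for the example.

```python
SUPPORTED_ACTIONS = {
    "codegraff": ["tree", "outline", "search", "meta"],
    "wiki": ["tree", "outline", "search", "symbol", "policy"],
}


def check_action(backend, action):
    """Return None if the action is allowed, else an error string."""
    allowed = SUPPORTED_ACTIONS.get(backend)
    if allowed is None:
        # Assumed wording; the source does not show this case.
        return f"unknown backend '{backend}'"
    if action in allowed:
        return None
    return (f"action '{action}' not supported on backend '{backend}' "
            f"({backend} supports: {', '.join(allowed)})")


def repo_slug(repo):
    """Derive the wiki slug: rust-lang/rust becomes rust-lang-rust."""
    return repo.replace("/", "-")
```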

Verified live against wiki.codes:
  codedb_remote repo=rust-lang/rust backend=wiki action=symbol
                query=HashMap
  → 25 hits in 686ms including
    library/std/src/collections/hash/map.rs:247 struct_def
    and impl_blocks, plus clippy + rust-analyzer test files

Closes #311.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Before this change, calling codedb_remote with action=search (or
action=symbol/outline on the wiki backend) but no 'query' argument
silently sent `q=` to the remote. codegraff.com would return an empty
result set or an unhelpful error, and users couldn't tell whether
their search was genuinely empty or the request was malformed.

Fail fast with a pointer at the missing field:

  error: action 'search' requires a non-empty 'query' (the search text)
  error: action 'symbol' requires a non-empty 'query' (the identifier name to look up)
  error: action 'outline' requires a non-empty 'query' (the file path to outline)

tree / meta / policy are unchanged (they legitimately take no query).

Verified via the MCP stdio interface:
  tools/call codedb_remote {"repo":"x","action":"search"}
    → error with guidance
  tools/call codedb_remote {"repo":"x","action":"symbol","backend":"wiki"}
    → error with guidance
  tools/call codedb_remote {"repo":"x","action":"tree","backend":"wiki"}
    → succeeds (unchanged)
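
The fail-fast check can be sketched in Python like this. The message text is copied from the examples above; the function name and the choice to treat only a missing or empty string as invalid (not whitespace-only input) are assumptions.

```python
QUERY_HINTS = {
    "search": "the search text",
    "symbol": "the identifier name to look up",
    "outline": "the file path to outline",
}


def validate_query(action, query):
    """Return an error string when a query-consuming action has no query."""
    hint = QUERY_HINTS.get(action)
    if hint is None:
        return None  # tree / meta / policy take no query
    if not query:
        return f"error: action '{action}' requires a non-empty 'query' ({hint})"
    return None
```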

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The idle watchdog in main.zig closes stdin when `now - last_activity
> idle_timeout_ms` (10 min). But last_activity is only updated once
per incoming message — right after readLineBuf (mcp.zig:474). Inside
a single bundle call that takes longer than idle_timeout_ms (many
slow sub-ops, or ops that shell out to codedb_remote / codedb_tree
on big repos), the clock stays frozen at message-arrival time. At
the 10-minute mark the watchdog closes stdin mid-processing.

The main thread finishes the bundle and writes the response fine
(stdout is untouched), but the client — whose write-end of the
stdin pipe just got EPIPE'd — reports "Transport closed" on its
next tool call.

Fix: stamp last_activity at bundle start AND at the end of each
sub-op iteration, so active processing keeps us marked live.
Every sub-op takes a known-bounded time, so the watchdog can only
fire when the main thread truly has nothing in flight.

No change to the idle path: a bundle that completes in under 10min
doesn't touch last_activity beyond what was already there; sessions
that actually go idle still get reaped.
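
A Python sketch of the stamping pattern described above, using an injectable clock so the behavior can be checked without waiting ten minutes. The class and function names are hypothetical; only `last_activity`, the 10-minute timeout, and the "stamp at bundle start and after each sub-op" rule come from the commit message.

```python
import time

IDLE_TIMEOUT_MS = 10 * 60 * 1000  # 10 minutes, as stated above


class Activity:
    """Tracks the last time the server did something on behalf of a client."""

    def __init__(self, clock=lambda: time.monotonic() * 1000):
        self._clock = clock
        self.last_activity = clock()

    def touch(self):
        self.last_activity = self._clock()

    def is_idle(self):
        # The watchdog condition: now - last_activity > idle_timeout_ms.
        return self._clock() - self.last_activity > IDLE_TIMEOUT_MS


def handle_bundle(ops, activity):
    """Run sub-ops in order, stamping activity so the watchdog stays quiet."""
    activity.touch()  # bundle start
    results = []
    for op in ops:
        results.append(op())
        activity.touch()  # end of each sub-op iteration
    return results
```

The watchdog can still fire if a single sub-op runs longer than the timeout, which is why the commit message relies on each sub-op being bounded.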

Fixes #278 for the bundle case. If "Transport closed" still surfaces
on non-bundle paths, we'll need a repro that doesn't go through
handleBundle.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
[codex] Add native C outline parser
codedb_remote: reject empty query on actions that consume it
mcp: refresh last_activity during long bundle processing (#278)
detect: add common extension language coverage
parse: add lightweight outlines for common extensions
test: add golden coverage for extension parsers
Cache snapshot responses by store sequence
justrach and others added 28 commits April 26, 2026 12:06
Point codedb_remote at api.wiki.codes
Extend MCP idle timeout to one hour
Add an explicit `read` action on the codegraff backend (server-side
support tracked in #333) plus per-action optional URL params so big
responses can be sliced server-side as the cloud catches up:

- tree: forward limit/offset/prefix/expand
- commits, dep-history: forward limit/since
- read: forward path (required) and lines

Refactor handleRemote so a single fetchRemote helper builds the curl
argv (with one --data-urlencode per param). Cuts the duplicated
codegraff-vs-wiki branches from ~210 lines to one shared path while
preserving every existing error message verbatim.

Schema now advertises read + the new params + scope/backend so callers
can discover the matrix without reading source.
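
A Python sketch of a single argv builder with one `--data-urlencode` per forwarded parameter, in the spirit of the fetchRemote helper described above. The per-action parameter table follows the list in this message; the function name, the use of curl's `-G` flag to send the data as a query string, the exact flag order, and the error text for a missing `path` are assumptions.

```python
FORWARDED_PARAMS = {
    "tree": ["limit", "offset", "prefix", "expand"],
    "commits": ["limit", "since"],
    "dep-history": ["limit", "since"],
    "read": ["path", "lines"],
}


def build_curl_argv(url, action, args):
    """Build a curl command line that forwards only the params this action accepts."""
    if action == "read" and not args.get("path"):
        # Assumed wording; the source only says that path is required.
        raise ValueError("action 'read' requires 'path'")
    argv = ["curl", "-s", "-G", url]
    for name in FORWARDED_PARAMS.get(action, []):
        value = args.get(name)
        if value is not None:
            # One --data-urlencode per param lets curl do the escaping.
            argv += ["--data-urlencode", f"{name}={value}"]
    return argv
```

Passing an argv list rather than a shell string also avoids quoting problems when a prefix or path contains spaces.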

Refs #332 #333 #335

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop -sf in favor of -s + curl -w '%{http_code}' so non-2xx responses
come through with their body and the actual status. fetchRemote now
parses a [CODEDB-STATUS] sentinel from the trailing curl output and
returns {captured, status, body_len}.

handleRemote treats 200-299 as success; anything else formats as
"error: <host> HTTP <code> for <slug>/<action> — <body>". Body text is
truncated to 200 chars; falls back to stderr if body is empty.
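
A Python sketch of the sentinel parsing and error formatting described above. It assumes curl was invoked with something like `-w '\n[CODEDB-STATUS]%{http_code}'`, so the status follows the body on its own trailing line; the commit message names the sentinel but not its exact placement, and the function names here are hypothetical.

```python
SENTINEL = "[CODEDB-STATUS]"
DASH = "\u2014"  # the separator character used in the error format quoted above


def split_status(captured):
    """Split captured curl output into (status, body) at the last sentinel."""
    body, found, tail = captured.rpartition(SENTINEL)
    if not found:
        raise ValueError("status sentinel missing from curl output")
    if body.endswith("\n"):
        body = body[:-1]  # drop the newline assumed to precede the sentinel
    return int(tail.strip()), body


def format_result(host, slug, action, captured, stderr=""):
    """Return the body on 2xx, else a one-line error with a truncated detail."""
    status, body = split_status(captured)
    if 200 <= status <= 299:
        return body
    detail = (body or stderr)[:200]  # fall back to stderr when the body is empty
    return f"error: {host} HTTP {status} for {slug}/{action} {DASH} {detail}"
```

Using `rpartition` means a response body that happens to contain the sentinel text is still split at the final occurrence, the one curl appended.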

Closes #334. Also self-diagnoses #338: previously opaque "outline
returned error" now reads "HTTP 404 — {\"error\":\"No outline
section\"}", making it obvious the snapshot just lacks outline data.

Refs #334 #338

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
codedb_remote: forward pagination/read params + add 'read' action
codedb_remote: use wiki backend only
Persist project-cache snapshot after CLI indexing
Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
After resolving conflicts by taking origin/release/0.2.579 for the
tools_list and handleRemote hunks, re-applied all deferred-scan changes
to src/mcp.zig: DeferredScan struct, Session.deferred_scan field,
updated run() signature, and parseRoots trigger call.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
fix(mcp): defer scan until roots handshake; harden root_policy
* test: add E2E MCP test harness (scripts/e2e_mcp_test.py)

Covers three scenarios before any MCP merge:
1. issue-346 regression: spawn from cwd=/, roots handshake, tools verify
2. Normal mode: explicit positional root, immediate scan, no roots needed
3. No-roots client: spawn from /, stays alive gracefully with 0 files

All 17 tests pass. Documented in AGENTS.md as pre-merge verification step.
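
For readers unfamiliar with the handshake the first scenario exercises, here is a Python sketch of the client side: answering the server's `roots/list` request with a `file://` root. It is not taken from scripts/e2e_mcp_test.py; the helper name and the sample path are hypothetical, and the message shape follows the MCP roots convention of a `roots` array with `uri` and `name` fields.

```python
import json
from pathlib import PurePosixPath


def roots_list_reply(request_line, project_dir):
    """Build the JSON-RPC reply a client would send for a roots/list request."""
    request = json.loads(request_line)
    if request.get("method") != "roots/list":
        raise ValueError("not a roots/list request")
    root = PurePosixPath(project_dir)  # must be absolute for as_uri()
    reply = {
        "jsonrpc": "2.0",
        "id": request["id"],  # echo the server's request id
        "result": {"roots": [{"uri": root.as_uri(), "name": root.name}]},
    }
    return json.dumps(reply)
```

In a harness, the returned line would be written to the server's stdin followed by a newline, after which the deferred scan is expected to start.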

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

* fix(mcp): address Codex review P1/P2 issues in deferred scan

P1 #1 (triggered reset): reset triggered=false in triggerScanFromRoots
when getDataDir fails, so the next valid roots/list response can retry
instead of the scan silently never running.

P1 #2 (joinable scan thread): store the spawned scanBg thread in
DeferredScan.scan_thread; join it during shutdown instead of detaching.
This eliminates the use-after-free where the background scan thread
could touch store/explorer after mainImpl returns.

Also fix the deferred watcher: watcherDeferredLoop, which previously
only waited for scan_done, now runs watcher.incrementalLoop directly
once the scan completes. This removes the second detached thread spawn
from triggerScanFromRoots. Both the scan thread and the watcher thread
are now fully joinable.

P2 (explicit root): DeferredScan.resolved_root captures the abs_root
accepted from roots/list so watcherDeferredLoop can pass it to
incrementalLoop without any extra allocation.

All 17 E2E tests pass. zig build test passes.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

---------

Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
@justrach justrach merged commit d65e6c2 into main Apr 30, 2026
1 check failed
@justrach justrach deleted the release/0.2.579 branch May 21, 2026 06:33