v0.42.14.0 fix(zero-config): code-* readiness signal + init embedding-key validation + lock self-heal (#1780) #1804

Merged: garrytan merged 7 commits into master from garrytan/fix-issue-1780 on Jun 3, 2026

@garrytan garrytan commented Jun 3, 2026

Closes #1780.

Three zero-config gaps that made a brain silently fail for an unattended/agent consumer. Two real, one optional (all included).

Gap 1 — code-* readiness signal (was: silent empty result)

code-def / code-refs / code-callers / code-callees returned count: 0 in three indistinguishable cases: graph not built, source never synced, or genuinely no match. An agent reading count: 0 couldn't tell "wait and retry" from "trust this." Now every CLI envelope and the four MCP ops carry a typed status (not_built | indexing | ready | unknown) + ready boolean.
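
A minimal consumer-side sketch of how an agent might act on the new fields. Only `count`, `status`, and `ready` come from this PR; the `CodeOpEnvelope` interface, the `Decision` type, and `decideNextStep` are hypothetical names for illustration. How `unknown` maps to `ready` is not spelled out here, so the sketch branches on `ready` alone.

```typescript
// Status values as listed in the PR description.
type CodeGraphStatus = "not_built" | "indexing" | "ready" | "unknown";

// Hypothetical envelope shape: the real envelopes carry more fields.
interface CodeOpEnvelope {
  count: number;
  status: CodeGraphStatus;
  ready: boolean;
}

// Hypothetical decision type for an unattended consumer.
type Decision = "use_results" | "trust_empty" | "retry_later";

function decideNextStep(envelope: CodeOpEnvelope): Decision {
  // Any hit is usable regardless of graph state.
  if (envelope.count > 0) return "use_results";
  // count === 0: only trust the empty result when the graph reports ready.
  return envelope.ready ? "trust_empty" : "retry_later";
}
```

With this shape, `count: 0` on a fresh brain (`status: "not_built"`, `ready: false`) leads to a retry instead of a false "no callers" conclusion.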

  • count: 0 + ready: true → genuinely none. ready: false → ask later.
  • New src/core/code-graph-readiness.ts: EXISTS-based (cheap; no page_kind index needed, pending probe rides the partial idx_content_chunks_edges_backfill), chunk-grain, fail-open on DB error. count > 0 short-circuits to ready with no query.
  • def/refs are 2-state + brain-wide (their data is set at chunk time). callers/callees are 3-state + source-scoped, with the pending predicate mirroring the resolver (edges_backfilled_at IS NULL OR < EDGE_EXTRACTOR_VERSION_TS) so a resolver-version bump can't falsely report ready.
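
The decision order described in the bullets above can be sketched roughly as follows. This is not the code in src/core/code-graph-readiness.ts: the `ReadinessProbes` interface and `sketchReadiness` are hypothetical stand-ins for the EXISTS queries, and the mapping of "no chunks" to `not_built` and "pending edges" to `indexing` is an assumption about what the states mean.

```typescript
type CodeGraphStatus = "not_built" | "indexing" | "ready" | "unknown";

// Hypothetical probes standing in for the cheap EXISTS queries.
interface ReadinessProbes {
  hasAnyCodeChunk(): Promise<boolean>;
  // Assumed to mirror: edges_backfilled_at IS NULL OR < EDGE_EXTRACTOR_VERSION_TS
  hasPendingEdgeChunk(): Promise<boolean>;
}

async function sketchReadiness(
  count: number,
  threeState: boolean, // true for callers/callees, false for def/refs
  probes: ReadinessProbes,
): Promise<CodeGraphStatus> {
  // Results exist, so no readiness query is needed.
  if (count > 0) return "ready";
  try {
    if (!(await probes.hasAnyCodeChunk())) return "not_built";
    if (threeState && (await probes.hasPendingEdgeChunk())) return "indexing";
    return "ready";
  } catch {
    // Fail open: a probe error must not break the op itself.
    return "unknown";
  }
}
```

The `ready` boolean is deliberately left out of the sketch, since the text above does not say how `unknown` maps to it.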

Gap 2 — gbrain init validates the embedding key (was: silent embedded=0 at first sync)

Init persisted --embedding-model without checking the key; first sync imported pages but embedded zero, and search came back empty. Now init runs a free config-only diagnoseEmbedding (missing key, all providers) + a best-effort 1-token live test-embed (invalid/expired key, 5s timeout). Loud warning to stderr, init still exits 0.

  • New src/core/init-embed-check.ts. Builds the effective env (process.env + file-plane *_api_key + --key) via buildGatewayConfig so it sees the same keys + provider base URLs as runtime (no false warning for config.json-keyed users).
  • buildGatewayConfig extracted to src/core/ai/build-gateway-config.ts (cli.ts re-exports).
  • New --skip-embed-check flag / GBRAIN_INIT_SKIP_EMBED_CHECK=1; embedding_check in --json. --no-embedding skips the whole check.
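
As a rough illustration of the two-stage check (config-only, then a bounded live probe), here is a sketch under stated assumptions. `EmbedCheckResult`, `withTimeout`, and `sketchEmbedCheck` are hypothetical names, and the injected `testEmbed` callback stands in for the real 1-token embed call. How the real check classifies a timeout is not stated in this PR; the sketch simply reports it as a failure reason.

```typescript
// Hypothetical result shape; the PR only names `embedding_check.ok`.
interface EmbedCheckResult {
  ok: boolean;
  reason?: string;
}

// Reject if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}

async function sketchEmbedCheck(
  apiKey: string | undefined,
  testEmbed: (key: string) => Promise<void>, // assumed to throw on a bad key
  timeoutMs = 5000,
): Promise<EmbedCheckResult> {
  // Stage 1: config-only, no network call.
  if (!apiKey) return { ok: false, reason: "missing embedding key" };
  // Stage 2: best-effort live probe, bounded so init never hangs.
  try {
    await withTimeout(testEmbed(apiKey), timeoutMs);
    return { ok: true };
  } catch (err) {
    return { ok: false, reason: err instanceof Error ? err.message : String(err) };
  }
}
```

A caller in the spirit of this PR would print `reason` to stderr when `ok` is false and still exit 0, so a bad key is loud but never blocks init.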

Gap 3 — automatic same-host dead-pid cycle-lock takeover (was: TTL-only, ~30min wait)

A crashed lock holder blocked new cycles for the full TTL. tryAcquireDbLock now reclaims a held, not-TTL-expired lock when the same-host holder is provably dead (process.kill ESRCH) past a 60s grace, via a guarded DELETE + one normal-upsert retry returning the standard handle. Cross-host stays TTL-only. The shared classifyHolderLiveness predicate (EPERM treated as alive — never steal a live lock) is reused by gbrain sync --break-lock, fixing its prior EPERM-as-dead bug.
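
The liveness rule can be sketched as below. This is an illustration, not the code in db-lock.ts: `sketchClassifyLiveness`, `sketchMayTakeOver`, and the `LockHolder` shape are hypothetical, the probe is injected so the block stays self-contained, and measuring the 60s grace from the lock's acquisition time is an assumption. In Node the probe would typically be `(pid) => process.kill(pid, 0)`, which sends no signal and throws `ESRCH` when the process does not exist or `EPERM` when it exists but cannot be signalled.

```typescript
type HolderLiveness = "alive" | "dead";

// Injected probe; expected to throw an error with a `code` on failure.
type PidProbe = (pid: number) => void;

function sketchClassifyLiveness(pid: number, probe: PidProbe): HolderLiveness {
  try {
    probe(pid);
    return "alive";
  } catch (err) {
    const code = (err as { code?: string } | null)?.code;
    // Only ESRCH proves the process is gone. EPERM means it exists but is
    // owned by someone else, so it counts as alive; so does anything else.
    return code === "ESRCH" ? "dead" : "alive";
  }
}

// Hypothetical lock-row shape for illustration.
interface LockHolder {
  host: string;
  pid: number;
  acquiredAtMs: number;
}

function sketchMayTakeOver(
  holder: LockHolder,
  localHost: string,
  nowMs: number,
  probe: PidProbe,
  graceMs = 60000,
): boolean {
  if (holder.host !== localHost) return false; // cross-host stays TTL-only
  if (nowMs - holder.acquiredAtMs < graceMs) return false; // inside the grace window
  return sketchClassifyLiveness(holder.pid, probe) === "dead";
}
```

Defaulting every non-`ESRCH` outcome to alive keeps the takeover conservative: a wrong "alive" costs a wait until the TTL expires, while a wrong "dead" would steal a live lock.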

Tests

  • New: test/code-graph-readiness.test.ts (11), test/db-lock-auto-takeover.test.ts (11), test/init-embed-check.test.ts (9, hermetic via the gateway embed-transport seam + withEnv). Readiness-envelope cases added to test/e2e/code-intel-mcp-ops-pglite.test.ts. 3 critical-regression tests: file-plane no-false-warn, readiness-query-throw → unknown, EPERM-as-alive.
  • bun run verify green (29/29: typecheck, privacy, jsonb, wasm, resolver, all guards). 134-test post-merge re-verify across the affected + new suites, 0 fail. Targeted Postgres E2E: 127 pass (1 skip, 1 pre-existing stale failure unrelated to this diff — sync-lock-recovery › --break-lock + --all asserts a refusal string absent in both master and this branch).
  • Manual smoke (isolated GBRAIN_HOME): init with no key → loud warning + embedding_check.ok=false; code-callers foo on a fresh brain → status: "not_built", ready: false.

No schema migration. gbrain upgrade handles everything.

🤖 Generated with Claude Code

Documentation

  • CLAUDE.md — added Key Files entries for src/core/code-graph-readiness.ts, src/core/init-embed-check.ts, src/core/ai/build-gateway-config.ts; noted the tryAcquireDbLock auto-takeover + classifyHolderLiveness on the db-lock.ts entry and the status/ready readiness fields on the code-def/code-refs entry.
  • llms-full.txt — regenerated from CLAUDE.md (bun run build:llms; freshness test green).
  • CHANGELOG.md — v0.42.14.0 entry (ELI10-lead summary + "To take advantage of" block + itemized changes).
  • TODOS.md — filed one P3 follow-up (#1780): unify init-embed-check.ts:liveTestEmbed with models.ts:probeEmbeddingReachability onto a shared embed-probe core.

Coverage: all shipped surface (code-* status/ready fields, gbrain init --skip-embed-check + embedding_check JSON, the three new modules, db-lock auto-takeover) is documented in CLAUDE.md + CHANGELOG. README is product-level and needs no change. No architecture-diagram drift.

garrytan and others added 7 commits June 2, 2026 22:42
…1780 Gap 1)

New src/core/code-graph-readiness.ts: resolveCodeReadiness() returns a typed
status (not_built | indexing | ready | unknown) + ready boolean so callers can
tell "graph not built / still indexing" apart from "genuinely no match" when
count===0. EXISTS-based (cheap), chunk-grain, resolver-version-matching pending
predicate, fail-open. Wired into the 4 CLI envelopes (+ human hint) and the 4
MCP op handlers. def/refs are 2-state brain-wide; callers/callees 3-state scoped.

…Gap 3)

tryAcquireDbLock now reclaims a held, not-TTL-expired lock when the same-host
holder is provably dead (process.kill ESRCH) past a 60s grace, via guarded
DELETE + one normal-upsert retry returning the normal handle. New shared
injectable classifyHolderLiveness/isHolderDeadLocally (EPERM treated as ALIVE
— never steals a live lock). runBreakLock's safe path consumes the shared
predicate, fixing its prior EPERM-as-dead bug. Cross-host stays TTL-only.

New src/core/init-embed-check.ts: config-only diagnoseEmbedding (missing key,
all providers) + best-effort 1-token live test-embed (invalid/expired key, 5s
timeout, never blocks). Loud warning to stderr, init still exits 0; skipped by
--no-embedding / --skip-embed-check / GBRAIN_INIT_SKIP_EMBED_CHECK=1. Builds the
effective env (process.env + file-plane keys + --key) via buildGatewayConfig,
extracted to src/core/ai/build-gateway-config.ts (cli.ts re-exports) so the
check sees the same keys + provider base URLs as runtime. embedding_check added
to --json.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

CLAUDE.md Key Files: add src/core/code-graph-readiness.ts, init-embed-check.ts,
ai/build-gateway-config.ts, and the db-lock auto-takeover + code-* readiness
field behaviors. Regenerate llms-full.txt.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…1780

# Conflicts:
#	CHANGELOG.md
#	CLAUDE.md
#	VERSION
#	llms-full.txt
#	package.json
@garrytan garrytan merged commit 1036f8f into master Jun 3, 2026
21 checks passed