
v0.41.10.1 fix-wave: dream.* config + batch retry + extract_atoms idempotency + ze-switch env-gate #1445

Merged
garrytan merged 10 commits into master from garrytan/pr-1414-1416-1421
May 25, 2026

@garrytan


Summary

Three production reliability fixes in one wave, rebuilt from closed community PRs #1414, #1416, #1421 (originally proposed by @garrytan-agents) with structural improvements from /plan-eng-review + codex outside-voice review.

The user-visible wins:

  1. dream.* config you set via gbrain config set now actually reaches the cycle phase that reads it. Pre-fix, gbrain config set dream.synthesize.session_corpus_dir /path wrote successfully to DB but extract-atoms skipped with "no transcripts to process" because it read via file-plane-only loadConfig(). The systemic fix lives in loadConfigWithEngine()'s sparse-merge block.

  2. Your background extract and sync cycles stop silently losing rows on PgBouncer connection drops. Pre-fix, ~30% of batched inserts threw No database connection: connect() has not been called and silently dropped 100 rows each. Now a single 500ms retry catches the recycle. Snapshot-before-clear contract means the retry sends the same data even if the producer wrote more during the delay.

  3. gbrain ze-switch won't silently corrupt your brain when env vars override the switch. Pre-fix, GBRAIN_EMBEDDING_MODEL=openai:text-embedding-3-large gbrain ze-switch migrated schema to 2560d columns then embedded with OpenAI's 1536d model — the documented 716K-chunk damage incident. Now it refuses pre-apply (before any mutation) with an ASCII warning box. Same gate fires on --resume so there's no bypass.

  4. extract_atoms stops creating duplicate atoms on re-runs across days. Pre-existing v0.41.2.0 bug: atom slugs were atoms/${todayDate()}/${title-slug}, so re-discovering the same content on day N+1 wrote a second atom. Now uses source-hash existence check — re-runs on unchanged content produce zero new atoms.

  5. Page-based atom extraction — extract_atoms now also processes brain pages (meeting, source, article, video, book, original types), not just raw transcript files. Single SQL query with NOT EXISTS subquery handles discovery + idempotency in one round-trip.

  6. New embedding_env_override doctor check — surfaces env-vs-DB embedding-config drift on every doctor run so the silent-corruption class can't repeat.
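
The File > DB > defaults precedence behind item 1 can be sketched as below. This is a minimal illustration, not the real loadConfigWithEngine(): the `Plane` type and `mergeDreamKeys` helper are hypothetical, and only the one dream.* key named in this PR is listed.

```typescript
// Illustrative only: the real loadConfigWithEngine() is not shown in this PR.
type Plane = Record<string, string | undefined>;

// Only this key name appears in the PR text; the other merged keys are not listed.
const DREAM_KEYS = ["dream.synthesize.session_corpus_dir"] as const;

// Hypothetical helper: resolve each dream.* key as file, then DB, then default.
function mergeDreamKeys(file: Plane, db: Plane, defaults: Plane): Plane {
  const merged: Plane = {};
  for (const key of DREAM_KEYS) {
    // Sparse merge: a DB value fills the slot only when the file plane left it unset.
    merged[key] = file[key] ?? db[key] ?? defaults[key];
  }
  return merged;
}
```

Under this sketch, a value written by gbrain config set lands in the DB plane and reaches the caller whenever the config file does not set the same key.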

Closes: #1414, #1416, #1421 (all closed with detailed comments crediting the original contributor and explaining the structural improvements).
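
The single-retry and snapshot-before-clear behaviour from item 2 can be sketched as follows. The signatures here are assumptions for illustration (the injectable `sleep`, the `flush` helper shape, and the message-based classifier are hypothetical); the repo's withRetry and isRetryableConnError may differ.

```typescript
// Hypothetical sketch: names and signatures are assumptions, not the repo's code.
type Sleep = (ms: number) => Promise<void>;

const realSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Assumed classifier: treats the error message quoted in this PR as retryable.
function isRetryableConnError(err: unknown): boolean {
  return err instanceof Error && err.message.includes("No database connection");
}

// Run `op` once; on a retryable error, wait `delayMs` and try exactly one more time.
async function withRetry<T>(
  op: () => Promise<T>,
  opts: { delayMs?: number; sleep?: Sleep; onRetry?: (err: unknown) => void } = {},
): Promise<T> {
  const { delayMs = 500, sleep = realSleep, onRetry } = opts;
  try {
    return await op();
  } catch (err) {
    if (!isRetryableConnError(err)) throw err;
    onRetry?.(err);
    await sleep(delayMs);
    return op();
  }
}

// Snapshot-before-clear: copy the buffer and empty it before the first attempt,
// so rows the producer pushes during the delay are not part of the retried batch.
async function flush<Row>(
  buffer: Row[],
  insert: (rows: Row[]) => Promise<void>,
  sleep: Sleep = realSleep,
): Promise<void> {
  const snapshot = buffer.slice();
  buffer.length = 0;
  await withRetry(() => insert(snapshot), { sleep });
}
```

Because the closure captures `snapshot` rather than `buffer`, both attempts send identical rows, and anything appended during the delay waits for the next flush.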

Test Coverage

NEW TEST FILES (61 new cases, all hermetic):
[+] test/extract-batch-retry.test.ts                (16 cases)
[+] test/extract-atoms-page-discovery.test.ts       (17 cases, PGLite)
[+] test/ze-switch-env-override.test.ts             (17 cases, PGLite + withEnv)
[+] test/doctor-embedding-env-override.test.ts      ( 7 cases, PGLite + withEnv)
[+] test/e2e/extract-atoms-discovery-sql.test.ts    ( 4 cases, real Postgres)

EXTENDED TEST FILES:
[+] test/loadConfig-merge.test.ts                   (+8 dream.* cases)
[+] test/cycle/extract-atoms-synthesize-concepts.test.ts (+1 critical regression case)

CRITICAL CONTRACTS PINNED:
  [★★★ TESTED] withRetry classifier covers GBrainError typed shape
  [★★★ TESTED] Snapshot-before-clear: producer mutation during delay doesn't poison retry
  [★★★ TESTED] applyRetrievalUpgrade refused = ZERO setConfig calls (spy assertion)
  [★★★ TESTED] resumeRetrievalUpgrade gate parity (no bypass path)
  [★★★ TESTED] extract_atoms re-run produces zero new atoms (idempotency)
  [★★★ TESTED] _pages:[] preserves v0.41.2.0 transcript-only PhaseResult byte-identical
  [★★★ TESTED] Real-Postgres parity for ANY($::text[]) + JSONB ->> + NOT EXISTS
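
The idempotency contract pinned above can be illustrated with an in-memory model. This is a sketch under assumptions: the PR does the check in SQL with a NOT EXISTS subquery, while the `Source` and `Atom` shapes, the `source_hash` field, and the `extractAtoms` helper below are hypothetical.

```typescript
// Hypothetical in-memory model; field names other than content_hash are assumptions.
interface Source { slug: string; content_hash: string | null; title: string }
interface Atom { slug: string; source_hash: string }

function slugify(title: string): string {
  return title.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// Writes one atom per source whose content hash has not been seen; returns the count written.
function extractAtoms(sources: Source[], atoms: Atom[], today: string): number {
  const seen = new Set(atoms.map((a) => a.source_hash));
  let written = 0;
  for (const src of sources) {
    if (src.content_hash === null) continue;  // NULL content_hash filter
    if (seen.has(src.content_hash)) continue; // already extracted: skip, whatever the date
    atoms.push({
      slug: `atoms/${today}/${slugify(src.title)}`,
      source_hash: src.content_hash,
    });
    seen.add(src.content_hash);
    written++;
  }
  return written;
}
```

The slug still carries the date, but the date no longer decides whether a write happens; only the source hash does.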

VERIFICATION RESULTS:
  bun run verify                  → clean (privacy + jsonb + progress + wasm + typecheck)
  bun run test (parallel + serial) → 10,598 pass / 0 fail across 168+42 files
  bun run test:e2e (targeted)     → 18 pass / 0 fail on real Postgres

Pre-Landing Review

Completed via /plan-eng-review + codex outside-voice on 2026-05-25. 11 design decisions made (D1-D11). Codex caught 10 real correctness gaps in the plan, including sourceId routing, a NULL crash, gate ordering, resume bypass, tagged-union extension, a Check.issues schema mismatch, and the dream.* env claim. All were incorporated as D9 before any code was written. See ~/.claude/plans/system-instruction-you-are-working-peppy-whale.md for the plan + verdict.

Adversarial Review

Cross-model synthesis: Claude structured + Claude adversarial subagent + codex outside-voice review all aligned on the correctness gaps. Codex's most valuable catch was sourceId not being threaded through engine.putPage in extract-atoms — would have shipped atoms-write-to-wrong-source on every federated brain. Fixed before any code landed.

Plan Completion

All 7 implementation tasks (T1-T7) completed. Plan file at ~/.claude/plans/system-instruction-you-are-working-peppy-whale.md. The single deferred item (per-atom idempotency to close the partial-failure-leaves-missing class) is captured in TODOS.md as v0.42+.

TODOS

Added two v0.42+ follow-up TODOs to TODOS.md:

  • Per-atom idempotency via deterministic atom slug (closes the partial-failure documented limitation)
  • Atom-slug consolidation migration (cleans up duplicate atoms from prior v0.41.2.0 runs on long-tenured brains)

Test plan

  • bun run verify clean
  • bun run test — 10,598 pass / 0 fail
  • bun run test:serial — clean
  • bun run test:e2e against real Postgres — 18 pass / 0 fail for new test surface
  • Manual smoke: ze-switch refuses with env override, ze-switch with --ignore-env-override proceeds, doctor surfaces the disagreement
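
The gate exercised by the manual smoke test can be sketched as below. Only GBRAIN_EMBEDDING_MODEL, setConfig, the refused variant, and --ignore-env-override come from this PR; the `ApplyResult` field names, the `ConfigStore` interface, the `embedding_model` key, and this function signature are assumptions made for illustration.

```typescript
// Hypothetical shapes: the real ApplyResult and config API are not shown in this PR.
type ApplyResult =
  | { status: "applied"; model: string }
  | { status: "refused"; reason: string };

interface ConfigStore {
  setConfig(key: string, value: string): void;
}

function applyRetrievalUpgrade(
  store: ConfigStore,
  targetModel: string,
  env: Record<string, string | undefined>,
  opts: { ignoreEnvOverride?: boolean } = {},
): ApplyResult {
  const override = env["GBRAIN_EMBEDDING_MODEL"];
  // The gate runs first, so nothing has been written when the switch is refused.
  if (override !== undefined && override !== targetModel && !opts.ignoreEnvOverride) {
    return {
      status: "refused",
      reason: `GBRAIN_EMBEDDING_MODEL=${override} would override ${targetModel}`,
    };
  }
  // "embedding_model" is an assumed key name.
  store.setConfig("embedding_model", targetModel);
  return { status: "applied", model: targetModel };
}
```

Returning a refused variant rather than throwing lets a caller assert that zero setConfig calls happened, which is the shape of the spy assertion listed under the pinned contracts.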

🤖 Generated with Claude Code

garrytan and others added 4 commits May 25, 2026 12:59
…+ ze-switch env-gate + doctor check

Closes PRs #1414, #1416, #1421 (rebuilt from designs by @garrytan-agents
with structural improvements from /plan-eng-review + codex outside-voice).

Three production reliability fixes in one wave:

1. dream.* DB-config merge (closes PR #1416 silent-config gap)
   - loadConfigWithEngine() sparse-merge extends with 7 dream.* keys
   - File > DB > defaults precedence (no GBRAIN_DREAM_* env vars)
   - extract-atoms switches to loadConfigWithEngine() so DB-plane keys reach it

2. Batch retry on transient connection drops (closes PR #1416 ~30%-loss bug)
   - withRetry() pure primitive exported from src/commands/extract.ts
   - 6 flush() sites snapshot-before-clear with onRetry callback
   - Reuses isRetryableConnError from src/core/retry-matcher.ts
   - retry-matcher extended with GBrainError{problem:'No database connection'}

3. extract_atoms source-hash idempotency + page-based discovery (closes #1414)
   - One raw SQL with NOT EXISTS subquery replaces 6 listPages + N atom checks
   - sourceId threaded through every putPage call (codex caught real bug)
   - NULL content_hash filter + dream_generated exclusion + transcript-side idempotency
   - cycle.ts passes union of syncPagesAffected + synthesizeWrittenSlugs

4. ze-switch pre-apply + pre-resume env-override gate (closes PR #1421)
   - Gate fires FIRST in apply AND resume; zero setConfig calls on refusal
   - ASCII warning box (no Unicode per repo D10)
   - --ignore-env-override escape hatch for power users
   - ApplyResult extended with refused variant

5. doctor embedding_env_override check (defense-in-depth for #1421)
   - Cross-surface parity: buildChecks() + doctorReportRemote()
   - Uses Check.details (not Check.issues per codex schema review)

Co-Authored-By: garrytan-agents <garrytan-agents@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds 61 new tests across 5 new files pinning the fix-wave contracts:
- test/extract-batch-retry.test.ts (16 cases) — withRetry primitive + snapshot contract
- test/extract-atoms-page-discovery.test.ts (17 cases) — discovery SQL + dual-source idempotency
- test/ze-switch-env-override.test.ts (17 cases) — env-gate apply + resume + ZERO-setConfig assertion
- test/doctor-embedding-env-override.test.ts (7 cases) — cross-surface parity
- test/e2e/extract-atoms-discovery-sql.test.ts (4 cases) — real-Postgres parity for raw SQL

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…16-1421

# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json
…36-vector inserts

CI shards 1 + 4 failed persistently (not flake — confirmed via retry) after the
v0.41.6.0 merge with this error:

  error: expected 1280 dimensions, not 1536
  file: "vector.c", routine: "CheckExpectedDim"

Two test files insert 1536-dim Float32Array vectors into `content_chunks.embedding`
/ `facts.embedding`, but v0.41.5.0 flipped `DEFAULT_EMBEDDING_DIMENSIONS` from
1536 to 1280 (ZE Matryoshka default). On a fresh CI bun process where no prior
test pre-configured the gateway, `initSchema()` sizes the vector column at
vector(1280) and the inserts throw.

Locally this is hidden when an earlier test file in the shard happens to have
called `configureGateway({embedding_dimensions: 1536})` — that state leaks
forward through bun's shared process. The v0.41.6.0 LPT shard re-balancing
reordered files so these two ran cold, surfacing the latent bug.

Fix follows the canonical hermetic pattern from
test/consolidate-valid-until.test.ts:23-34: pin the gateway to 1536d in
beforeAll, reset in afterAll. Test is now isolated from shard ordering.

  test/search-types-filter.test.ts     — shard 1 fail
  test/operations-find-trajectory.test.ts — shard 4 (6 fails)
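
The pin-and-reset pattern described in this commit can be sketched as follows. Everything here is a stand-in: the module-level state, `resetGateway`, `insertEmbedding`, and `withPinnedDims` are hypothetical, and the real configureGateway may take more options. In an actual test file the pin would go in beforeAll and the reset in afterAll. The sketch also checks the dimension at insert time for brevity, whereas the real column is sized when initSchema() runs.

```typescript
// Stand-in for process-wide gateway state shared by every test file in a shard.
const DEFAULT_EMBEDDING_DIMENSIONS = 1280;
let gatewayDims = DEFAULT_EMBEDDING_DIMENSIONS;

function configureGateway(cfg: { embedding_dimensions: number }): void {
  gatewayDims = cfg.embedding_dimensions;
}

function resetGateway(): void {
  gatewayDims = DEFAULT_EMBEDDING_DIMENSIONS;
}

// Mimics the vector column check: reject vectors whose length differs from the configured size.
function insertEmbedding(vec: Float32Array): void {
  if (vec.length !== gatewayDims) {
    throw new Error(`expected ${gatewayDims} dimensions, not ${vec.length}`);
  }
}

// Pin the dimension for the duration of `body`, then always restore the default,
// so the result does not depend on which file ran earlier in the same process.
function withPinnedDims<T>(dims: number, body: () => T): T {
  configureGateway({ embedding_dimensions: dims });
  try {
    return body();
  } finally {
    resetGateway();
  }
}
```

A cold insert of a 1536-dim vector fails under this model, while the same insert inside withPinnedDims(1536, ...) succeeds and leaves the default in place afterwards.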

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@garrytan garrytan closed this May 25, 2026
@garrytan garrytan reopened this May 25, 2026
garrytan added 5 commits May 25, 2026 14:34
…16-1421

# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json
…16-1421

# Conflicts:
#	CHANGELOG.md
#	TODOS.md
#	VERSION
#	package.json
Per request — version slot moved to .1 micro tier to leave .0 available
for unrelated wave landing on master.
@garrytan garrytan changed the title from "v0.41.10.0 fix-wave: dream.* config + batch retry + extract_atoms idempotency + ze-switch env-gate" to "v0.41.10.1 fix-wave: dream.* config + batch retry + extract_atoms idempotency + ze-switch env-gate" May 25, 2026
…16-1421

# Conflicts:
#	CHANGELOG.md
#	TODOS.md
#	VERSION
#	package.json
#	src/commands/extract.ts
@garrytan garrytan merged commit d036a97 into master May 25, 2026
15 checks passed
garrytan added a commit that referenced this pull request May 25, 2026
Brings in #1445 (v0.41.10.1 — dream.* config + batch retry +
extract_atoms idempotency + ze-switch env-gate fix wave).

Standard trio conflicts resolved per CLAUDE.md procedure:
- VERSION:      ours wins (0.41.11.0).
- package.json: ours wins (version line).
- CHANGELOG.md: both entries kept; ours stays topmost.

Other touched files (src/commands/doctor.ts, src/core/cycle.ts) all
auto-merged cleanly — no semantic conflicts.

Post-merge verification:
- bun install (no changes)
- typecheck clean
- bun run verify PASS (21 checks parallel)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mgunnin added a commit to mgunnin/gbrain that referenced this pull request May 28, 2026
* upstream/master:
  v0.41.10.1 fix-wave: dream.* config + batch retry + extract_atoms idempotency + ze-switch env-gate (garrytan#1445)
  v0.41.10.0 feat: orphan reduction via --by-mention + UTF-16 surrogate-pair fix (garrytan#1442)
  v0.41.9.0 — UX/reliability fix wave (5 defects from production report) (garrytan#1440)
  v0.41.8.0 fix(pglite): search/query/get exit cleanly + garrytan#1340 hint + garrytan#1342 breadcrumbs (garrytan#1405)
  v0.41.7.0 feat: compact list-format resolver + 300-skill scaling tutorial (garrytan#1407)
  v0.41.6.0 feat(ci): CI test speedup — 23min → ~9min via matrix 4→6 + weight-aware sharding + auto SHA cache + parallel verify (garrytan#1444)
  v0.41.5.0 fix-wave: warm-narwhal — 6 community PRs + E2E reliability (garrytan#1374)

# Conflicts:
#	src/core/ai/recipes/openai.ts