v0.41.10.1 fix-wave: dream.* config + batch retry + extract_atoms idempotency + ze-switch env-gate #1445
Merged
…+ ze-switch env-gate + doctor check

Closes PRs #1414, #1416, #1421 (rebuilt from designs by @garrytan-agents with structural improvements from /plan-eng-review + codex outside-voice).

Three production reliability fixes in one wave:

1. dream.* DB-config merge (closes PR #1416 silent-config gap)
   - loadConfigWithEngine() sparse-merge extends with 7 dream.* keys
   - File > DB > defaults precedence (no GBRAIN_DREAM_* env vars)
   - extract-atoms switches to loadConfigWithEngine() so DB-plane keys reach it
2. Batch retry on transient connection drops (closes PR #1416 ~30%-loss bug)
   - withRetry() pure primitive exported from src/commands/extract.ts
   - 6 flush() sites snapshot-before-clear with onRetry callback
   - Reuses isRetryableConnError from src/core/retry-matcher.ts
   - retry-matcher extended with GBrainError{problem:'No database connection'}
3. extract_atoms source-hash idempotency + page-based discovery (closes #1414)
   - One raw SQL with NOT EXISTS subquery replaces 6 listPages + N atom checks
   - sourceId threaded through every putPage call (codex caught real bug)
   - NULL content_hash filter + dream_generated exclusion + transcript-side idempotency
   - cycle.ts passes union of syncPagesAffected + synthesizeWrittenSlugs
4. ze-switch pre-apply + pre-resume env-override gate (closes PR #1421)
   - Gate fires FIRST in apply AND resume; zero setConfig calls on refusal
   - ASCII warning box (no Unicode per repo D10)
   - --ignore-env-override escape hatch for power users
   - ApplyResult extended with refused variant
5. doctor embedding_env_override check (defense-in-depth for #1421)
   - Cross-surface parity: buildChecks() + doctorReportRemote()
   - Uses Check.details (not Check.issues per codex schema review)

Co-Authored-By: garrytan-agents <garrytan-agents@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
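The batch-retry item in the commit message above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in src/commands/extract.ts: the option names, the `createBatcher` helper, and the `looksLikeConnDrop` matcher (a stand-in for `isRetryableConnError`) are hypothetical. Only the single retry, the 500ms delay, the `onRetry` callback, and the snapshot-before-clear ordering are taken from the PR text.

```typescript
// Hypothetical sketch of a single-retry primitive plus a snapshot-before-clear flush.
type RetryOptions = {
  delayMs?: number;                          // wait before the one retry (PR text says 500ms)
  isRetryable?: (err: unknown) => boolean;   // stand-in for isRetryableConnError
  onRetry?: (err: unknown) => void;          // observer hook, called once before the retry
  sleep?: (ms: number) => Promise<void>;     // injectable so tests do not have to wait
};

// Assumed matcher: treats the error message quoted in the PR as retryable.
const looksLikeConnDrop = (err: unknown): boolean =>
  err instanceof Error && err.message.includes("No database connection");

async function withRetry<T>(fn: () => Promise<T>, opts: RetryOptions = {}): Promise<T> {
  const delayMs = opts.delayMs ?? 500;
  const isRetryable = opts.isRetryable ?? looksLikeConnDrop;
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  try {
    return await fn();
  } catch (err) {
    if (!isRetryable(err)) throw err;   // non-transient errors propagate immediately
    opts.onRetry?.(err);
    await sleep(delayMs);
    return await fn();                  // a second failure propagates to the caller
  }
}

function createBatcher<Row>(
  insert: (rows: Row[]) => Promise<void>,
  retry: RetryOptions = {},
) {
  let buffer: Row[] = [];
  return {
    add(row: Row): void {
      buffer.push(row);
    },
    async flush(): Promise<void> {
      if (buffer.length === 0) return;
      const snapshot = buffer;  // snapshot first...
      buffer = [];              // ...then clear, so later add() calls go to a new array
      await withRetry(() => insert(snapshot), retry);
    },
  };
}
```

Because `flush()` captures `snapshot` before clearing, rows added while the retry is sleeping land in the fresh buffer and are sent by the next flush, so both attempts see identical data.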
Adds 61 new tests across 5 new files pinning the fix-wave contracts:

- test/extract-batch-retry.test.ts (16 cases): withRetry primitive + snapshot contract
- test/extract-atoms-page-discovery.test.ts (17 cases): discovery SQL + dual-source idempotency
- test/ze-switch-env-override.test.ts (17 cases): env-gate apply + resume + ZERO-setConfig assertion
- test/doctor-embedding-env-override.test.ts (7 cases): cross-surface parity
- test/e2e/extract-atoms-discovery-sql.test.ts (4 cases): real-Postgres parity for raw SQL

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
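The ZERO-setConfig assertion listed for the env-gate tests might look something like the sketch below. Everything here is a hypothetical stand-in: the `ApplyResult` shape, the `ConfigStore` interface, the `embedding_model` key, and the `applySwitch` signature are assumptions for illustration. The PR text only confirms that a refused variant exists, that the gate runs before any mutation, and that `--ignore-env-override` bypasses it.

```typescript
// Hypothetical tagged union: the PR says ApplyResult gained a "refused" variant.
type ApplyResult =
  | { status: "applied"; model: string }
  | { status: "refused"; reason: string; overrides: string[] };

// Assumed minimal config surface; the real engine API may differ.
type ConfigStore = { setConfig(key: string, value: string): void };

// Only the variable named in the PR description; the real list may be longer.
const EMBEDDING_ENV_KEYS = ["GBRAIN_EMBEDDING_MODEL"];

function findEnvOverrides(env: Record<string, string | undefined>): string[] {
  return EMBEDDING_ENV_KEYS.filter((key) => {
    const value = env[key];
    return value !== undefined && value !== "";
  });
}

function applySwitch(
  store: ConfigStore,
  env: Record<string, string | undefined>,
  targetModel: string,
  opts: { ignoreEnvOverride?: boolean } = {},
): ApplyResult {
  // Gate fires first: nothing below this block runs on refusal.
  const overrides = findEnvOverrides(env);
  if (overrides.length > 0 && !opts.ignoreEnvOverride) {
    return {
      status: "refused",
      reason: "environment variables would override the switched embedding config",
      overrides,
    };
  }
  store.setConfig("embedding_model", targetModel); // first and only mutation in this sketch
  return { status: "applied", model: targetModel };
}
```

A test can then pass a store that records every `setConfig` call and assert the record is empty after a refusal. A resume path would call the same gate before doing any work, which is what closes the bypass.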
…16-1421 # Conflicts: # CHANGELOG.md # VERSION # package.json
…36-vector inserts
CI shards 1 + 4 failed persistently (not flake — confirmed via retry) after the
v0.41.6.0 merge with this error:
error: expected 1280 dimensions, not 1536
file: "vector.c", routine: "CheckExpectedDim"
Two test files insert 1536-dim Float32Array vectors into `content_chunks.embedding`
/ `facts.embedding`, but v0.41.5.0 flipped `DEFAULT_EMBEDDING_DIMENSIONS` from
1536 to 1280 (ZE Matryoshka default). On a fresh CI bun process where no prior
test pre-configured the gateway, `initSchema()` sizes the vector column at
vector(1280) and the inserts throw.
Locally this is hidden when an earlier test file in the shard happens to have
called `configureGateway({embedding_dimensions: 1536})` — that state leaks
forward through bun's shared process. The v0.41.6.0 LPT shard re-balancing
reordered files so these two ran cold, surfacing the latent bug.
Fix follows the canonical hermetic pattern from
test/consolidate-valid-until.test.ts:23-34: pin the gateway to 1536d in
beforeAll, reset in afterAll. Test is now isolated from shard ordering.
test/search-types-filter.test.ts — shard 1 fail
test/operations-find-trajectory.test.ts — shard 4 (6 fails)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
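The pin-then-reset fix described in the commit message above can be illustrated without a test framework. This is a sketch under assumptions: the module-level state, `resetGateway`, `columnDimensions`, and `withPinnedDimensions` are hypothetical stand-ins, and the real tests use `beforeAll`/`afterAll` rather than a wrapper function. Only `configureGateway({embedding_dimensions: 1536})` and the 1280 default come from the commit text.

```typescript
// Stand-in for process-wide gateway state that leaks between test files.
const DEFAULT_EMBEDDING_DIMENSIONS = 1280;
let gatewayDims: number | undefined;

function configureGateway(cfg: { embedding_dimensions: number }): void {
  gatewayDims = cfg.embedding_dimensions;
}

// Hypothetical reset helper; the repo's actual reset call is not shown in the source.
function resetGateway(): void {
  gatewayDims = undefined;
}

// What a schema init would size the vector column at in this sketch.
function columnDimensions(): number {
  return gatewayDims ?? DEFAULT_EMBEDDING_DIMENSIONS;
}

// Pin on entry (the beforeAll role), always reset on exit (the afterAll role).
function withPinnedDimensions<T>(dims: number, body: () => T): T {
  configureGateway({ embedding_dimensions: dims });
  try {
    return body();
  } finally {
    resetGateway();
  }
}
```

A file that inserts 1536-dim vectors would run its body inside `withPinnedDimensions(1536, ...)`, so it no longer depends on whether an earlier file in the shard happened to configure the gateway.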
…16-1421 # Conflicts: # CHANGELOG.md # VERSION # package.json
…16-1421 # Conflicts: # CHANGELOG.md # TODOS.md # VERSION # package.json
Per request — version slot moved to .1 micro tier to leave .0 available for unrelated wave landing on master.
…16-1421 # Conflicts: # CHANGELOG.md # TODOS.md # VERSION # package.json # src/commands/extract.ts
garrytan added a commit that referenced this pull request (May 25, 2026)
Brings in #1445 (v0.41.10.1: dream.* config + batch retry + extract_atoms idempotency + ze-switch env-gate fix wave).

Standard trio conflicts resolved per CLAUDE.md procedure:
- VERSION: ours wins (0.41.11.0).
- package.json: ours wins (version line).
- CHANGELOG.md: both entries kept; ours stays topmost.

Other touched files (src/commands/doctor.ts, src/core/cycle.ts) all auto-merged cleanly, with no semantic conflicts.

Post-merge verification:
- bun install (no changes)
- typecheck clean
- bun run verify PASS (21 checks parallel)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mgunnin added a commit to mgunnin/gbrain that referenced this pull request (May 28, 2026)
* upstream/master:
  v0.41.10.1 fix-wave: dream.* config + batch retry + extract_atoms idempotency + ze-switch env-gate (garrytan#1445)
  v0.41.10.0 feat: orphan reduction via --by-mention + UTF-16 surrogate-pair fix (garrytan#1442)
  v0.41.9.0 - UX/reliability fix wave (5 defects from production report) (garrytan#1440)
  v0.41.8.0 fix(pglite): search/query/get exit cleanly + garrytan#1340 hint + garrytan#1342 breadcrumbs (garrytan#1405)
  v0.41.7.0 feat: compact list-format resolver + 300-skill scaling tutorial (garrytan#1407)
  v0.41.6.0 feat(ci): CI test speedup - 23min → ~9min via matrix 4→6 + weight-aware sharding + auto SHA cache + parallel verify (garrytan#1444)
  v0.41.5.0 fix-wave: warm-narwhal - 6 community PRs + E2E reliability (garrytan#1374)

# Conflicts:
#   src/core/ai/recipes/openai.ts
Summary
Three production reliability fixes in one wave, rebuilt from closed community PRs #1414, #1416, #1421 (originally proposed by @garrytan-agents) with structural improvements from `/plan-eng-review` + codex outside-voice review.

The user-visible wins:

- `dream.*` config you set via `gbrain config set` now actually reaches the cycle phase that reads it. Pre-fix, `gbrain config set dream.synthesize.session_corpus_dir /path` wrote successfully to DB but `extract-atoms` skipped with "no transcripts to process" because it read via file-plane-only `loadConfig()`. The systemic fix lives in `loadConfigWithEngine()`'s sparse-merge block.
- Your background `extract` and `sync` cycles stop silently losing rows on PgBouncer connection drops. Pre-fix, ~30% of batched inserts threw `No database connection: connect() has not been called` and silently dropped 100 rows each. Now a single 500ms retry catches the recycle. Snapshot-before-clear contract means the retry sends the same data even if the producer wrote more during the delay.
- `gbrain ze-switch` won't silently corrupt your brain when env vars override the switch. Pre-fix, `GBRAIN_EMBEDDING_MODEL=openai:text-embedding-3-large gbrain ze-switch` migrated schema to 2560d columns then embedded with OpenAI's 1536d model (the documented 716K-chunk damage incident). Now it refuses pre-apply (before any mutation) with an ASCII warning box. Same gate fires on `--resume` so there's no bypass.
- `extract_atoms` stops creating duplicate atoms on re-runs across days. Pre-existing v0.41.2.0 bug: atom slugs were `atoms/${todayDate()}/${title-slug}`, so re-discovering the same content on day N+1 wrote a second atom. Now uses source-hash existence check: re-runs on unchanged content produce zero new atoms.
- Page-based atom extraction: extract_atoms now also processes brain pages (meeting, source, article, video, book, original types), not just raw transcript files. Single SQL query with NOT EXISTS subquery handles discovery + idempotency in one round-trip.
- New `embedding_env_override` doctor check: surfaces env-vs-DB embedding-config drift on every doctor run so the silent-corruption class can't repeat.

Closes: #1414, #1416, #1421 (all closed with detailed comments crediting the original contributor and explaining the structural improvements).
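The duplicate-atom bug and its source-hash fix from the list above can be shown with a small in-memory model. This is a sketch only: the real implementation is a SQL query with a NOT EXISTS subquery, while the `Atom` shape, `extractOnce`, `dateSlug`, and the FNV-1a hash used here in place of `content_hash` are all assumptions made for illustration.

```typescript
// Hypothetical atom record; the real page schema is not shown in the source.
type Atom = { slug: string; sourceHash: string };
type Source = { title: string; content: string };

// FNV-1a, used here only as a dependency-free stand-in for a content hash.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0).toString(16).padStart(8, "0");
}

// Date-prefixed slug as described in the PR: same title, new day, new slug.
function dateSlug(title: string, today: string): string {
  const titleSlug = title.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-|-$/g, "");
  return `atoms/${today}/${titleSlug}`;
}

// Returns true when an atom was written, false when the source was already processed.
function extractOnce(store: Atom[], source: Source, today: string): boolean {
  const sourceHash = fnv1a(source.content);
  // Existence check keyed on the source hash, not on the date-based slug.
  if (store.some((atom) => atom.sourceHash === sourceHash)) return false;
  store.push({ slug: dateSlug(source.title, today), sourceHash });
  return true;
}
```

Keying the check on the slug alone would miss the day N+1 re-run because the date segment changes. Keying it on the hash of the source content makes the re-run a no-op until the content itself changes.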
Test Coverage
Pre-Landing Review
Completed via `/plan-eng-review` + codex outside-voice on 2026-05-25. 11 design decisions made (D1-D11). Codex caught 10 real correctness gaps in the plan (sourceId routing, NULL crash, gate ordering, resume bypass, tagged-union extension, Check.issues schema mismatch, dream.* env claim), all incorporated as D9 before any code was written. See `~/.claude/plans/system-instruction-you-are-working-peppy-whale.md` for the plan + verdict.
Adversarial Review
Cross-model synthesis: Claude structured + Claude adversarial subagent + codex outside-voice review all aligned on the correctness gaps. Codex's most valuable catch was `sourceId` not being threaded through `engine.putPage` in extract-atoms, which would have shipped atoms-write-to-wrong-source on every federated brain. Fixed before any code landed.
Plan Completion
All 7 implementation tasks (T1-T7) completed. Plan file at `~/.claude/plans/system-instruction-you-are-working-peppy-whale.md`. The single deferred item (per-atom idempotency to close the partial-failure-leaves-missing class) is captured in TODOS.md as v0.42+.
TODOS
Added two v0.42+ follow-up TODOs to `TODOS.md`:
Test plan
- `bun run verify` clean
- `bun run test`: 10,598 pass / 0 fail
- `bun run test:serial`: clean
- `bun run test:e2e` against real Postgres: 18 pass / 0 fail for new test surface
- `--ignore-env-override` proceeds, doctor surfaces the disagreement

🤖 Generated with Claude Code