v0.41.26.0 fix: dream --source + ingest junk titles + emoji-crash (supersedes #1559, #1561) #1571
Merged
…#1559) Closes the silent-no-op class where `gbrain dream --source <id>` ran the cycle but never wrote `last_full_cycle_at`, leaving `gbrain doctor`'s cycle_freshness check stuck red forever.
Changes to src/commands/dream.ts:
- DreamArgs.source field; parseArgs recognizes --source <id> AND the --source-id alias (matches v0.37.7.0 #1167 naming across import/extract/graph-query)
- Argv validation: missing value → exit 2; repeated different values → exit 2; --source X --source-id Y conflict → exit 2; same-value repetition → accepted
- --help short-circuit ordering preserved with IRON-RULE comment + structural test guard
- runDream engine-null guard: --source requires a connected brain
- runDream resolveSourceId → archived-source guard via fetchSource from src/core/sources-load.ts (single-row SELECT that projects archived + handles pre-v0.26.5 schema via isUndefinedColumnError)
- Typed-error try/catch via isResolverUserError predicate: only swallows known resolver-user errors; TypeError / postgres errors propagate uncaught with stack trace so genuine programmer bugs aren't hidden behind operator-error UX
- Forwarded sourceId to runCycle; existing v0.38 writeback at cycle.ts:1947-1967 now actually fires
- --help text documents both flag names
Tests:
- test/dream-cli-flags.test.ts: structural assertions for new flags, help text, IRON-RULE comment guard, resolver/predicate wiring
- test/dream.test.ts: 13 PGLite integration cases covering happy path (the regression that closes PR #1559), back-compat, alias equivalence, all argv edge cases, engine-null, archived, --help short-circuit ordering, T3 typed-error propagation, and D5 end-to-end dream→checkCycleFreshness column-name drift guard
Plan + 11 decisions: ~/.claude/plans/system-instruction-you-are-working-starry-papert.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
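A minimal TypeScript sketch of the argv rules and the typed-error catch listed in this commit. Every name here (`parseSourceArg`, `ArgvError`, `ResolverUserError`, `resolveSourceOrExit`, the `lookup` callback) is hypothetical and stands in for gbrain's real code; `isResolverUserError` reuses the commit's predicate name but not its implementation. Treating `--source X --source-id X` as an accepted same-value repetition, and using exit code 1 for unknown or archived sources, are assumptions.

```typescript
// Hypothetical sketch of the dream --source flag rules and typed-error catch.
// None of these are gbrain's actual exports.

/** Operator mistakes in argv; the commit maps these to exit code 2. */
class ArgvError extends Error {
  readonly exitCode = 2;
}

/** Known, operator-facing resolver failures (unknown or archived source). */
class ResolverUserError extends Error {}

function isResolverUserError(err: unknown): err is ResolverUserError {
  return err instanceof ResolverUserError;
}

/** Accepts --source <id> and its --source-id alias. */
function parseSourceArg(argv: readonly string[]): string | undefined {
  let source: string | undefined;
  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    if (flag !== "--source" && flag !== "--source-id") continue;
    const value: string | undefined = argv[i + 1];
    // A missing value (end of argv or another flag) is an operator error.
    if (value === undefined || value.startsWith("--")) {
      throw new ArgvError(`${flag} requires a value`);
    }
    // Repeating the same value is fine; two different values conflict.
    if (source !== undefined && source !== value) {
      throw new ArgvError(`conflicting source ids: ${source} vs ${value}`);
    }
    source = value;
    i++; // skip the value we just consumed
  }
  return source;
}

type SourceRow = { id: string; archived: boolean };

/** Returns a process exit code. The lookup is synchronous for brevity. */
function resolveSourceOrExit(
  id: string,
  lookup: (id: string) => SourceRow | null,
  onResolved: (sourceId: string) => void,
): number {
  try {
    const row = lookup(id);
    if (row === null) throw new ResolverUserError(`unknown source: ${id}`);
    if (row.archived) {
      throw new ResolverUserError(`source ${id} is archived (see gbrain sources restore)`);
    }
    onResolved(row.id); // e.g. forward the id to the cycle runner
    return 0;
  } catch (err) {
    if (isResolverUserError(err)) {
      console.error(err.message);
      return 1;
    }
    throw err; // TypeError, driver errors: propagate with their stack trace
  }
}
```

Narrowing the catch to one predicate is the point of the design: operator mistakes get a short message and an exit code, while anything unexpected still surfaces as a stack trace.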
…emoji crash) Closes the 2026-05-24 production SYNTH_PHASE_FAIL: 🤖 (U+1F916, surrogate pair U+D83E U+DD16) at offset 3999 in a long telegram transcript made the raw 4000-char slice produce a lone high surrogate; Anthropic's JSON parser rejected the payload with "no low surrogate in string"; the synthesize phase failed.
Changes to src/core/cycle/synthesize.ts:
- judgeSignificance head+tail slice routed through safeSplitIndex from src/core/text-safe.ts (already imported)
- Did NOT introduce safeSliceEnd from PRs #1559+#1561: that helper re-introduces the case-3 bug src/core/text-safe.ts:18-21 documents
- Did NOT touch findBoundary: master already routes through safeSplitIndex per the v0.42.0.0 wave
Tests in test/cycle-synthesize.test.ts:
- New describe('judgeSignificance — UTF-16 safety') block
- test.each over head boundaries (offsets 3998-4001) AND tail boundaries (offsets 3999-4002) for an 8001-char content with the robot emoji placed at each
- Primary assertion: explicit unpaired-surrogate scan over the captured prompt (NOT JSON.stringify per codex C-11: V8/JSCore do not throw on lone surrogates, so that assertion was weak)
- Sub-8000 short-content branch case: no slicing, emoji passes through unchanged
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
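A minimal sketch of the head+tail fix and the unpaired-surrogate scan, assuming a `safeSplitIndex(text, index)` that moves a split index back by one when it would fall between a high and a low surrogate. The real helper in `src/core/text-safe.ts` may have a different signature and covers edge cases (including the case-3 bug noted above) that this sketch does not claim to reproduce; `headTailExcerpt`, its separator, and `hasLoneSurrogate` are hypothetical names.

```typescript
// Hypothetical sketch; gbrain's real safeSplitIndex lives in src/core/text-safe.ts
// and its exact signature and edge-case handling may differ.

const isHigh = (c: number): boolean => c >= 0xd800 && c <= 0xdbff;
const isLow = (c: number): boolean => c >= 0xdc00 && c <= 0xdfff;

/** Moves a split index back by one if it would land inside a surrogate pair. */
function safeSplitIndex(text: string, index: number): number {
  if (index <= 0 || index >= text.length) return Math.max(0, Math.min(index, text.length));
  return isHigh(text.charCodeAt(index - 1)) && isLow(text.charCodeAt(index)) ? index - 1 : index;
}

const HEAD = 4000;
const TAIL = 4000;

/** Head + tail excerpt of long content that never splits a surrogate pair. */
function headTailExcerpt(content: string, sep = "\n...\n"): string {
  if (content.length <= HEAD + TAIL) return content; // short content: no slicing
  const headEnd = safeSplitIndex(content, HEAD);
  const tailStart = safeSplitIndex(content, content.length - TAIL);
  return content.slice(0, headEnd) + sep + content.slice(tailStart);
}

/** True if any UTF-16 code unit is an unpaired surrogate. */
function hasLoneSurrogate(s: string): boolean {
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (isHigh(c)) {
      if (i + 1 < s.length && isLow(s.charCodeAt(i + 1))) {
        i++; // valid pair: skip the low half
        continue;
      }
      return true;
    }
    if (isLow(c)) return true; // low half with no preceding high half
  }
  return false;
}
```

At the head boundary the adjusted index drops the emoji from the head; at the tail boundary it pulls the whole emoji into the tail. Either way no unpaired surrogate reaches the prompt, and the explicit scan checks that directly instead of relying on `JSON.stringify`, which does not throw on lone surrogates.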
…edes #1561) Closes the bug class where scraper error pages with titles like "Forbidden", "Access Denied", "Service Unavailable", "Robot Check", and "Just a moment..." were slipping through the ingest gate because the matcher only caught bare numeric codes (403/404/500...) and "page not found". 232+ pages observed (202+ from straylight-brain) were inflating page counts and tripping content_sanity_audit_recent on every doctor run.
Changes to src/core/content-sanity.ts BUILT_IN_JUNK_PATTERNS:
- Expanded error_page_title regex to also catch forbidden, access denied, service unavailable, robot check, verify you are human (case-insensitive and anchored, so long-form essays about these topics still ingest fine)
- New cloudflare_challenge_title pattern with a DISTINCT name from error_page_title (PR #1561 collapsed both into one name and lost audit signal; the new name preserves diagnosability in ~/.gbrain/audit/content-sanity-YYYY-Www.jsonl and doctor's content_sanity_audit_recent aggregation)
- Dropped PR #1561's bare-`error` matcher: too aggressive on legitimate concept/taxonomy pages titled exactly "Error"
Tests:
- test/content-sanity.test.ts: pattern-count locked at 7, new matches via test.each, over-match regression guard (legitimate prose titled "How to Handle Access Denied Errors" / "Error Boundary in React" etc. must pass), audit-name distinctness pinned
- test/import-file-content-sanity.test.ts: end-to-end ContentSanityBlockError via importFromContent for each new pattern family (D6: assessor wiring coverage, not just regex)
Out of scope, filed in TODOS.md as TODO-V13-C: gbrain pages audit-junk-titles legacy-cleanup command. Dropped from this PR per codex outside-voice tension (T1) for ship-and-validate-matchers-first discipline. The 200+ pre-existing scraper pages already in the DB will get the destructive-cleanup operator surface after ~1 week of production observation against this matcher.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
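An illustrative sketch of anchored, case-insensitive title patterns that carry distinct audit names. The PR does not show the real `BUILT_IN_JUNK_PATTERNS` regexes, so the patterns, the `JunkPattern` shape, and `matchJunkTitle` below are assumptions; only the two pattern names come from the source.

```typescript
// Illustrative only: not gbrain's actual BUILT_IN_JUNK_PATTERNS.

interface JunkPattern {
  name: string; // audit name recorded when a page is blocked
  re: RegExp;
}

const JUNK_TITLE_PATTERNS: JunkPattern[] = [
  {
    name: "error_page_title",
    // Anchored at both ends, so the WHOLE title must be the error phrase,
    // optionally preceded by a 4xx/5xx status code.
    re: /^\s*(?:[45]\d{2}|(?:[45]\d{2}\s+)?(?:page not found|forbidden|access denied|service unavailable|robot check|verify you are human))\s*$/i,
  },
  {
    name: "cloudflare_challenge_title",
    // Separate name so audit logs can tell challenge pages from error pages.
    re: /^\s*just a moment(?:\.{3}|\u2026)\s*$/i,
  },
];

/** Returns the audit name of the first matching pattern, or null. */
function matchJunkTitle(title: string): string | null {
  for (const p of JUNK_TITLE_PATTERNS) {
    if (p.re.test(title)) return p.name;
  }
  return null;
}
```

Because both ends are anchored, "How to Handle Access Denied Errors" does not match, and a bare "Error" title is left alone, in line with dropping the bare-`error` matcher. The regexes have no `g` flag, so repeated `test()` calls do not depend on `lastIndex` state.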
VERSION + package.json bump to 0.41.23.0.
CHANGELOG voice: ELI10 lead naming the bug ("`gbrain dream --source
<id>` finally counts as a cycle"), then per-fix detail, then a
"To take advantage of v0.41.23.0" operator-action block and itemized
changes.
TODOS.md v0.41.23.x follow-ups:
- TODO-V13-A (P2): --max-pages plumbing (PR #1559's flag, deferred
because CycleOpts has no maxPages field today)
- TODO-V13-B (P3): --source vs --source-id flag-name unification
across all CLI commands
- TODO-V13-C (P2): gbrain pages audit-junk-titles legacy cleanup
(deferred for ~1 week of matcher production observation)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ce-ingest-titles (conflicts: CHANGELOG.md, VERSION, package.json)
This was referenced May 27, 2026
Master shipped v0.41.23.0 + v0.41.24.0 mid-review; this branch originally bumped to v0.41.25.0 post-merge. User flagged v0.41.26.0 to leave a slot open for another in-flight PR. No code changes; VERSION + package.json + CHANGELOG header + "To take advantage" section updated in lockstep.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ce-ingest-titles (conflicts: CHANGELOG.md, VERSION, package.json)
mgunnin added a commit to mgunnin/gbrain that referenced this pull request May 28, 2026
* upstream/master:
  v0.41.26.1 fix: lock-renewal cathedral — closes ~39 worker crashes/day (supersedes garrytan#1567) (garrytan#1572)
  v0.41.26.0 fix: dream --source + ingest junk titles + emoji-crash (supersedes garrytan#1559, garrytan#1561) (garrytan#1571)
  v0.41.25.0 perf(sync): batched deletes + global page-generation clock (supersedes garrytan#1538) (garrytan#1566)
  v0.41.24.0 fix(conversation-parser): threshold gates + bold-paren-time pattern — 20,167 Circleback messages unblocked (closes garrytan#1533) (garrytan#1543)
  v0.41.23.0 feat: extract operator surfaces + pack-driven extractables (garrytan#1541)
  v0.41.22.1 feat: brainstorm/lsd judge fixes (closes garrytan#1540 end-to-end) (garrytan#1562)
  v0.41.22.0 feat: type-unification cathedral — 94 types → 15 canonical (closes garrytan#1479) (garrytan#1542)
  v0.41.21.0 feat(ops): 5 daily-driver pains fixed in one wave (garrytan#1545)
  v0.41.20.0 feat: gbrain status + doctor --scope=brain (fix wave 2: items garrytan#6 + garrytan#7) (garrytan#1544)
  feat: v0.41.19.0 Supavisor Retry Cathedral (garrytan#1537)
  v0.41.18.0: gbrain onboard — the activation surface gbrain didn't have before (garrytan#1521)
  v0.41.17.0 feat: --workers N on every bulk command + facts dim doctor parity (garrytan#1519)
  v0.41.16.0 feat: conversation parser cathedral + progressive-batch primitive (closes garrytan#1461) (garrytan#1510)
  v0.41.15.0 feat(sync): --timeout + --max-age + partial status (closes garrytan#1472 RFC) (garrytan#1506)
garrytan added a commit that referenced this pull request May 29, 2026
… drain + disconnect audit (closes #1570) (#1608)
* merge master: rebump v0.41.25.0 → v0.41.27.0 (queue collision)
Master shipped v0.41.25.0 (#1538 batched sync deletes) and v0.41.26.0 (#1571 dream --source fix) while this branch was in flight. Conflict resolution rebumps to the next available slot.
- VERSION: 0.41.25.0 → 0.41.27.0
- package.json: synced
- CHANGELOG.md: my v0.41.27.0 entry placed above master's v0.41.26.0 and v0.41.25.0; in-entry version references updated 0.41.25.0 → 0.41.27.0 and forward-references bumped to v0.41.28+.
- TODOS.md: kept master's v0.41.20.x section + my v0.41.27.0+ follow-ups
No source-file conflicts during the merge.
* feat(diagnostics): db-disconnect audit + doctor surface (v0.41.27.0)
Instruments every db.disconnect() and PostgresEngine.disconnect() call with a JSONL audit record so the next user-reported #1570 cycle gives us the offender's caller stack instead of the symptomatic "No database connection" error.
Audit shape (~/.gbrain/audit/db-disconnect-YYYY-Www.jsonl):
{ts, engine_kind, connection_style, caller_stack[], command, pid}
- src/core/audit/db-disconnect-audit.ts (NEW): the audit writer, built on the v0.40.4.0 createAuditWriter cathedral. Captures a 6-frame stack via new Error().stack so the offender is readable without adding stderr noise.
- src/core/db.ts: logDbDisconnect call at the top of disconnect() (best-effort; never blocks the real teardown).
- src/core/postgres-engine.ts: same instrumentation in PostgresEngine.disconnect(); distinguishes 'module' vs 'instance' connection_style so we can tell legitimate worker-pool teardowns apart from the load-bearing module-singleton class.
- src/commands/doctor.ts: extends batch_retry_health to surface 24h disconnect count + most-recent caller stack. Warns when the caller frame isn't a known CLI-exit frame (e.g. cli.ts's finally block at the end of an op-dispatch). This is the diagnostic that tells v0.41.28+ where to apply the real ownership fix.
- test/db-disconnect-audit.test.ts: unit coverage for the audit writer + caller-stack capture + JSONL shape.
- test/e2e/db-singleton-shared-recovery.test.ts: real-Postgres regression that exercises the singleton-null path end-to-end.
Refs #1570
* feat(retry): self-heal on null singleton — closes #1570 symptom (v0.41.27.0)
withRetry gains an opt-in reconnect callback that fires between the isRetryableConnError classification and the inter-attempt sleep. PostgresEngine.batchRetry injects this.reconnect(), which is race-safe via the existing _reconnecting guard and handles module and instance pools.
Closes the production loss reported in #1570: dream cycles on Supabase no longer drop ~150 link rows per cycle when the singleton goes null mid-batch. The retry now rebuilds the connection between attempts so the second try has somewhere to write to.
- src/core/retry.ts: WithRetryOpts gains `reconnect?: () => Promise<void>`. Awaited in the catch branch. onRetry is also now awaited (back-compat-safe: every existing in-tree caller is a sync arrow). Reconnect failures propagate as the real cause, replacing the symptomatic "No database connection" error with whatever connect() threw, so operators see the truth.
- src/core/postgres-engine.ts:batchRetry: injects `reconnect: () => this.reconnect()`. Covers all 9 batch-retry call sites (addLinksBatch, addTimelineEntriesBatch, upsertChunks, plus the 6 caller-supplied auditSite labels in extract / sync / reindex).
- test/core/retry-reconnect.test.ts: 8 hermetic cases pinning the contract: reconnect fires before sleep, only on retryable errors, back-compat when omitted, signal-aborted bypasses reconnect, onRetry is awaited, full success path end-to-end.
The deeper bug (who's calling disconnect mid-cycle) is left unaddressed in this commit by design; the diagnostic instrumentation in the prior commit will tell us in the next production run.
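A minimal sketch of the `reconnect` contract in the retry commit: the callback runs only for retryable errors, after classification and before the inter-attempt sleep, and if it throws, that error becomes the rejection. Apart from `reconnect` and `onRetry`, the option names are assumptions, as is running `onRetry` after `reconnect`; this is not gbrain's `src/core/retry.ts`.

```typescript
// Hypothetical sketch of withRetry with an opt-in reconnect callback.

interface WithRetryOpts {
  attempts: number; // total tries, including the first
  delayMs: number;
  isRetryable: (err: unknown) => boolean;
  reconnect?: () => Promise<void>;
  onRetry?: (err: unknown, attempt: number) => void | Promise<void>;
  signal?: { readonly aborted: boolean };
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(() => resolve(), ms));

async function withRetry<T>(fn: () => Promise<T>, opts: WithRetryOpts): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Non-retryable error, last attempt, or aborted: rethrow without reconnecting.
      const giveUp =
        !opts.isRetryable(err) || attempt >= opts.attempts || opts.signal?.aborted === true;
      if (giveUp) throw err;
      // Rebuild the connection before sleeping. If reconnect throws, that error
      // propagates instead of the original "No database connection" symptom.
      if (opts.reconnect) await opts.reconnect();
      if (opts.onRetry) await opts.onRetry(err, attempt); // awaited, so async hooks finish first
      await sleep(opts.delayMs);
    }
  }
}
```

Omitting `reconnect` leaves the loop identical to a plain retry, which is what keeps the option back-compatible for existing callers.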
Refs #1570
* feat(facts): drainPending() + CLI await before disconnect (v0.41.27.0)
Closes the silent 'No database connection' tail-end errors after gbrain capture / put_page: the facts:absorb fire-and-forget queue sometimes outlived the CLI process's connection lifetime, so absorb attempts after engine.disconnect() landed in stderr as the GBrainError shape.
- src/core/facts/queue.ts: new drainPending({timeout: 1000}) method distinct from shutdown(). Stops accepting new enqueues, waits for in-flight work to settle (bounded by timeout), and returns the count of unfinished tasks. Semantically different from shutdown() (which aborts in-flight work), so the behavior (drop work that hasn't started yet but let in-flight work finish) matches what CLI exit actually needs.
- src/cli.ts: op-dispatch finally block awaits the drain BEFORE engine.disconnect(). Bounded 1s. Opt-out env GBRAIN_NO_FACTS_DRAIN for callers that don't enqueue (keeps fast-exit paths fast). Mirrors the v0.41.8.0 awaitPendingLastRetrievedWrites pattern.
- test/facts-queue-drain-pending.test.ts: 6 hermetic cases: empty drain returns immediately, single in-flight settles, timeout bounds wait, shutdown-after-drain is idempotent, post-drain enqueues are dropped, signal-aborted skips waiting.
Refs #1570
* docs: update project documentation for v0.41.27.0
README.md: added troubleshooting entry for the v0.41.27.0 retry-reconnect + facts:absorb drain fix (closes #1570), pointing operators at `gbrain doctor --json` to find the offending disconnect caller.
CLAUDE.md: extended `src/core/retry.ts` entry with the new optional `reconnect` callback (v0.41.27.0); added two new Key Files entries for `src/core/audit/db-disconnect-audit.ts` (the diagnostic half of the "instrument first, fix later" pivot) and `FactsQueue.drainPending`; extended `doctor.ts:checkBatchRetryHealth` entry with the in-place extension that surfaces 24h disconnect-call count.
llms-full.txt: regenerated to absorb CLAUDE.md edits.
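A sketch of the `drainPending({timeout})` semantics described in the facts commit: stop accepting enqueues, let in-flight work finish, bound the wait, and report how many tasks are still unfinished. `FireAndForgetQueue` is a hypothetical stand-in for `FactsQueue`; it omits `shutdown()` and abort-signal handling.

```typescript
// Hypothetical stand-in for a fire-and-forget queue; not gbrain's FactsQueue.

class FireAndForgetQueue {
  private readonly inFlight = new Set<Promise<void>>();
  private accepting = true;

  /** Starts the task right away; returns false once draining has begun. */
  enqueue(task: () => Promise<void>): boolean {
    if (!this.accepting) return false; // post-drain enqueues are dropped
    const p: Promise<void> = task()
      .catch(() => {}) // fire-and-forget: failures are swallowed in this sketch
      .finally(() => {
        this.inFlight.delete(p);
      });
    this.inFlight.add(p);
    return true;
  }

  /** Stops intake, waits for in-flight tasks up to `timeout` ms, returns the unfinished count. */
  async drainPending({ timeout = 1000 }: { timeout?: number } = {}): Promise<number> {
    this.accepting = false;
    if (this.inFlight.size === 0) return 0;
    let timer: ReturnType<typeof setTimeout> | undefined;
    const deadline = new Promise<void>((resolve) => {
      timer = setTimeout(() => resolve(), timeout);
    });
    // Tracked promises never reject (errors are caught in enqueue), so Promise.all is safe.
    await Promise.race([Promise.all(Array.from(this.inFlight)), deadline]);
    clearTimeout(timer);
    return this.inFlight.size;
  }
}
```

A CLI exit path would `await queue.drainPending({ timeout: 1000 })` before disconnecting the engine, the same ordering the `src/cli.ts` change describes. Unlike an abort-style shutdown, work already started gets a bounded chance to finish.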
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* chore: rebump v0.41.27.0 → v0.41.28.0 (queue collision with #1573)
Master shipped v0.41.27.0 (#1573 git-aware sync_freshness) claiming the same slot. Rebump to the next available version.
- VERSION + package.json → 0.41.28.0
- CHANGELOG.md: my entry header + in-entry refs 0.41.27.0 → 0.41.28.0
- TODOS.md: my #1570 follow-up section header + body refs bumped
* test: pin gateway in put-page-provenance + embedding-dim-check (CI shard fix)
Both files failed on CI shards 1 and 8 under the cross-file gateway-state leak class (CLAUDE.md "Test-isolation lint and helpers"). The v0.41.28.0 merge reshuffled the weight-based shard bin-packing, landing a gateway-mutating sibling ahead of these two victims in the same `bun test` process.
Mechanism:
- put-page-provenance: put_page embeds via the gateway. A sibling left the gateway configured with OpenAI + the CI placeholder `sk-test` (captured at configureGateway time, survives the withEnv restore as cached gateway state). put_page's embed then fired against live OpenAI and got a 401. The bunfig legacy-embedding preload's beforeEach only re-applies legacy when the gateway was RESET; it does NOT correct a sibling that configured a different LIVE config.
- embedding-dim-check: initSchema builds the content_chunks vector column at the gateway's configured dim. A sibling leaking ZE/1280 made the column 1280-d, so `expect(dims).toBe(1536)` failed.
Fix (victim-side pinning, the escape hatch the preload documents):
- Both: configure the gateway explicitly in beforeAll BEFORE initSchema (OpenAI/1536), resetGateway() in afterAll so neither leaks onward.
- put-page-provenance also stubs the embed transport via __setEmbedTransportForTests so embed is deterministic and offline; a dummy OPENAI_API_KEY is supplied in the gateway env because instantiateEmbedding builds the OpenAI client (key check) BEFORE the stubbed transport is reached. The stub then intercepts the actual call, so the key never leaves the process.
Verified: CI shards 1 (1337 pass) + 8 (905 pass) green with OPENAI_API_KEY unset, plus adversarial sibling orderings (gateway.test / doctor-ze-checks preceding). Typecheck + check-test-isolation clean.
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
This was referenced Jun 8, 2026
Summary
Production-quality fix wave that absorbs and supersedes community PRs #1559 and #1561 (both by @garrytan-agents). Ships three coordinated fixes as four atomic commits:
- `gbrain dream --source <id>` finally writes `last_full_cycle_at`. The flag was parsed but never forwarded to `runCycle`, so doctor's `cycle_freshness` check stayed red forever even after a successful cycle. Adds `--source-id` as an alias (matches the v0.37.7.0 #1167 naming across import/extract/graph-query).
- `judgeSignificance` no longer crashes on emoji-at-offset-4000. The raw `slice(0, 4000) + ... + slice(-4000)` could split a UTF-16 surrogate pair (🤖, U+1F916) and leave an unpaired surrogate; Anthropic's JSON parser rejected the payload with `"no low surrogate in string"`. Routed through the canonical `safeSplitIndex` from `src/core/text-safe.ts`. Closes the real 2026-05-24 telegram-transcript SYNTH_PHASE_FAIL.
- Expanded `error_page_title` regex catches Cloudflare/WAF junk titles: `Forbidden`, `Access Denied`, `Service Unavailable`, `Robot Check`, `Verify You Are Human`, plus a new, distinct `cloudflare_challenge_title` pattern for `Just a moment...`. Anchored so legitimate prose passes through.
Closes #1559 + #1561. Originals get supersede-comment-and-close.
Why a production-quality rewrite, not direct merge
The PRs' shared commit (78b93f3, surrogate-safe slicing) introduces a `safeSliceEnd` helper that re-introduces the case-3 bug the canonical `safeSplitIndex` in `text-safe.ts` was written to fix (see the module docstring at `src/core/text-safe.ts:18-21`). It also patches a redundant site: `findBoundary` in master already routes through `safeSplitIndex`. This PR uses the canonical helper at the one site that genuinely needs it (`judgeSignificance`).
PR #1559's `--max-pages` flag was dropped: `CycleOpts` has no `maxPages` field and no cycle phase consults page-count limits, so shipping the flag would be a lying flag. Filed in TODOS as v0.41.20.x follow-up TODO-V13-A.
PR #1561's bare-`error` matcher was dropped (false positives on legitimate concept pages titled "Error"), and the Cloudflare title, which reused the `error_page_title` name, was given a distinct name (`cloudflare_challenge_title`) to preserve audit-log diagnosability.
Architecture decisions (eng + codex review)
3 user decisions in /plan-eng-review, 18 codex findings in outside-voice review, 11 absorbed inline, 2 substantive cross-model tensions resolved via explicit user AskUserQuestion:
- Engine-null guard: `--source` exits 1 loud (no silent no-op via the no-DB path)
- Archived-source guard with a `gbrain sources restore` hint
- `--source` and `--source-id` accepted as aliases
- Drop the legacy-cleanup command (`gbrain pages audit-junk-titles`) from this PR for ship-and-validate-matchers-first discipline; deferred to v0.41+ follow-up TODO-V13-C with full spec preserved
- Typed-error catch via the `isResolverUserError` predicate (TypeError / postgres errors propagate uncaught so genuine programmer bugs aren't hidden behind operator-error UX)
Plan + decisions:
`~/.claude/plans/system-instruction-you-are-working-starry-papert.md`
Files changed
- `src/commands/dream.ts`
- `src/core/cycle/synthesize.ts`: `safeSplitIndex` in `judgeSignificance` head+tail slice
- `src/core/content-sanity.ts`: expand `error_page_title`, add `cloudflare_challenge_title`
- `test/dream.test.ts`
- `test/dream-cli-flags.test.ts`
- `test/content-sanity.test.ts`
- `test/import-file-content-sanity.test.ts`
- `test/cycle-synthesize.test.ts`: new `judgeSignificance — UTF-16 safety` describe block; head + tail surrogate boundary `test.each`
- `CHANGELOG.md`, `VERSION`, `package.json`, `TODOS.md`
Test plan
- `bun run verify`: 28 checks green (typecheck + privacy + jsonb + progress + wasm + 23 others)
Follow-ups filed in TODOS.md
- `gbrain dream --max-pages <n>` plumbing through extract phases
- `--source`/`--source-id` flag-name unification across all CLI commands
- `gbrain pages audit-junk-titles` legacy-cleanup command (deferred ~1 week for matcher production observation)
🤖 Generated with Claude Code
Co-Authored-By: garrytan-agents <me@garrytan.com>