v0.41.26.0 fix: dream --source + ingest junk titles + emoji-crash (supersedes #1559, #1561)#1571

Merged
garrytan merged 7 commits into master from garrytan/dream-source-ingest-titles
May 27, 2026

@garrytan

Owner

Summary

Production-quality fix wave that absorbs and supersedes community PRs #1559 and #1561 (both by @garrytan-agents). Ships three coordinated fixes as four atomic commits:

  1. gbrain dream --source <id> finally writes last_full_cycle_at. The flag was parsed but never forwarded to runCycle, so doctor's cycle_freshness check stayed red forever even after a successful cycle. Adds --source-id as an alias, matching the naming used across import/extract/graph-query since #1167 ("v0.37.7.0 bug: import --source and search --source silently ignore source routing").

  2. judgeSignificance no longer crashes on emoji-at-offset-4000. The raw slice(0, 4000) + ... + slice(-4000) could split a UTF-16 surrogate pair (🤖, U+1F916) and produce a lone high surrogate; Anthropic JSON-parse rejected with "no low surrogate in string". Routed through the canonical safeSplitIndex from src/core/text-safe.ts. Closes the real 2026-05-24 telegram-transcript SYNTH_PHASE_FAIL.

  3. Expanded error_page_title regex catches Cloudflare/WAF junk titles. Forbidden, Access Denied, Service Unavailable, Robot Check, Verify You Are Human, plus new distinct cloudflare_challenge_title pattern for Just a moment.... Anchored so legitimate prose passes through.

Closes #1559 + #1561. Originals get supersede-comment-and-close.

Why a production-quality rewrite, not direct merge

The PRs' shared commit (78b93f3, surrogate-safe slicing) introduces a safeSliceEnd helper that re-introduces the case-3 bug that the canonical safeSplitIndex in text-safe.ts was written to fix (see module docstring at src/core/text-safe.ts:18-21). It also patches a redundant site: findBoundary in master already routes through safeSplitIndex. This PR uses the canonical helper at the one site that genuinely needs it (judgeSignificance).

PR #1559's --max-pages flag was dropped: CycleOpts has no maxPages field and no cycle phase consults page-count limits, so shipping it would be a lying flag. Filed in TODOS as v0.41.20.x follow-up TODO-V13-A.

PR #1561's bare-error matcher was dropped (it false-positives on legitimate concept pages titled "Error"). The Cloudflare title pattern, which #1561 registered under the reused error_page_title name, was given a distinct name (cloudflare_challenge_title) to preserve audit-log diagnosability.

Architecture decisions (eng + codex review)

3 user decisions in /plan-eng-review, 18 codex findings in outside-voice review, 11 absorbed inline, 2 substantive cross-model tensions resolved via explicit user AskUserQuestion:

  • D1: engine-null + --source exits 1 loud (no silent no-op via the no-DB path)
  • D2: archived sources refused with paste-ready gbrain sources restore hint
  • D3: both --source and --source-id accepted as aliases
  • T1 (codex outside-voice tension): dropped a Fix 4 destructive cleanup command (gbrain pages audit-junk-titles) from this PR for ship-and-validate-matchers-first discipline; deferred to v0.41+ follow-up TODO-V13-C with full spec preserved
  • T3: tightened try/catch to typed resolver-user errors only via isResolverUserError predicate (TypeError / postgres errors propagate uncaught so genuine programmer bugs aren't hidden behind operator-error UX)
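
The T3 decision above (catch only typed resolver-user errors, let everything else propagate) can be sketched as follows. This is a minimal illustration, not the code in src/commands/dream.ts: only the name isResolverUserError comes from this PR. The ResolverUserError class, its hint field, and runWithSource are hypothetical.

```typescript
// Hypothetical error class; the PR names only the isResolverUserError predicate.
class ResolverUserError extends Error {
  hint: string;
  constructor(message: string, hint: string) {
    super(message);
    this.name = "ResolverUserError";
    this.hint = hint;
  }
}

// Assumed predicate body: identify operator mistakes by error name.
function isResolverUserError(err: unknown): err is ResolverUserError {
  return err instanceof Error && err.name === "ResolverUserError";
}

// Hypothetical wrapper: returns an exit code for operator mistakes,
// rethrows everything else so programmer bugs keep their stack trace.
function runWithSource(resolve: () => string): number {
  try {
    resolve();
    return 0;
  } catch (err) {
    if (isResolverUserError(err)) {
      console.error(`${err.message}\n${err.hint}`);
      return 1;
    }
    throw err; // TypeError, database driver errors, etc. are not swallowed
  }
}
```

With this shape, an unknown source id exits 1 with a hint, while a TypeError thrown inside the resolver still surfaces uncaught.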

Plan + decisions: ~/.claude/plans/system-instruction-you-are-working-starry-papert.md

Files changed

| File | Why | LOC |
| --- | --- | --- |
| src/commands/dream.ts | argv parsing, engine-null guard, archived-source guard, typed-error try/catch, resolveSourceId + fetchSource wiring, help text | +146 |
| src/core/cycle/synthesize.ts | safeSplitIndex in judgeSignificance head+tail slice | +5/-3 |
| src/core/content-sanity.ts | Expand error_page_title, add cloudflare_challenge_title | +12/-1 |
| test/dream.test.ts | 13 PGLite integration cases (back-compat regression, alias, repetition, engine-null, archived, --help short-circuit, T3 TypeError propagation, D5 end-to-end dream→checkCycleFreshness) | +310 |
| test/dream-cli-flags.test.ts | structural assertions for new flags + IRON-RULE comment guard | +47 |
| test/content-sanity.test.ts | New patterns + over-match regression guard + audit-name distinctness | +122 |
| test/import-file-content-sanity.test.ts | importFromContent end-to-end for each new pattern family | +48 |
| test/cycle-synthesize.test.ts | judgeSignificance — UTF-16 safety describe block; head + tail surrogate boundary test.each | +124 |
| CHANGELOG.md, VERSION, package.json, TODOS.md | v0.41.25.0 bump (master claimed v0.41.23 + v0.41.24 mid-review) + 3 follow-up TODOs | +281 |

Test plan

  • bun run verify — 28 checks green (typecheck + privacy + jsonb + progress + wasm + 23 others)
  • Targeted suite (4 files, 138 tests) — all pass in 2.13s post-merge
  • Branch rebased-equivalent on master via merge commit; trio (VERSION/package.json/CHANGELOG) consistent at 0.41.25.0
  • Manual behavioral smoke after merge:
    gbrain dream --source default --json | jq '.status'  # expect ok/clean/partial
    gbrain doctor --json | jq '.checks[] | select(.name=="cycle_freshness")'  # status=ok
    gbrain dream --source nonsense   # exit 1 + sources-list hint
    gbrain dream --source            # exit 2 + usage hint
    gbrain dream --help --source whatever  # help + exit 0 (back-compat regression)
    gbrain dream                     # legacy back-compat: no per-source writeback

Follow-ups filed in TODOS.md

  • TODO-V13-A: gbrain dream --max-pages <n> plumbing through extract phases
  • TODO-V13-B: --source / --source-id flag-name unification across all CLI commands
  • TODO-V13-C: gbrain pages audit-junk-titles legacy-cleanup command (deferred ~1 week for matcher production observation)

🤖 Generated with Claude Code

Co-Authored-By: garrytan-agents me@garrytan.com

garrytan and others added 5 commits May 27, 2026 08:35
#1559)

Closes the silent-no-op class where `gbrain dream --source <id>` ran
the cycle but never wrote `last_full_cycle_at`, leaving
`gbrain doctor`'s cycle_freshness check stuck red forever.

Changes to src/commands/dream.ts:
- DreamArgs.source field; parseArgs recognizes --source <id> AND the
  --source-id alias (matches v0.37.7.0 #1167 naming across
  import/extract/graph-query)
- Argv validation: missing value → exit 2; repeated different values
  → exit 2; --source X --source-id Y conflict → exit 2; same-value
  repetition → accepted
- --help short-circuit ordering preserved with IRON-RULE comment +
  structural test guard
- runDream engine-null guard: --source requires a connected brain
- runDream resolveSourceId → archived-source guard via fetchSource
  from src/core/sources-load.ts (single-row SELECT that projects
  archived + handles pre-v0.26.5 schema via isUndefinedColumnError)
- Typed-error try/catch via isResolverUserError predicate: only
  swallows known resolver-user errors; TypeError / postgres errors
  propagate uncaught with stack trace so genuine programmer bugs
  aren't hidden behind operator-error UX
- Forwarded sourceId to runCycle; existing v0.38 writeback at
  cycle.ts:1947-1967 now actually fires
- --help text documents both flag names
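
The argv rules listed above can be sketched as a small pure function. This is an illustration under assumptions, not the parseArgs in src/commands/dream.ts: parseSourceFlag and its result type are hypothetical, and treating a following "--flag" token as a missing value is an assumption.

```typescript
type SourceParse =
  | { ok: true; source?: string }
  | { ok: false; exitCode: 2; message: string };

// Hypothetical parser: accepts --source and --source-id as aliases.
function parseSourceFlag(argv: string[]): SourceParse {
  let source: string | undefined;
  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    if (flag !== "--source" && flag !== "--source-id") continue;
    const value = argv[i + 1];
    // Missing value (assumption: another flag right after also counts as missing).
    if (value === undefined || value.startsWith("--")) {
      return { ok: false, exitCode: 2, message: `${flag} requires a value` };
    }
    // Repeated with a different value, including --source X --source-id Y.
    if (source !== undefined && source !== value) {
      return { ok: false, exitCode: 2, message: `conflicting source values: ${source} vs ${value}` };
    }
    source = value; // same-value repetition is accepted
    i++; // skip the consumed value
  }
  return { ok: true, source };
}
```

Returning a result object instead of calling process.exit keeps the rules testable without spawning a process.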

Tests:
- test/dream-cli-flags.test.ts: structural assertions for new flags,
  help text, IRON-RULE comment guard, resolver/predicate wiring
- test/dream.test.ts: 13 PGLite integration cases covering happy
  path (the regression that closes PR #1559), back-compat, alias
  equivalence, all argv edge cases, engine-null, archived,
  --help short-circuit ordering, T3 typed-error propagation, and
  D5 end-to-end dream→checkCycleFreshness column-name drift guard

Plan + 11 decisions: ~/.claude/plans/system-instruction-you-are-working-starry-papert.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
 emoji crash)

Closes the 2026-05-24 production SYNTH_PHASE_FAIL: 🤖 (U+1F916,
surrogate pair U+D83E U+DD16) at offset 3999 in a long telegram
transcript made the raw 4000-char slice produce a lone high
surrogate; Anthropic's JSON parser rejected the payload with "no
low surrogate in string"; the synthesize phase failed.
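
The failure can be reproduced in a few lines. The sketch below is a minimal illustration: safeSplit and truncateMiddle are hypothetical stand-ins, NOT the project's safeSplitIndex (whose exact signature and edge-case handling are not shown in this PR).

```typescript
// Hypothetical helper for illustration; NOT the project's safeSplitIndex.
// Returns idx, or idx - 1 when idx would fall between a high and a low surrogate.
function safeSplit(text: string, idx: number): number {
  if (idx <= 0) return 0;
  if (idx >= text.length) return text.length;
  const prev = text.charCodeAt(idx - 1);
  const next = text.charCodeAt(idx);
  const prevIsHigh = prev >= 0xd800 && prev <= 0xdbff;
  const nextIsLow = next >= 0xdc00 && next <= 0xdfff;
  return prevIsHigh && nextIsLow ? idx - 1 : idx;
}

// Head + tail truncation with both cut points checked.
function truncateMiddle(text: string, head = 4000, tail = 4000): string {
  if (text.length <= head + tail) return text;
  const headEnd = safeSplit(text, head);
  const tailStart = safeSplit(text, text.length - tail);
  return text.slice(0, headEnd) + "\n...\n" + text.slice(tailStart);
}

// Robot emoji at offset 3999: high surrogate at 3999, low surrogate at 4000.
const content = "a".repeat(3999) + "\u{1F916}" + "b".repeat(5000);
const rawHead = content.slice(0, 4000); // ends in a lone U+D83E
const safeHead = content.slice(0, safeSplit(content, 4000)); // stops before the emoji
```

Here the raw slice keeps only the first half of the emoji, while the checked slice ends one code unit earlier and stays well formed.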

Changes to src/core/cycle/synthesize.ts:
- judgeSignificance head+tail slice routed through safeSplitIndex
  from src/core/text-safe.ts (already imported)
- Did NOT introduce safeSliceEnd from PRs #1559+#1561 — that helper
  re-introduces the case-3 bug src/core/text-safe.ts:18-21 documents
- Did NOT touch findBoundary — master already routes through
  safeSplitIndex per the v0.42.0.0 wave

Tests in test/cycle-synthesize.test.ts:
- New describe('judgeSignificance — UTF-16 safety') block
- test.each over head boundaries (offsets 3998-4001) AND tail
  boundaries (offsets 3999-4002) for an 8001-char content with
  the robot emoji placed at each
- Primary assertion: explicit unpaired-surrogate scan over the
  captured prompt (NOT JSON.stringify per codex C-11 — V8/JSCore
  do not throw on lone surrogates, so that assertion was weak)
- Sub-8000 short-content branch case: no slicing, emoji passes
  through unchanged
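
An explicit unpaired-surrogate scan of the kind described above might look like the following. This is a sketch only: findUnpairedSurrogate is a hypothetical name, not the helper used in test/cycle-synthesize.test.ts.

```typescript
// Hypothetical helper: index of the first unpaired UTF-16 surrogate in `s`,
// or -1 when every surrogate is part of a valid high+low pair.
function findUnpairedSurrogate(s: string): number {
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff) {
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // valid pair: skip the low surrogate
        continue;
      }
      return i; // high surrogate with no low surrogate after it
    }
    if (c >= 0xdc00 && c <= 0xdfff) return i; // low surrogate with no high before it
  }
  return -1;
}

const intact = "ok \u{1F916} ok"; // emoji occupies indices 3 and 4
const broken = intact.slice(0, 4); // cuts the emoji in half
// JSON.stringify does not throw here; it escapes the lone surrogate,
// which is why it makes a weak assertion for this bug.
const serialized = JSON.stringify(broken);
```

The scan flags the broken string at index 3, whereas JSON.stringify quietly returns an escaped string.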

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…edes #1561)

Closes the bug class where scraper error pages with titles like
"Forbidden", "Access Denied", "Service Unavailable", "Robot Check",
and "Just a moment..." were slipping through the ingest gate
because the matcher only caught bare numeric codes (403/404/500...)
and "page not found". 232+ pages observed (202+ from straylight-
brain) were inflating page counts and tripping
content_sanity_audit_recent on every doctor run.

Changes to src/core/content-sanity.ts BUILT_IN_JUNK_PATTERNS:
- Expanded error_page_title regex to also catch forbidden,
  access denied, service unavailable, robot check, verify you are
  human (case-insensitive, anchored — so long-form essays about
  these topics still ingest fine)
- New cloudflare_challenge_title pattern with DISTINCT name from
  error_page_title (PR #1561 collapsed both into one name and lost
  audit signal — the new name preserves diagnosability in
  ~/.gbrain/audit/content-sanity-YYYY-Www.jsonl and doctor's
  content_sanity_audit_recent aggregation)
- Dropped PR #1561's bare-`error` matcher — too aggressive on
  legitimate concept/taxonomy pages titled exactly "Error"
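
The effect of anchoring can be sketched like this. The regexes below are illustrative approximations covering only the newly added titles; they are NOT the actual BUILT_IN_JUNK_PATTERNS entries, and matchJunkTitle is a hypothetical function.

```typescript
interface JunkPattern {
  name: string;
  re: RegExp;
}

// Illustrative approximations only; the real patterns live in src/core/content-sanity.ts.
const patterns: JunkPattern[] = [
  {
    name: "error_page_title",
    // ^...$ anchoring: the whole title must be the junk phrase.
    re: /^\s*(?:forbidden|access denied|service unavailable|robot check|verify you are human)\s*$/i,
  },
  {
    name: "cloudflare_challenge_title",
    re: /^\s*just a moment(?:\.{3}|…)?\s*$/i,
  },
];

// Returns the name of the first matching pattern, or null for a clean title.
function matchJunkTitle(title: string): string | null {
  for (const p of patterns) {
    if (p.re.test(title)) return p.name;
  }
  return null;
}
```

Because each pattern is anchored at both ends, a title such as "How to Handle Access Denied Errors" does not match, and keeping two names lets the audit log distinguish a Cloudflare challenge from a generic error page.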

Tests:
- test/content-sanity.test.ts: pattern-count locked at 7, new
  matches via test.each, over-match regression guard (legitimate
  prose titled "How to Handle Access Denied Errors" / "Error
  Boundary in React" etc. must pass), audit-name distinctness
  pinned
- test/import-file-content-sanity.test.ts: end-to-end
  ContentSanityBlockError via importFromContent for each new
  pattern family (D6 — assessor wiring coverage, not just regex)

Out of scope, filed in TODOS.md as TODO-V13-C: gbrain pages
audit-junk-titles legacy-cleanup command. Dropped from this PR
per codex outside-voice tension (T1) for ship-and-validate-
matchers-first discipline. The 200+ pre-existing scraper pages
already in the DB will get the destructive-cleanup operator
surface after ~1 week of production observation against this
matcher.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
VERSION + package.json bump to 0.41.23.0.

CHANGELOG voice: ELI10 lead naming the bug ("`gbrain dream --source
<id>` finally counts as a cycle"), then per-fix detail, then a
"To take advantage of v0.41.23.0" operator-action block and itemized
changes.

TODOS.md v0.41.23.x follow-ups:
- TODO-V13-A (P2): --max-pages plumbing (PR #1559's flag, deferred
  because CycleOpts has no maxPages field today)
- TODO-V13-B (P3): --source vs --source-id flag-name unification
  across all CLI commands
- TODO-V13-C (P2): gbrain pages audit-junk-titles legacy cleanup
  (deferred for ~1 week of matcher production observation)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ce-ingest-titles

# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json
Master shipped v0.41.23.0 + v0.41.24.0 mid-review; this branch
originally bumped to v0.41.25.0 post-merge. User flagged v0.41.26.0
to leave a slot open for another in-flight PR. No code changes;
VERSION + package.json + CHANGELOG header + "To take advantage"
section updated in lockstep.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@garrytan garrytan changed the title from "v0.41.25.0 fix: dream --source + ingest junk titles + emoji-crash (supersedes #1559, #1561)" to "v0.41.26.0 fix: dream --source + ingest junk titles + emoji-crash (supersedes #1559, #1561)" on May 27, 2026
…ce-ingest-titles

# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json
@garrytan garrytan merged commit 42d99b6 into master May 27, 2026
3 checks passed
mgunnin added a commit to mgunnin/gbrain that referenced this pull request May 28, 2026
* upstream/master:
  v0.41.26.1 fix: lock-renewal cathedral — closes ~39 worker crashes/day (supersedes garrytan#1567) (garrytan#1572)
  v0.41.26.0 fix: dream --source + ingest junk titles + emoji-crash (supersedes garrytan#1559, garrytan#1561) (garrytan#1571)
  v0.41.25.0 perf(sync): batched deletes + global page-generation clock (supersedes garrytan#1538) (garrytan#1566)
  v0.41.24.0 fix(conversation-parser): threshold gates + bold-paren-time pattern — 20,167 Circleback messages unblocked (closes garrytan#1533) (garrytan#1543)
  v0.41.23.0 feat: extract operator surfaces + pack-driven extractables (garrytan#1541)
  v0.41.22.1 feat: brainstorm/lsd judge fixes (closes garrytan#1540 end-to-end) (garrytan#1562)
  v0.41.22.0 feat: type-unification cathedral — 94 types → 15 canonical (closes garrytan#1479) (garrytan#1542)
  v0.41.21.0 feat(ops): 5 daily-driver pains fixed in one wave (garrytan#1545)
  v0.41.20.0 feat: gbrain status + doctor --scope=brain (fix wave 2: items garrytan#6 + garrytan#7) (garrytan#1544)
  feat: v0.41.19.0 Supavisor Retry Cathedral (garrytan#1537)
  v0.41.18.0: gbrain onboard — the activation surface gbrain didn't have before (garrytan#1521)
  v0.41.17.0 feat: --workers N on every bulk command + facts dim doctor parity (garrytan#1519)
  v0.41.16.0 feat: conversation parser cathedral + progressive-batch primitive (closes garrytan#1461) (garrytan#1510)
  v0.41.15.0 feat(sync): --timeout + --max-age + partial status (closes garrytan#1472 RFC) (garrytan#1506)
garrytan added a commit that referenced this pull request May 29, 2026
… drain + disconnect audit (closes #1570) (#1608)

* merge master: rebump v0.41.25.0 → v0.41.27.0 (queue collision)

Master shipped v0.41.25.0 (#1538 batched sync deletes) and v0.41.26.0
(#1571 dream --source fix) while this branch was in flight. Conflict
resolution rebumps to the next available slot.

- VERSION: 0.41.25.0 → 0.41.27.0
- package.json: synced
- CHANGELOG.md: my v0.41.27.0 entry placed above master's v0.41.26.0
  and v0.41.25.0; in-entry version references updated 0.41.25.0 →
  0.41.27.0 and forward-references bumped to v0.41.28+.
- TODOS.md: kept master's v0.41.20.x section + my v0.41.27.0+ follow-ups

No source-file conflicts during the merge.

* feat(diagnostics): db-disconnect audit + doctor surface (v0.41.27.0)

Instruments every db.disconnect() and PostgresEngine.disconnect() call
with a JSONL audit record so the next user-reported #1570 cycle gives
us the offender's caller stack instead of the symptomatic
"No database connection" error.

Audit shape (~/.gbrain/audit/db-disconnect-YYYY-Www.jsonl):
  {ts, engine_kind, connection_style, caller_stack[], command, pid}

- src/core/audit/db-disconnect-audit.ts (NEW): the audit writer,
  built on the v0.40.4.0 createAuditWriter cathedral. Captures a
  6-frame stack via new Error().stack so the offender is readable
  without spending stderr noise.
- src/core/db.ts: logDbDisconnect call at the top of disconnect()
  (best-effort; never blocks the real teardown).
- src/core/postgres-engine.ts: same instrumentation in
  PostgresEngine.disconnect() — distinguishes 'module' vs 'instance'
  connection_style so we can tell legitimate worker-pool teardowns
  apart from the load-bearing module-singleton class.
- src/commands/doctor.ts: extends batch_retry_health to surface
  24h disconnect count + most-recent caller stack. Warns when the
  caller frame isn't a known CLI-exit frame (e.g. cli.ts's finally
  block at the end of an op-dispatch). This is the diagnostic that
  tells v0.41.28+ where to apply the real ownership fix.
- test/db-disconnect-audit.test.ts: unit coverage for the audit
  writer + caller-stack capture + JSONL shape.
- test/e2e/db-singleton-shared-recovery.test.ts: real-Postgres
  regression that exercises the singleton-null path end-to-end.

Refs #1570

* feat(retry): self-heal on null singleton — closes #1570 symptom (v0.41.27.0)

withRetry gains an opt-in reconnect callback that fires between the
isRetryableConnError classification and the inter-attempt sleep.
PostgresEngine.batchRetry injects this.reconnect() — race-safe via
the existing _reconnecting guard, handles module and instance pools.

Closes the production loss reported in #1570: dream cycles on Supabase
no longer drop ~150 link rows per cycle when the singleton goes null
mid-batch. The retry now rebuilds the connection between attempts so
the second try has somewhere to write to.

- src/core/retry.ts: WithRetryOpts gains `reconnect?: () => Promise<void>`.
  Awaited in the catch branch. onRetry is also now awaited (back-compat-
  safe: every existing in-tree caller is a sync arrow). Reconnect
  failures propagate as the real cause — replaces the symptomatic
  "No database connection" error with whatever the connect() throw
  was, so operators see the truth.
- src/core/postgres-engine.ts:batchRetry — injects
  `reconnect: () => this.reconnect()`. Covers all 9 batch-retry call
  sites (addLinksBatch, addTimelineEntriesBatch, upsertChunks, plus
  the 6 caller-supplied auditSite labels in extract / sync / reindex).
- test/core/retry-reconnect.test.ts: 8 hermetic cases pinning the
  contract — reconnect fires before sleep, only on retryable errors,
  back-compat when omitted, signal-aborted bypasses reconnect,
  onRetry is awaited, full success path end-to-end.
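
The ordering described above (classify, then reconnect, then sleep, then retry) can be sketched as follows. This is a simplified illustration, not src/core/retry.ts: the names WithRetryOpts, reconnect, and onRetry come from this commit, while attempts, delayMs, and isRetryable (standing in for isRetryableConnError) are assumed.

```typescript
interface WithRetryOpts {
  attempts: number; // assumed option name
  delayMs: number; // assumed option name
  isRetryable: (err: unknown) => boolean; // stands in for isRetryableConnError
  onRetry?: (err: unknown, attempt: number) => void | Promise<void>;
  reconnect?: () => Promise<void>;
}

async function withRetry<T>(fn: () => Promise<T>, opts: WithRetryOpts): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Non-retryable errors and the final attempt surface unchanged.
      if (attempt >= opts.attempts || !opts.isRetryable(err)) throw err;
      await opts.onRetry?.(err, attempt);
      // Rebuild the connection before sleeping; if reconnect itself
      // throws, that error propagates as the real cause.
      await opts.reconnect?.();
      await new Promise<void>((resolve) => setTimeout(resolve, opts.delayMs));
    }
  }
}
```

When reconnect is omitted the loop behaves like a plain retry, which is the back-compat property the tests pin.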

The deeper bug (who's calling disconnect mid-cycle) is left
unaddressed in this commit by design — the diagnostic instrumentation
in the prior commit will tell us in the next production run.

Refs #1570

* feat(facts): drainPending() + CLI await before disconnect (v0.41.27.0)

Closes the silent 'No database connection' tail-end errors after
gbrain capture / put_page: the facts:absorb fire-and-forget queue
sometimes outlived the CLI process's connection lifetime, so absorb
attempts after engine.disconnect() landed in stderr as the
GBrainError shape.

- src/core/facts/queue.ts: new drainPending({timeout: 1000}) method
  distinct from shutdown(). Stops accepting new enqueues, awaits
  in-flight settle, bounded by timeout, returns count of unfinished.
  Semantically different from shutdown() (which aborts in-flight):
  the behavior (drop work that hasn't started yet but let
  in-flight work finish) matches what CLI exit actually needs.
- src/cli.ts: op-dispatch finally block awaits the drain BEFORE
  engine.disconnect(). Bounded 1s. Opt-out env GBRAIN_NO_FACTS_DRAIN
  for callers that don't enqueue (keeps fast-exit paths fast).
  Mirrors the v0.41.8.0 awaitPendingLastRetrievedWrites pattern.
- test/facts-queue-drain-pending.test.ts: 6 hermetic cases — empty
  drain returns immediately, single in-flight settles, timeout
  bounds wait, shutdown-after-drain is idempotent, post-drain
  enqueues are dropped, signal-aborted skips waiting.
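
The drain semantics can be sketched with a toy queue. This is an illustration under assumptions, not src/core/facts/queue.ts: MiniFactsQueue and its enqueue signature are hypothetical, and only the drainPending name, the timeout option, and the "returns count of unfinished" behavior come from this commit.

```typescript
// Hypothetical toy queue; tasks start immediately when enqueued.
class MiniFactsQueue {
  private inFlight = new Set<Promise<void>>();
  private draining = false;

  // Returns false once the queue has stopped accepting work.
  enqueue(task: () => Promise<void>): boolean {
    if (this.draining) return false;
    const p: Promise<void> = task().then(
      () => { this.inFlight.delete(p); },
      () => { this.inFlight.delete(p); }, // task errors are swallowed in this sketch
    );
    this.inFlight.add(p);
    return true;
  }

  // Stop accepting work, wait up to `timeout` ms for in-flight tasks,
  // and report how many are still unfinished.
  async drainPending(opts: { timeout: number } = { timeout: 1000 }): Promise<number> {
    this.draining = true;
    let resolveTimeout: () => void = () => {};
    const timedOut = new Promise<void>((resolve) => { resolveTimeout = resolve; });
    const timer = setTimeout(() => resolveTimeout(), opts.timeout);
    await Promise.race([Promise.all(Array.from(this.inFlight)), timedOut]);
    clearTimeout(timer); // do not keep the process alive after draining
    return this.inFlight.size;
  }
}
```

A CLI exit path would await drainPending() before disconnecting the engine, so in-flight work finishes against a live connection and anything enqueued afterwards is dropped rather than failing later.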

Refs #1570

* docs: update project documentation for v0.41.27.0

README.md: added troubleshooting entry for the v0.41.27.0 retry-reconnect
+ facts:absorb drain fix (closes #1570), pointing operators at
`gbrain doctor --json` to find the offending disconnect caller.

CLAUDE.md: extended `src/core/retry.ts` entry with the new optional
`reconnect` callback (v0.41.27.0); added two new Key Files entries for
`src/core/audit/db-disconnect-audit.ts` (the diagnostic half of the
"instrument first, fix later" pivot) and `FactsQueue.drainPending`;
extended `doctor.ts:checkBatchRetryHealth` entry with the in-place
extension that surfaces 24h disconnect-call count.

llms-full.txt: regenerated to absorb CLAUDE.md edits.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: rebump v0.41.27.0 → v0.41.28.0 (queue collision with #1573)

Master shipped v0.41.27.0 (#1573 git-aware sync_freshness) claiming the
same slot. Rebump to the next available version.

- VERSION + package.json → 0.41.28.0
- CHANGELOG.md: my entry header + in-entry refs 0.41.27.0 → 0.41.28.0
- TODOS.md: my #1570 follow-up section header + body refs bumped

* test: pin gateway in put-page-provenance + embedding-dim-check (CI shard fix)

Both files failed on CI shards 1 and 8 under the cross-file gateway-state
leak class (CLAUDE.md "Test-isolation lint and helpers"). The v0.41.28.0
merge reshuffled the weight-based shard bin-packing, landing a
gateway-mutating sibling ahead of these two victims in the same `bun test`
process.

Mechanism:
- put-page-provenance: put_page embeds via the gateway. A sibling left
  the gateway configured with OpenAI + the CI placeholder `sk-test`
  (captured at configureGateway time, survives the withEnv restore as
  cached gateway state). put_page's embed then fired against live OpenAI
  and 401'd. The bunfig legacy-embedding preload's beforeEach only
  re-applies legacy when the gateway was RESET — it does NOT correct a
  sibling that configured a different LIVE config.
- embedding-dim-check: initSchema builds the content_chunks vector column
  at the gateway's configured dim. A sibling leaking ZE/1280 made the
  column 1280-d, so `expect(dims).toBe(1536)` failed.

Fix (victim-side pinning, the escape hatch the preload documents):
- Both: configure the gateway explicitly in beforeAll BEFORE initSchema
  (OpenAI/1536), resetGateway() in afterAll so neither leaks onward.
- put-page-provenance also stubs the embed transport via
  __setEmbedTransportForTests so embed is deterministic and offline; a
  dummy OPENAI_API_KEY is supplied in the gateway env because
  instantiateEmbedding builds the OpenAI client (key check) BEFORE the
  stubbed transport is reached — the stub then intercepts the actual
  call so the key never leaves the process.

Verified: CI shards 1 (1337 pass) + 8 (905 pass) green with
OPENAI_API_KEY unset, plus adversarial sibling orderings (gateway.test /
doctor-ze-checks preceding). Typecheck + check-test-isolation clean.

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>