fix: dream --source passes sourceId to runCycle, writes cycle timestamp#1559
fix: dream --source passes sourceId to runCycle, writes cycle timestamp#1559garrytan-agents wants to merge 3 commits into
Conversation
The judgeSignificance trimming (slice at 4000 chars) could split a UTF-16 surrogate pair when an emoji sits exactly at the boundary, producing a lone high surrogate that Anthropic's JSON parser rejects with 'no low surrogate in string'. Add safeSliceEnd() helper that backs up by one char when the cut lands between a high and low surrogate. Apply to: - judgeSignificance transcript trimming (the direct cause) - findBoundary hard-split fallback (defense-in-depth) Fixes: dream cycle SYNTH_PHASE_FAIL on 2026-05-24 caused by 🤖 emoji at pos 3999 in telegram/2026-05-20-topic-1-topic-1.md
`gbrain dream --source default` silently ignored the --source flag. The flag was never parsed in parseArgs and never forwarded to runCycle. This meant the cycle completed but last_full_cycle_at was never written to the source's config JSONB, so doctor's cycle_freshness check always reported stale cycles — even when dream ran successfully. Changes: - Parse --source and --max-pages in dream's parseArgs - Forward sourceId and maxPages to runCycle opts - Document both flags in --help Without this fix, only `gbrain autopilot` (which uses its own fanout logic) could write the cycle timestamp. Running `gbrain dream --source X` via cron or manually would never update freshness.
|
Superseded by #1571: same intent, canonical The shared surrogate-safe commit ships via the canonical Thank you for catching the bug. |
…persedes #1559, #1561) (#1571) * fix: dream --source/--source-id plumbs sourceId to runCycle (supersedes #1559) Closes the silent-no-op class where `gbrain dream --source <id>` ran the cycle but never wrote `last_full_cycle_at`, leaving `gbrain doctor`'s cycle_freshness check stuck red forever. Changes to src/commands/dream.ts: - DreamArgs.source field; parseArgs recognizes --source <id> AND the --source-id alias (matches v0.37.7.0 #1167 naming across import/extract/graph-query) - Argv validation: missing value → exit 2; repeated different values → exit 2; --source X --source-id Y conflict → exit 2; same-value repetition → accepted - --help short-circuit ordering preserved with IRON-RULE comment + structural test guard - runDream engine-null guard: --source requires a connected brain - runDream resolveSourceId → archived-source guard via fetchSource from src/core/sources-load.ts (single-row SELECT that projects archived + handles pre-v0.26.5 schema via isUndefinedColumnError) - Typed-error try/catch via isResolverUserError predicate: only swallows known resolver-user errors; TypeError / postgres errors propagate uncaught with stack trace so genuine programmer bugs aren't hidden behind operator-error UX - Forwarded sourceId to runCycle; existing v0.38 writeback at cycle.ts:1947-1967 now actually fires - --help text documents both flag names Tests: - test/dream-cli-flags.test.ts: structural assertions for new flags, help text, IRON-RULE comment guard, resolver/predicate wiring - test/dream.test.ts: 13 PGLite integration cases covering happy path (the regression that closes PR #1559), back-compat, alias equivalence, all argv edge cases, engine-null, archived, --help short-circuit ordering, T3 typed-error propagation, and D5 end-to-end dream→checkCycleFreshness column-name drift guard Plan + 11 decisions: ~/.claude/plans/system-instruction-you-are-working-starry-papert.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: judgeSignificance uses canonical safeSplitIndex (closes #1559/#1561 emoji crash) Closes the 2026-05-24 production SYNTH_PHASE_FAIL: 🤖 (U+1F916, surrogate pair U+D83E U+DD16) at offset 3999 in a long telegram transcript made the raw 4000-char slice produce a lone high surrogate; Anthropic's JSON parser rejected the payload with "no low surrogate in string"; the synthesize phase failed. Changes to src/core/cycle/synthesize.ts: - judgeSignificance head+tail slice routed through safeSplitIndex from src/core/text-safe.ts (already imported) - Did NOT introduce safeSliceEnd from PRs #1559+#1561 — that helper re-introduces the case-3 bug src/core/text-safe.ts:18-21 documents - Did NOT touch findBoundary — master already routes through safeSplitIndex per the v0.42.0.0 wave Tests in test/cycle-synthesize.test.ts: - New describe('judgeSignificance — UTF-16 safety') block - test.each over head boundaries (offsets 3998-4001) AND tail boundaries (offsets 3999-4002) for an 8001-char content with the robot emoji placed at each - Primary assertion: explicit unpaired-surrogate scan over the captured prompt (NOT JSON.stringify per codex C-11 — V8/JSCore do not throw on lone surrogates, so that assertion was weak) - Sub-8000 short-content branch case: no slicing, emoji passes through unchanged Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: expand error_page_title + add cloudflare_challenge_title (supersedes #1561) Closes the bug class where scraper error pages with titles like "Forbidden", "Access Denied", "Service Unavailable", "Robot Check", and "Just a moment..." were slipping through the ingest gate because the matcher only caught bare numeric codes (403/404/500...) and "page not found". 232+ pages observed (202+ from straylight- brain) were inflating page counts and tripping content_sanity_audit_recent on every doctor run. Changes to src/core/content-sanity.ts BUILT_IN_JUNK_PATTERNS: - Expanded error_page_title regex to also catch forbidden, access denied, service unavailable, robot check, verify you are human (case-insensitive, anchored — so long-form essays about these topics still ingest fine) - New cloudflare_challenge_title pattern with DISTINCT name from error_page_title (PR #1561 collapsed both into one name and lost audit signal — the new name preserves diagnosability in ~/.gbrain/audit/content-sanity-YYYY-Www.jsonl and doctor's content_sanity_audit_recent aggregation) - Dropped PR #1561's bare-`error` matcher — too aggressive on legitimate concept/taxonomy pages titled exactly "Error" Tests: - test/content-sanity.test.ts: pattern-count locked at 7, new matches via test.each, over-match regression guard (legitimate prose titled "How to Handle Access Denied Errors" / "Error Boundary in React" etc. must pass), audit-name distinctness pinned - test/import-file-content-sanity.test.ts: end-to-end ContentSanityBlockError via importFromContent for each new pattern family (D6 — assessor wiring coverage, not just regex) Out of scope, filed in TODOS.md as TODO-V13-C: gbrain pages audit-junk-titles legacy-cleanup command. Dropped from this PR per codex outside-voice tension (T1) for ship-and-validate- matchers-first discipline. The 200+ pre-existing scraper pages already in the DB will get the destructive-cleanup operator surface after ~1 week of production observation against this matcher. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: bump v0.41.23.0 + CHANGELOG + follow-up TODOs VERSION + package.json bump to 0.41.23.0. CHANGELOG voice: ELI10 lead naming the bug ("`gbrain dream --source <id>` finally counts as a cycle"), then per-fix detail, then a "To take advantage of v0.41.23.0" operator-action block and itemized changes. TODOS.md v0.41.23.x follow-ups: - TODO-V13-A (P2): --max-pages plumbing (PR #1559's flag, deferred because CycleOpts has no maxPages field today) - TODO-V13-B (P3): --source vs --source-id flag-name unification across all CLI commands - TODO-V13-C (P2): gbrain pages audit-junk-titles legacy cleanup (deferred for ~1 week of matcher production observation) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: bump v0.41.25.0 → v0.41.26.0 (leave headroom for in-flight PR) Master shipped v0.41.23.0 + v0.41.24.0 mid-review; this branch originally bumped to v0.41.25.0 post-merge. User flagged v0.41.26.0 to leave a slot open for another in-flight PR. No code changes; VERSION + package.json + CHANGELOG header + "To take advantage" section updated in lockstep. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* upstream/master: v0.41.26.1 fix: lock-renewal cathedral — closes ~39 worker crashes/day (supersedes garrytan#1567) (garrytan#1572) v0.41.26.0 fix: dream --source + ingest junk titles + emoji-crash (supersedes garrytan#1559, garrytan#1561) (garrytan#1571) v0.41.25.0 perf(sync): batched deletes + global page-generation clock (supersedes garrytan#1538) (garrytan#1566) v0.41.24.0 fix(conversation-parser): threshold gates + bold-paren-time pattern — 20,167 Circleback messages unblocked (closes garrytan#1533) (garrytan#1543) v0.41.23.0 feat: extract operator surfaces + pack-driven extractables (garrytan#1541) v0.41.22.1 feat: brainstorm/lsd judge fixes (closes garrytan#1540 end-to-end) (garrytan#1562) v0.41.22.0 feat: type-unification cathedral — 94 types → 15 canonical (closes garrytan#1479) (garrytan#1542) v0.41.21.0 feat(ops): 5 daily-driver pains fixed in one wave (garrytan#1545) v0.41.20.0 feat: gbrain status + doctor --scope=brain (fix wave 2: items garrytan#6 + garrytan#7) (garrytan#1544) feat: v0.41.19.0 Supavisor Retry Cathedral (garrytan#1537) v0.41.18.0: gbrain onboard — the activation surface gbrain didn't have before (garrytan#1521) v0.41.17.0 feat: --workers N on every bulk command + facts dim doctor parity (garrytan#1519) v0.41.16.0 feat: conversation parser cathedral + progressive-batch primitive (closes garrytan#1461) (garrytan#1510) v0.41.15.0 feat(sync): --timeout + --max-age + partial status (closes garrytan#1472 RFC) (garrytan#1506)
Problem
gbrain dream --source defaultsilently ignores the--sourceflag. The flag was never parsed inparseArgsand never forwarded torunCycle.Impact:
runCyclewriteslast_full_cycle_atto the source config ONLY whenopts.sourceIdis set (line 1961 of cycle.ts). Without it, the cycle completes successfully but the doctor cycle_freshness check always reports stale — because no timestamp was ever written.This means:
gbrain dream --source defaultvia cron → cycle runs → doctor says 60h stale → operator re-runs → cycle runs → still stale → infinite loopgbrain autopilot(which has its own fanout logic) could write the timestampFix
--source <id>inparseArgs(was completely missing)--max-pages <n>(was silently ignored)runCycleopts--helpOne-line root cause:
sourceIdwas inCycleOptsbutdream.tsnever set it.Testing