
fix: dream --source passes sourceId to runCycle, writes cycle timestamp #1559

Closed
garrytan-agents wants to merge 3 commits into garrytan:master from garrytan-agents:fix/dream-source-id

@garrytan-agents

Contributor

Problem

gbrain dream --source default silently ignores the --source flag. The flag was never parsed in parseArgs and never forwarded to runCycle.

Impact: runCycle writes last_full_cycle_at to the source config ONLY when opts.sourceId is set (line 1961 of cycle.ts). Without it, the cycle completes successfully but the doctor cycle_freshness check always reports the cycle as stale, because no timestamp is ever written.

This means:

  • gbrain dream --source default via cron → cycle runs → doctor says 60h stale → operator re-runs → cycle runs → still stale → infinite loop
  • Only gbrain autopilot (which has its own fanout logic) could write the timestamp
  • Every manual or cron-triggered dream run was invisible to doctor
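The writeback condition described under Impact can be modeled in a few lines. This is a hypothetical sketch, not the code in cycle.ts: CycleOpts is reduced to sourceId, an in-memory Map stands in for the sources config JSONB, and isCycleFresh with its 24-hour threshold is an assumed stand-in for doctor's cycle_freshness check. It shows why a cycle that succeeds without a sourceId still reads as stale.

```typescript
// Hypothetical model of the behaviour described above; the real CycleOpts and
// runCycle in cycle.ts have more fields and write to Postgres.
interface CycleOpts {
  sourceId?: string;
}

interface SourceConfig {
  last_full_cycle_at?: string; // ISO-8601 timestamp
}

// In-memory stand-in for the sources table's config JSONB column.
const sourceConfigs = new Map<string, SourceConfig>();

function runCycle(opts: CycleOpts, now: Date = new Date()): { status: "ok" } {
  // ...cycle phases would run here, with or without a sourceId...
  if (opts.sourceId) {
    // The writeback only happens when a source was named.
    const cfg = sourceConfigs.get(opts.sourceId) ?? {};
    cfg.last_full_cycle_at = now.toISOString();
    sourceConfigs.set(opts.sourceId, cfg);
  }
  return { status: "ok" };
}

// Assumed freshness rule: stale if never written or older than maxAgeHours.
function isCycleFresh(sourceId: string, now: Date, maxAgeHours = 24): boolean {
  const ts = sourceConfigs.get(sourceId)?.last_full_cycle_at;
  if (!ts) return false; // never written, so always stale
  const ageHours = (now.getTime() - Date.parse(ts)) / (60 * 60 * 1000);
  return ageHours <= maxAgeHours;
}
```

Calling runCycle({}) returns "ok" but leaves isCycleFresh("default", now) false; only runCycle({ sourceId: "default" }) flips it to true.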

Fix

  1. Parse --source <id> in parseArgs (was completely missing)
  2. Parse --max-pages <n> (was silently ignored)
  3. Forward both to runCycle opts
  4. Document in --help

One-line root cause: sourceId was in CycleOpts but dream.ts never set it.
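A minimal sketch of the missing parse step, assuming a hand-rolled argv loop. parseDreamArgs and this DreamArgs shape are illustrative, not the real parseArgs in dream.ts. The --source-id alias and the exit-2 rules follow those later listed for #1571, and --max-pages is left out because #1571 dropped it.

```typescript
// Hypothetical sketch of dream's --source parsing; not the actual parseArgs.
interface DreamArgs {
  source?: string;
}

type ParseResult =
  | { ok: true; args: DreamArgs }
  | { ok: false; exitCode: 2; message: string };

function parseDreamArgs(argv: string[]): ParseResult {
  const args: DreamArgs = {};
  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    if (flag !== "--source" && flag !== "--source-id") continue; // other flags ignored here
    const value = argv[i + 1];
    // Assumption: a following token that looks like a flag counts as "missing value".
    if (value === undefined || value.startsWith("--")) {
      return { ok: false, exitCode: 2, message: `${flag} requires a value` };
    }
    if (args.source !== undefined && args.source !== value) {
      return { ok: false, exitCode: 2, message: `conflicting sources: ${args.source} vs ${value}` };
    }
    args.source = value; // repeating the same value is accepted
    i++; // consume the value token
  }
  return { ok: true, args };
}
```

The parsed value is then forwarded as runCycle({ sourceId: args.source }), which is the one assignment dream.ts was missing.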

Testing

# Before: cycle completes, no timestamp written
gbrain dream --source default --json | jq .status  # "ok"
# sources.config still shows old last_full_cycle_at

# After: cycle completes, timestamp written
gbrain dream --source default --json | jq .status  # "ok"
# sources.config.last_full_cycle_at = now
# doctor cycle_freshness check passes

root and others added 3 commits May 24, 2026 09:16
The judgeSignificance trimming (slice at 4000 chars) could split a
UTF-16 surrogate pair when an emoji sits exactly at the boundary,
producing a lone high surrogate that Anthropic's JSON parser rejects
with 'no low surrogate in string'.

Add safeSliceEnd() helper that backs up by one char when the cut lands
between a high and low surrogate. Apply to:
- judgeSignificance transcript trimming (the direct cause)
- findBoundary hard-split fallback (defense-in-depth)

Fixes: dream cycle SYNTH_PHASE_FAIL on 2026-05-24 caused by
🤖 emoji at pos 3999 in telegram/2026-05-20-topic-1-topic-1.md
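
A minimal sketch of a split-index adjustment that covers both ends of a head+tail trim. It assumes, as the later test description suggests, that judgeSignificance keeps a 4,000-code-unit head and tail of longer content. surrogateSafeIndex and trimHeadTail are hypothetical: this is neither the PR's safeSliceEnd nor the canonical safeSplitIndex in src/core/text-safe.ts, whose docstring (per the owner's review) documents a case that safeSliceEnd gets wrong.

```typescript
// Hypothetical helpers for illustration only.
function isHighSurrogate(code: number): boolean {
  return code >= 0xd800 && code <= 0xdbff;
}

function isLowSurrogate(code: number): boolean {
  return code >= 0xdc00 && code <= 0xdfff;
}

// Returns a split index that never falls between the two UTF-16 code units
// of a surrogate pair: if s[idx - 1] is a high surrogate and s[idx] a low
// surrogate, step back one so the whole pair lands on the right-hand side.
function surrogateSafeIndex(s: string, idx: number): number {
  const i = Math.max(0, Math.min(idx, s.length));
  if (i === 0 || i === s.length) return i;
  return isHighSurrogate(s.charCodeAt(i - 1)) && isLowSurrogate(s.charCodeAt(i)) ? i - 1 : i;
}

// Head+tail trim; both cut points go through the same adjustment. The head
// may lose one code unit; the tail may gain one to keep a pair whole.
function trimHeadTail(s: string, n: number): { head: string; tail: string } {
  if (s.length <= 2 * n) return { head: s, tail: "" }; // short content: no slicing
  return {
    head: s.slice(0, surrogateSafeIndex(s, n)),
    tail: s.slice(surrogateSafeIndex(s, s.length - n)),
  };
}
```
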

`gbrain dream --source default` silently ignored the --source flag.
The flag was never parsed in parseArgs and never forwarded to runCycle.
This meant the cycle completed but last_full_cycle_at was never written
to the source's config JSONB, so doctor's cycle_freshness check always
reported stale cycles — even when dream ran successfully.

Changes:
- Parse --source and --max-pages in dream's parseArgs
- Forward sourceId and maxPages to runCycle opts
- Document both flags in --help

Without this fix, only `gbrain autopilot` (which uses its own fanout
logic) could write the cycle timestamp. Running `gbrain dream --source X`
via cron or manually would never update freshness.
@garrytan

Owner

Superseded by #1571, which has the same intent plus: canonical resolveSourceId + fetchSource wiring; engine-null and archived-source guards; a typed-error try/catch via a predicate (so a TypeError doesn't get hidden as an operator error); and 13 PGLite integration cases (back-compat, alias equivalence, repetition rules, --help short-circuit ordering, and the D5 end-to-end dream→checkCycleFreshness column-name drift guard). It drops the silently dead --max-pages flag: cycle phases don't honor it today, so shipping the flag would have been a lying flag. The --max-pages plumbing is filed as v0.41+ TODO-V13-A.

The shared surrogate-safe commit ships via the canonical safeSplitIndex helper instead of safeSliceEnd. See the src/core/text-safe.ts module docstring for why the canonical helper exists: it documents the case-3 bug that the agent-authored safeSliceEnd reintroduces.

Thank you for catching the bug.

garrytan closed this May 27, 2026
garrytan added a commit that referenced this pull request May 27, 2026
…persedes #1559, #1561) (#1571)

* fix: dream --source/--source-id plumbs sourceId to runCycle (supersedes #1559)

Closes the silent-no-op class where `gbrain dream --source <id>` ran
the cycle but never wrote `last_full_cycle_at`, leaving
`gbrain doctor`'s cycle_freshness check stuck red forever.

Changes to src/commands/dream.ts:
- DreamArgs.source field; parseArgs recognizes --source <id> AND the
  --source-id alias (matches v0.37.7.0 #1167 naming across
  import/extract/graph-query)
- Argv validation: missing value → exit 2; repeated different values
  → exit 2; --source X --source-id Y conflict → exit 2; same-value
  repetition → accepted
- --help short-circuit ordering preserved with IRON-RULE comment +
  structural test guard
- runDream engine-null guard: --source requires a connected brain
- runDream resolveSourceId → archived-source guard via fetchSource
  from src/core/sources-load.ts (single-row SELECT that projects
  archived + handles pre-v0.26.5 schema via isUndefinedColumnError)
- Typed-error try/catch via isResolverUserError predicate: only
  swallows known resolver-user errors; TypeError / postgres errors
  propagate uncaught with stack trace so genuine programmer bugs
  aren't hidden behind operator-error UX
- Forwarded sourceId to runCycle; existing v0.38 writeback at
  cycle.ts:1947-1967 now actually fires
- --help text documents both flag names
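
The typed-error try/catch described above can be sketched as follows. ResolverUserError, resolveForDream, and the in-memory resolveSource are hypothetical names; only the pattern (a type-guard predicate decides which errors become operator-facing messages, and everything else is rethrown) comes from the description.

```typescript
// Hypothetical error type standing in for gbrain's resolver-user errors.
class ResolverUserError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ResolverUserError";
    // Keeps instanceof reliable if this is compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Type-guard predicate: true only for errors the operator can fix.
function isResolverUserError(err: unknown): err is ResolverUserError {
  return err instanceof ResolverUserError;
}

type ResolveOutcome =
  | { ok: true; sourceId: string }
  | { ok: false; message: string };

// Converts known operator errors into a friendly outcome; anything else
// (TypeError, database errors) is rethrown with its original stack trace.
function resolveForDream(resolve: (id: string) => string, id: string): ResolveOutcome {
  try {
    return { ok: true, sourceId: resolve(id) };
  } catch (err) {
    if (isResolverUserError(err)) return { ok: false, message: err.message };
    throw err;
  }
}

// Example resolver with unknown-source and archived-source guards (in-memory).
const knownSources: Record<string, { archived: boolean }> = {
  default: { archived: false },
  old: { archived: true },
};

function resolveSource(id: string): string {
  const src = knownSources[id];
  if (!src) throw new ResolverUserError(`unknown source: ${id}`);
  if (src.archived) throw new ResolverUserError(`source is archived: ${id}`);
  return id;
}
```
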

Tests:
- test/dream-cli-flags.test.ts: structural assertions for new flags,
  help text, IRON-RULE comment guard, resolver/predicate wiring
- test/dream.test.ts: 13 PGLite integration cases covering happy
  path (the regression that closes PR #1559), back-compat, alias
  equivalence, all argv edge cases, engine-null, archived,
  --help short-circuit ordering, T3 typed-error propagation, and
  D5 end-to-end dream→checkCycleFreshness column-name drift guard

Plan + 11 decisions: ~/.claude/plans/system-instruction-you-are-working-starry-papert.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: judgeSignificance uses canonical safeSplitIndex (closes #1559/#1561 emoji crash)

Closes the 2026-05-24 production SYNTH_PHASE_FAIL: 🤖 (U+1F916,
surrogate pair U+D83E U+DD16) at offset 3999 in a long telegram
transcript made the raw 4000-char slice produce a lone high
surrogate; Anthropic's JSON parser rejected the payload with "no
low surrogate in string"; the synthesize phase failed.

Changes to src/core/cycle/synthesize.ts:
- judgeSignificance head+tail slice routed through safeSplitIndex
  from src/core/text-safe.ts (already imported)
- Did NOT introduce safeSliceEnd from PRs #1559+#1561 — that helper
  re-introduces the case-3 bug src/core/text-safe.ts:18-21 documents
- Did NOT touch findBoundary — master already routes through
  safeSplitIndex per the v0.42.0.0 wave

Tests in test/cycle-synthesize.test.ts:
- New describe('judgeSignificance — UTF-16 safety') block
- test.each over head boundaries (offsets 3998-4001) AND tail
  boundaries (offsets 3999-4002) for an 8001-char content with
  the robot emoji placed at each
- Primary assertion: explicit unpaired-surrogate scan over the
  captured prompt (NOT JSON.stringify per codex C-11 — V8/JSCore
  do not throw on lone surrogates, so that assertion was weak)
- Sub-8000 short-content branch case: no slicing, emoji passes
  through unchanged
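
The primary assertion described above, an explicit scan for unpaired surrogates, might look like this; findUnpairedSurrogate is a hypothetical name. A scan is needed because, since ES2019's well-formed JSON.stringify, engines escape a lone surrogate (as \ud83e, for example) instead of throwing, so a stringify call alone proves nothing.

```typescript
// Returns the index of the first unpaired UTF-16 surrogate in s, or -1 if
// every surrogate is correctly paired.
function findUnpairedSurrogate(s: string): number {
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff) {
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // valid pair: skip the low half
        continue;
      }
      return i; // high surrogate with no low surrogate after it
    }
    if (c >= 0xdc00 && c <= 0xdfff) return i; // low surrogate with no high before it
  }
  return -1;
}
```

Runtimes with ES2024 support also provide String.prototype.isWellFormed(), which answers the same yes/no question without reporting the index.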

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: expand error_page_title + add cloudflare_challenge_title (supersedes #1561)

Closes the bug class where scraper error pages with titles like
"Forbidden", "Access Denied", "Service Unavailable", "Robot Check",
and "Just a moment..." were slipping through the ingest gate
because the matcher only caught bare numeric codes (403/404/500...)
and "page not found". 232+ pages observed (202+ from straylight-
brain) were inflating page counts and tripping
content_sanity_audit_recent on every doctor run.

Changes to src/core/content-sanity.ts BUILT_IN_JUNK_PATTERNS:
- Expanded error_page_title regex to also catch forbidden,
  access denied, service unavailable, robot check, verify you are
  human (case-insensitive, anchored — so long-form essays about
  these topics still ingest fine)
- New cloudflare_challenge_title pattern with DISTINCT name from
  error_page_title (PR #1561 collapsed both into one name and lost
  audit signal — the new name preserves diagnosability in
  ~/.gbrain/audit/content-sanity-YYYY-Www.jsonl and doctor's
  content_sanity_audit_recent aggregation)
- Dropped PR #1561's bare-`error` matcher — too aggressive on
  legitimate concept/taxonomy pages titled exactly "Error"
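
A minimal sketch of anchored, case-insensitive title matching with a separate audit name per pattern family. The regexes are assumptions written for illustration: bare numeric codes are approximated as [45]\d\d, and the exact entries in BUILT_IN_JUNK_PATTERNS are not reproduced.

```typescript
// Hypothetical patterns for illustration only.
interface JunkTitlePattern {
  name: string; // recorded in the audit log, so each family keeps its own name
  regex: RegExp;
}

const JUNK_TITLE_PATTERNS: JunkTitlePattern[] = [
  {
    name: "error_page_title",
    // Anchored (^...$) and case-insensitive: the whole title must be the error
    // phrase, so "How to Handle Access Denied Errors" does not match.
    regex:
      /^(?:[45]\d\d|page not found|forbidden|access denied|service unavailable|robot check|verify you are human)$/i,
  },
  {
    name: "cloudflare_challenge_title",
    regex: /^just a moment(?:\.\.\.|\u2026)?$/i,
  },
];

// Returns the name of the first matching pattern, or null for a normal title.
// A bare "Error" title deliberately matches nothing.
function matchJunkTitle(title: string): string | null {
  const t = title.trim();
  for (const p of JUNK_TITLE_PATTERNS) {
    if (p.regex.test(t)) return p.name;
  }
  return null;
}
```
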

Tests:
- test/content-sanity.test.ts: pattern-count locked at 7, new
  matches via test.each, over-match regression guard (legitimate
  prose titled "How to Handle Access Denied Errors" / "Error
  Boundary in React" etc. must pass), audit-name distinctness
  pinned
- test/import-file-content-sanity.test.ts: end-to-end
  ContentSanityBlockError via importFromContent for each new
  pattern family (D6 — assessor wiring coverage, not just regex)

Out of scope, filed in TODOS.md as TODO-V13-C: gbrain pages
audit-junk-titles legacy-cleanup command. Dropped from this PR
per codex outside-voice tension (T1) for ship-and-validate-
matchers-first discipline. The 200+ pre-existing scraper pages
already in the DB will get the destructive-cleanup operator
surface after ~1 week of production observation against this
matcher.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump v0.41.23.0 + CHANGELOG + follow-up TODOs

VERSION + package.json bump to 0.41.23.0.

CHANGELOG voice: ELI10 lead naming the bug ("`gbrain dream --source
<id>` finally counts as a cycle"), then per-fix detail, then a
"To take advantage of v0.41.23.0" operator-action block and itemized
changes.

TODOS.md v0.41.23.x follow-ups:
- TODO-V13-A (P2): --max-pages plumbing (PR #1559's flag, deferred
  because CycleOpts has no maxPages field today)
- TODO-V13-B (P3): --source vs --source-id flag-name unification
  across all CLI commands
- TODO-V13-C (P2): gbrain pages audit-junk-titles legacy cleanup
  (deferred for ~1 week of matcher production observation)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump v0.41.25.0 → v0.41.26.0 (leave headroom for in-flight PR)

Master shipped v0.41.23.0 + v0.41.24.0 mid-review; this branch
originally bumped to v0.41.25.0 post-merge. User flagged v0.41.26.0
to leave a slot open for another in-flight PR. No code changes;
VERSION + package.json + CHANGELOG header + "To take advantage"
section updated in lockstep.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mgunnin added a commit to mgunnin/gbrain that referenced this pull request May 28, 2026
* upstream/master:
  v0.41.26.1 fix: lock-renewal cathedral — closes ~39 worker crashes/day (supersedes garrytan#1567) (garrytan#1572)
  v0.41.26.0 fix: dream --source + ingest junk titles + emoji-crash (supersedes garrytan#1559, garrytan#1561) (garrytan#1571)
  v0.41.25.0 perf(sync): batched deletes + global page-generation clock (supersedes garrytan#1538) (garrytan#1566)
  v0.41.24.0 fix(conversation-parser): threshold gates + bold-paren-time pattern — 20,167 Circleback messages unblocked (closes garrytan#1533) (garrytan#1543)
  v0.41.23.0 feat: extract operator surfaces + pack-driven extractables (garrytan#1541)
  v0.41.22.1 feat: brainstorm/lsd judge fixes (closes garrytan#1540 end-to-end) (garrytan#1562)
  v0.41.22.0 feat: type-unification cathedral — 94 types → 15 canonical (closes garrytan#1479) (garrytan#1542)
  v0.41.21.0 feat(ops): 5 daily-driver pains fixed in one wave (garrytan#1545)
  v0.41.20.0 feat: gbrain status + doctor --scope=brain (fix wave 2: items garrytan#6 + garrytan#7) (garrytan#1544)
  feat: v0.41.19.0 Supavisor Retry Cathedral (garrytan#1537)
  v0.41.18.0: gbrain onboard — the activation surface gbrain didn't have before (garrytan#1521)
  v0.41.17.0 feat: --workers N on every bulk command + facts dim doctor parity (garrytan#1519)
  v0.41.16.0 feat: conversation parser cathedral + progressive-batch primitive (closes garrytan#1461) (garrytan#1510)
  v0.41.15.0 feat(sync): --timeout + --max-age + partial status (closes garrytan#1472 RFC) (garrytan#1506)


2 participants