
v0.30.2 feat: dream synthesize stops dropping fat transcripts #754

Merged
garrytan merged 6 commits into master from garrytan/synthesize-chunking on May 9, 2026

Conversation

@garrytan (Owner) commented May 8, 2026

Summary

Dream synthesize stops dropping fat transcripts. Subagents that overflow Anthropic's context die once, not three times. The queue stops clogging.

The v0.30 dream cycle has been stalled for one user since May 2 — daily aggregated transcripts at 2.7-4.5MB each generate 1.7M-token Anthropic prompts, hit the 1M-token hard limit, and 400. The subagent handler treated those failures as renewable, so doomed transcripts stalled three times before dead-lettering and every new cycle re-discovered the same fat transcripts. Six days of synth backlog, queue full of doomed work.

This PR ships:

Chunking + terminal classify: src/core/cycle/synthesize.ts + src/core/minions/handlers/subagent.ts

  • New pure splitTranscriptByBudget(content, contentHash, maxChars) with a 3-tier boundary ladder (## Topic: > --- > nearest \n), seeded with a deterministic offset from contentHash so the same content always chunks identically (D9: stable chunk identity).
  • MODEL_CONTEXT_TOKENS map (Opus 4.7 = 1M, Sonnet 4.6 = 200K, Haiku = 200K) drives a floor(context × 0.9 × 3.5 chars/token) per-chunk budget. Non-Anthropic ids fall back to 180K-token safe default with stderr warning.
  • dream.synthesize.max_prompt_tokens config override (token-shaped, name from PR #748, floor 100K) beats the model lookup.
  • dream.synthesize.max_chunks_per_transcript (default 24, operator-configurable). On cap hit, log + skip; do NOT write to dream_verdicts (D5 closes the cache-poisoning class).
  • Anthropic 400 "prompt is too long" responses now rethrow as UnrecoverableError so the worker routes them straight to dead on first attempt — no more 3-stall retry pile.
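A minimal sketch of how such a chunker can fit together. The function names and the budget formula mirror the PR text, but the window placement and offset derivation are illustrative assumptions, not the repo's actual code:

```typescript
// Hash-deterministic boundary-ladder chunker (sketch; details assumed).
const BOUNDARIES = ["\n## Topic:", "\n---\n", "\n"]; // tier 1 > tier 2 > tier 3

// ~90% of the model context at ~3.5 chars/token, per the PR's budget formula.
function computeChunkCharBudget(contextTokens: number): number {
  return Math.floor(contextTokens * 0.9 * 3.5);
}

// Deterministic offset from the leading hex of contentHash; non-hex hashes
// fall back to offset 0 (matching the tested behavior described below).
function seededOffset(contentHash: string, window: number): number {
  const n = parseInt(contentHash.slice(0, 8), 16);
  return Number.isNaN(n) || window <= 0 ? 0 : n % window;
}

function splitTranscriptByBudget(
  content: string,
  contentHash: string,
  maxChars: number,
): string[] {
  if (maxChars <= 0) throw new Error("maxChars must be positive");
  if (content.length <= maxChars) return [content]; // single-chunk pass-through

  const chunks: string[] = [];
  let rest = content;
  while (rest.length > maxChars) {
    // Search the back half of the budget window for the highest-tier boundary.
    const windowStart =
      Math.floor(maxChars / 2) + seededOffset(contentHash, Math.floor(maxChars / 4));
    const window = rest.slice(windowStart, maxChars);
    let cut = maxChars; // hard fallback: cut at the raw budget
    for (const b of BOUNDARIES) {
      const at = window.lastIndexOf(b);
      if (at !== -1) {
        cut = windowStart + at + 1; // next chunk starts at the boundary heading
        break;
      }
    }
    chunks.push(rest.slice(0, cut));
    rest = rest.slice(cut);
  }
  chunks.push(rest);
  return chunks;
}
```

For Sonnet's 200K context the formula gives floor(200000 × 0.9 × 3.5) = 630,000 chars, which lines up with the 630KB-per-chunk default the CHANGELOG entry cites.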

Orchestrator slug rewrite (D6) — zero Sonnet trust

  • collectChildPutPageSlugs no longer does SELECT DISTINCT (which erased the collision evidence the audit claimed to detect). It now raw-fetches every (job_id, slug) pair, then for chunked children rewrites bare-hash6 slugs to <hash6>-c<idx> in-memory.
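The rewrite step can be sketched as a pure function. The name and the <hash6>-c<idx> target shape come from the PR; the matching rules below are hypothetical guesses at the tested shapes, not the repo's implementation:

```typescript
// D6 slug rewrite sketch: map a chunked child's bare-hash6 slug to
// <hash6>-c<idx>, leaving anything else untouched (assumed rules).
function rewriteChunkedSlug(slug: string, hash6: string, chunkIdx: number): string {
  const suffix = `-c${chunkIdx}`;
  if (slug === "" || hash6 === "") return slug;          // nothing to rewrite
  if (slug.endsWith(suffix)) return slug;                // already suffixed
  if (slug === hash6) return hash6 + suffix;             // bare hash6
  if (slug.endsWith(`/${hash6}`)) return slug + suffix;  // bare hash6 path segment
  return slug;                                           // no match: left as-is
}
```

Because the rewrite is deterministic and happens orchestrator-side, nothing the Sonnet child emits can collide two chunks onto one slug.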

Migration safety (D8)

  • Pre-fan-out lookup of completed legacy dream:synth:<filePath>:<hash16> jobs. Transcripts already synthesized under the single-chunk shape skip submission with already_synthesized_legacy_single_chunk instead of resubmitting under chunked keys.
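The key shapes and skip reason are stated in the PR; the helper below is a hedged sketch of how the pre-fan-out decision could look, with a Set standing in for the real completed-jobs query:

```typescript
// D8 migration-safety sketch. Key shapes come from the PR description;
// planSubmission and the Set lookup are illustrative stand-ins.
function legacyKey(filePath: string, hash16: string): string {
  return `dream:synth:${filePath}:${hash16}`;
}

function chunkedKey(filePath: string, hash16: string, i: number, n: number): string {
  return `dream:synth:${filePath}:${hash16}:c${i}of${n}`;
}

function planSubmission(
  filePath: string,
  hash16: string,
  nChunks: number,
  completedKeys: Set<string>, // completed legacy jobs, pre-fetched
): { skip?: string; keys: string[] } {
  // Transcripts already synthesized under the single-chunk shape skip fan-out.
  if (completedKeys.has(legacyKey(filePath, hash16))) {
    return { skip: "already_synthesized_legacy_single_chunk", keys: [] };
  }
  // Single chunk keeps the legacy key shape; multi-chunk gets :c<i>of<n>.
  const keys =
    nChunks === 1
      ? [legacyKey(filePath, hash16)]
      : Array.from({ length: nChunks }, (_, i) => chunkedKey(filePath, hash16, i, nChunks));
  return { keys };
}
```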

Doctor visibility: src/commands/doctor.ts

  • queue_health gains subcheck 4 counting dead subagent jobs in the last 24h whose error_text starts with prompt_too_long:. Fix hint points at gbrain dream --phase synthesize --dry-run --json.
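In essence, the subcheck counts a filtered slice of dead jobs. A sketch of that filter as a pure function, assuming hypothetical row fields (type, status, errorText, updatedAt) since the real schema isn't shown here:

```typescript
// queue_health subcheck 4 sketch: dead subagent jobs from the last 24h whose
// error_text starts with "prompt_too_long:". Field names are assumptions.
interface JobRow {
  type: string;
  status: string;
  errorText: string | null;
  updatedAt: Date;
}

function countPromptTooLongDead(jobs: JobRow[], now: Date): number {
  const cutoff = now.getTime() - 24 * 60 * 60 * 1000;
  return jobs.filter(
    (j) =>
      j.type === "subagent" &&
      j.status === "dead" &&
      (j.errorText ?? "").startsWith("prompt_too_long:") &&
      j.updatedAt.getTime() > cutoff,
  ).length;
}
```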

Credits PR #748 (Wintermute) for the boundary ladder, config key naming, and 3.5 chars/token estimator. This branch supersedes #748 with the structural safeguards (model-aware budget, terminal-error classify, deterministic chunk identity, slug rewrite, doctor surfacing).

Test Coverage

[+] src/core/cycle/synthesize.ts
  ├── splitTranscriptByBudget()
  │   ├── [★★★ TESTED] single-chunk pass-through, throws on maxChars<=0
  │   ├── [★★★ TESTED] Tier 1 (## Topic:), Tier 2 (---), Tier 3 (\n)
  │   ├── [★★★ TESTED] hard-fallback for paragraphless content (3 chunks of 500)
  │   └── [★★★ TESTED] hash-determinism (same hash → same chunks; different hash → may differ but valid; non-hex hash → offset 0)
  ├── rewriteChunkedSlug() (D6)
  │   └── [★★★ TESTED] 7 shapes: bare hash6, already-suffixed, conflict, no-match, exact-equal, /-segment, empty
  ├── computeChunkCharBudget()
  │   └── [★★  TESTED] via E2E (Sonnet 200K + 100K override)
  ├── hasLegacySingleChunkCompletion() (D8)
  │   └── [★★★ TESTED] E2E pre-seeds completed row → skip
  └── fan-out loop
      ├── [★★★ TESTED] D5 cap hit (5 chunks > cap=2) → skip + no jobs + no dream_verdicts write
      ├── [★★★ TESTED] D8 legacy-key match → skip with reason
      ├── [★★★ TESTED] single-chunk → legacy key shape
      └── [★★★ TESTED] multi-chunk → :c<i>of<n> suffix on every key, indices 0..N-1

[+] src/core/minions/handlers/subagent.ts
  └── isPromptTooLongError()
      ├── [★★★ TESTED] production wording verbatim
      ├── [★★★ TESTED] case-insensitive
      ├── [★★★ TESTED] nested .error.message shape
      ├── [★★★ TESTED] defensive 400 + invalid_request_error / request_too_large
      ├── [★★★ TESTED] negative cases (malformed JSON 400, 500, 429, transient errors)
      └── [★★★ TESTED] null/undefined/non-error inputs

COVERAGE: 31 tests pass / 0 fail / 77 expect() calls
QUALITY: ★★★ across the board (behavior + edges + error paths)

Tests: 27 unit + 4 PGLite E2E = 31 new cases. All green.

Pre-Landing Review

The plan went through /plan-eng-review (4 issues found, 0 critical gaps, all resolved as D1-D4) plus two rounds of /codex outside-voice review:

  • Round 1: 12 substantive findings (4 plan-breaking, 8 spec-tightening).
  • Round 2: 6 STILL OPEN, 5 PARTIAL, 3 NEW internal inconsistencies. The 6 STILL OPEN routed through AskUserQuestion and resolved as D5–D10 in the plan.

Implementation matches the resolved plan. bun run verify (8 shell pre-checks + typecheck) clean. No new pre-landing review issues found.

Eval Results

No prompt-related files changed — evals skipped.

Plan Completion

10/10 decisions implemented:

  • D1 model-aware chunk budget (MODEL_CONTEXT_TOKENS × 0.9 × 3.5 chars/token)
  • D2 prompt-trust slug seed with audit safety net (folded into D6 implementation)
  • D3 doctor queue_health extension
  • D4 gbrain jobs prune --status dead documented in CHANGELOG
  • D5 cap-hit log+skip, no dream_verdicts write
  • D6 orchestrator slug rewrite, zero Sonnet trust
  • D7 tool-loop turn-N growth deferred to v0.30.3+ (named in the NOT in scope section)
  • D8 single→multi-chunk migration via legacy-key lookup
  • D9 stable chunk identity via hash-deterministic boundaries
  • D10 24-chunk cap operator-configurable

Verification Results

No dev server in this repo — plan verification skipped (this is a CLI / library, not a web app).

TODOS

No items in TODOS.md were completed by this branch. The two named follow-ups (per-turn token-budget guard in subagent.ts, tighter token estimator for dense content) are documented in the CHANGELOG's ### Out of scope (deferred to v0.30.3+) section.

Documentation

  • README.md: dream help block corrected from 8-phase to 9-phase (the cycle has been 9-phase since v0.29) and extended with a one-liner about the v0.30.2 fat-transcript chunker plus the two operator config keys (dream.synthesize.max_prompt_tokens, dream.synthesize.max_chunks_per_transcript).
  • CLAUDE.md: three Key Files entries extended in-place to keep the per-file annotations terse and verb-first.
    • src/core/cycle/synthesize.ts: notes splitTranscriptByBudget(content, contentHash, maxChars) (deterministic-offset chunker on the ## Topic: > --- > \n ladder), the MODEL_CONTEXT_TOKENS × 0.9 × 3.5 chars/token budget formula with the 180K-token fallback for non-Anthropic ids, per-chunk idempotency keys (dream:synth:<filePath>:<hash16>:c<i>of<n>), the legacy single-chunk key preservation (D8) so existing brains skip with already_synthesized_legacy_single_chunk, the orchestrator-side <hash6>-c<idx> slug rewrite in collectChildPutPageSlugs (D6, zero Sonnet trust), and the explicit D7 scope (initial prompt size only; turn-N caught by the new subagent.ts terminal classifier).
    • src/core/minions/handlers/subagent.ts: notes that Anthropic 400 prompt is too long responses now classify as UnrecoverableError so the job goes straight to dead on first attempt instead of stalling three times.
    • src/commands/doctor.ts: notes queue_health gains a fourth subcheck — dead-lettered subagent jobs whose error_text starts with prompt_too_long: within the last 24h, with fix hints pointing at gbrain dream --phase synthesize --dry-run --json (identify the offender) and gbrain jobs prune --status dead --queue default (clean up).
  • CHANGELOG.md: reviewed for voice + CLAUDE.md compliance. The v0.30.2 entry leads with capability, names real numbers (4.5MB transcript, 7-8 children, 630KB-per-chunk default, 1M-token Anthropic limit), credits PR #748, separates contributor-facing detail, and includes a "To take advantage of v0.30.2" block.
  • VERSION: bumped to 0.30.2 by the metadata commit. package.json synced.

Commit: 6943f421 (docs: update README + CLAUDE.md for v0.30.2).

Test plan

  • Unit tests pass: 27 cases across test/cycle-synthesize-chunker.test.ts + test/subagent-prompt-too-long.test.ts
  • E2E tests pass: 4 PGLite cases in test/e2e/dream-synthesize-chunking.test.ts
  • Existing synth tests not regressed: 49 cases across test/cycle-synthesize.test.ts + test/e2e/dream-synthesize-pglite.test.ts + test/e2e/dream-allow-list-pglite.test.ts
  • bun run typecheck clean
  • bun run verify clean (privacy, jsonb-pattern, progress-stdout, test-isolation, wasm-embedded, admin-build, admin-scope-drift, cli-executable, typecheck)
  • Live regression on user's brain: gbrain dream --phase synthesize --dry-run --json — confirm transcripts that previously generated 1.7M-token prompts now report chunks: N instead of submitting (run after merge)
  • Queue cleanup post-deploy: gbrain jobs prune --status dead --queue default to clear pre-v0.30.2 doomed jobs

🤖 Generated with Claude Code


garrytan and others added 6 commits May 8, 2026 16:02
The subagent handler now detects 400 "prompt is too long" responses
from the Anthropic SDK and rethrows as UnrecoverableError. The worker
already routes UnrecoverableError straight to `dead`, so doomed jobs
fail terminally on first attempt instead of stalling 3x with the same
oversized prompt.

isPromptTooLongError matches the production message verbatim
("prompt is too long: N tokens > N maximum"), case-insensitive, on
both the outer message and inner error.message paths. Defensive
secondary match for status=400 + invalid_request_error/request_too_large
with the words "too long"/"exceed"/"maximum".

9 unit cases pin the detection: production wording, case folding,
nested SDK shape, defensive 400 paths, unrelated 400s, transient
errors, null/empty inputs.
The synthesize phase now chunks oversized transcripts at paragraph
boundaries instead of submitting one giant prompt that 400s on
Anthropic. Closes the v0.30 dream-cycle queue clog where 1.7M-token
transcripts dead-lettered after 3 stalls and re-discovered every
cycle.

D1: per-chunk budget = floor(model_context_tokens × 0.9 × 3.5).
MODEL_CONTEXT_TOKENS keys on resolved Anthropic ids (Opus 4.7 = 1M,
Sonnet 4.6 = 200K, Haiku = 200K). Non-Anthropic models fall back to
180K-token safe default with a once-per-process stderr warning.
dream.synthesize.max_prompt_tokens overrides the model lookup
(token-shaped, name from PR #748, floor 100K).

D5: on max_chunks_per_transcript cap hit, log + skip; do NOT write to
dream_verdicts. Closes the cache-poisoning class — next cycle
re-attempts under whatever budget is then current.

D6: orchestrator-side deterministic slug rewrite, zero Sonnet trust.
collectChildPutPageSlugs raw-fetches every (job_id, slug) pair (no
SELECT DISTINCT — that erased the collision evidence the audit
claimed to detect) and rewrites bare-hash6 slugs to <hash6>-c<idx>
for chunked children.

D8: pre-fan-out lookup of completed legacy `dream:synth:<filePath>:
<hash16>` jobs. Transcripts already synthesized under the
single-chunk shape skip submission with `already_synthesized_legacy_
single_chunk` instead of resubmitting under chunked keys.

D9: hash-deterministic chunk boundaries. The 3-tier ladder lifted
from PR #748 (## Topic: > --- > nearest \n) is fed a back-half
search-window offset derived from contentHash. Same content always
chunks identically across runs; chunk N of a previously-failed
transcript produces byte-identical content on retry.

D10: 24-chunk default cap, operator-configurable via
dream.synthesize.max_chunks_per_transcript.

18 unit cases pin the chunker (boundary ladder, hash determinism,
hard fallback, slug rewrite all 7 shapes). 4 PGLite E2E cases pin
fan-out shape (single-chunk legacy key parity, multi-chunk chunked
key shape) + skip paths (D5 cap hit no verdict-cache write, D8
legacy-key skip).

Credits PR #748 (Wintermute) for the boundary ladder, config key
naming, and 3.5 chars/token estimator. This branch supersedes #748
with the structural safeguards (model-aware budget, terminal-error
classify, slug rewrite, hash-determinism, doctor surfacing).

queue_health gains a 4th subcheck counting dead `subagent` jobs in
the last 24h whose error_text starts with `prompt_too_long:`. When
present, prints a fix hint pointing at
`gbrain dream --phase synthesize --dry-run --json` to identify the
fat transcripts and naming the two operator escape hatches
(`dream.synthesize.max_prompt_tokens` for budget tuning,
larger-context model for capacity).

Operators now see the chunking failure mode without grepping
minion_jobs by hand.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- README dream help: 8-phase → 9-phase, mention v0.30.2 chunking + config keys
- CLAUDE.md synthesize.ts: chunker + per-chunk idempotency + D6 slug rewrite + D7 scope + D8 legacy-key
- CLAUDE.md subagent.ts: prompt_too_long terminal classification
- CLAUDE.md doctor.ts: queue_health subcheck 4 (dead-lettered prompt_too_long)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The docs/ pass extended three Key Files entries in CLAUDE.md
(synthesize.ts, subagent.ts, doctor.ts). The auto-derived
llms-full.txt bundle picks up those CLAUDE.md changes via
build-llms; the build-llms test caught the drift in CI.

Generated by: bun run build:llms
