v0.42.6.0 feat(enrich): gbrain enrich --thin — brain-internal grounded synthesis for stub pages (#1700) #1757

Merged
garrytan merged 6 commits into master from garrytan/enrich-thin-batch
Jun 2, 2026

garrytan (Owner) commented Jun 2, 2026

What this ships

gbrain enrich --thin develops stub (thin) people/company pages at scale by consolidating what the brain already knows about an entity into one cited page — no web lookups. gbrain's own model tooling only sees brain-internal context (search / get_page / facts / backlinks), so enrich does brain-internal grounded synthesis: deterministically retrieve everything scattered about an entity (meetings, other pages, deals, facts, raw notes), then one grounded gateway.chat call per page writes a real dossier with [Source: slug] citations. When the brain knows too little, it skips instead of fabricating. Web research stays the agent-driven enrich SKILL's job.

gbrain enrich --thin --dry-run --json                       # preview + cost estimate, no spend
gbrain enrich --thin --limit 3 --max-usd 0.50 --model anthropic:claude-haiku-4-5

Resumable (--resume), budget-capped (--max-usd; best-effort under --workers>1, pin --workers 1 for a hard ceiling), source-scoped (--source), backgroundable (--background). Enriched pages stamp enriched_at/enriched_by so a recency guard skips them next run. Opt-in autopilot phase enrich_thin (default OFF) trickles a few pages/source/cycle with per-source + brain-wide cost/walltime caps.
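
As a rough illustration of the skip-instead-of-fabricate behavior described above, the sketch below builds a grounded prompt only when enough evidence was retrieved. Every name in it (EvidenceSketch, MIN_EVIDENCE_ITEMS, buildGroundedPrompt) and the two-item threshold are assumptions made for illustration; none of this is taken from gbrain's code.

```typescript
// Hypothetical shape for one retrieved snippet; not gbrain's real type.
interface EvidenceSketch {
  slug: string; // page the snippet came from, used for the citation
  text: string; // raw retrieved text
}

// Assumed threshold: below this, the caller skips the page.
const MIN_EVIDENCE_ITEMS = 2;

// Returns a grounded prompt, or null when the brain knows too little.
function buildGroundedPrompt(
  entity: string,
  evidence: EvidenceSketch[],
): string | null {
  const usable = evidence.filter((e) => e.text.trim().length > 0);
  if (usable.length < MIN_EVIDENCE_ITEMS) {
    return null; // skip rather than let the model fabricate
  }
  const context = usable
    .map((e) => `[Source: ${e.slug}]\n${e.text.trim()}`)
    .join("\n\n");
  return [
    `Write a dossier for "${entity}" using ONLY the context below.`,
    "Cite each claim as [Source: slug]. Do not add outside knowledge.",
    "<context>",
    context,
    "</context>",
  ].join("\n");
}
```

A caller would make its single model call only when the result is non-null, and record a skip otherwise.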

Commits (whole branch vs master)

  • feat(engine): listEnrichCandidates source-aware candidate selection (pg + pglite parity): thin-filter + per-page source-correct inbound count + enriched_at recency guard + whitelisted ORDER BY + LIMIT, lightweight projection (no bodies).
  • feat(enrich): gbrain enrich --thin core: runEnrichCore/enrichOne, retrieve→ground→synthesize→put_page write-through, op-checkpoint resume, BudgetTracker cap, per-page advisory lock, CLI + thin-client refuse + Minion handler. Includes the codex-review fixes (below).
  • feat(cycle) — opt-in enrich_thin autopilot phase with enforced per-source cost cap.
  • merge origin/master (0.42.2.0) — resolved 4 conflicts; cycle phase count is now 22 (both enrich_thin and skillopt landed; both sides had independently claimed 21).
  • chore — VERSION/CHANGELOG/CLAUDE.md/llms → 0.42.6.0.

Codex review (implementation) — FAIL→addressed

A fresh /codex review of the implementation (the prior codex pass was plan-stage only) found 4 P1 + 2 P2, all verified against the code. Folded in:

  • P1 injection escape: sanitizeContext neutralizes the <context>…</context> data-envelope delimiters so an untrusted retrieved note can't close the envelope and inject instructions (mirrors the existing </trajectory> convention).
  • P1 background idempotency — multi-source --background key now carries the run fingerprint (backgroundIdempotencyKey); a re-run with different --model/--limit/--force enqueues new work instead of returning a stale completed job.
  • P1 budget honesty: budget_exhausted is flagged post-hoc when tracker.totalSpent > cap even if the gateway swallowed the final-call throw (new read-only BudgetTracker.cap getter; no gateway.ts change).
  • P2 checkpoint flush: body() flushes the checkpoint on BudgetExhausted before it propagates, so resume doesn't re-charge completed pages.
  • P2 per-source cap — cycle phase enforces min(per-source, brain-wide remaining) instead of leaving the per-source cap parsed-but-ignored.
  • Accepted (D5): in-flight gateway.chat isn't cancelled on budget abort (documented best-effort overshoot ~1 call/worker; a true fix needs a shared runSlidingPool API change). Code comment added.
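
To make the injection-escape fix concrete, here is a minimal sketch of neutralizing envelope delimiters inside untrusted text before wrapping it. The function names (sanitizeContextSketch, wrapInEnvelope), the regex, and the choice of HTML-entity escaping are all assumptions; the PR does not show how the real sanitizeContext rewrites the delimiters.

```typescript
// Escape anything that looks like an opening or closing <context> tag,
// tolerating stray whitespace and any letter case.
function sanitizeContextSketch(untrusted: string): string {
  return untrusted.replace(/<(\s*\/?\s*context\b[^>]*)>/gi, "&lt;$1&gt;");
}

// Wrap sanitized notes in a single data envelope. Only the envelope
// itself contributes real <context> delimiters.
function wrapInEnvelope(notes: string[]): string {
  const body = notes.map(sanitizeContextSketch).join("\n---\n");
  return `<context>\n${body}\n</context>`;
}
```

With this shape, a note containing `</context>` ends up as inert text inside the envelope, so the only closing delimiter the model sees is the one the wrapper emits.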

Verification

  • bun run typecheck clean; bun run verify 29/29.
  • Full unit suite green except 7 pre-existing env-coupled tests (assert keyless-fallback behavior; all 4 files pass 34/34 in a clean keyless env = CI-equivalent; none touch the enrich diff).
  • enrich e2e enrich-pglite 15/15 × 3 runs (the prior flaky budget test now carries the repo-standard 30s timeout); listEnrichCandidates pg↔pglite parity confirmed on real Postgres.

Reviews: ENG cleared (plan), CODEX FAIL absorbed (5 fixes + 1 accepted). All fixtures use placeholder names.

🤖 Generated with Claude Code

garrytan and others added 6 commits June 1, 2026 19:10
…1700)

One SQL query per engine: thin-filter + per-page source-correct inbound-link
count (to_page_id = p.id, mentions excluded) + enriched_at recency guard +
whitelisted ORDER BY (ENRICH_ORDER_SQL) + LIMIT, returning a lightweight
projection (no page bodies). EnrichCandidate/EnrichCandidatesOpts types.
pg + pglite parity, pinned by engine-parity.test.ts.
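
The whitelisted ORDER BY mentioned above can be sketched as follows. The table and column names, the two ordering keys, and the buildCandidateQuery helper are hypothetical; the actual contents of ENRICH_ORDER_SQL are not shown in this PR.

```typescript
// Hypothetical whitelist: caller-facing key -> fixed SQL fragment.
const ORDER_SQL_SKETCH: { [key: string]: string } = {
  inbound_desc: "inbound_count DESC, p.id ASC",
  oldest_first: "p.updated_at ASC, p.id ASC",
};

// Build a candidate query. The ORDER BY fragment comes only from the
// whitelist, and LIMIT is passed as a bound parameter.
function buildCandidateQuery(
  order: string,
  limit: number,
): { sql: string; params: number[] } {
  if (!Object.prototype.hasOwnProperty.call(ORDER_SQL_SKETCH, order)) {
    throw new Error(`unsupported order: ${order}`);
  }
  if (!Number.isInteger(limit) || limit <= 0) {
    throw new Error(`limit must be a positive integer, got ${limit}`);
  }
  const sql =
    "SELECT p.id, p.slug FROM pages p " + // lightweight projection, no bodies
    `ORDER BY ${ORDER_SQL_SKETCH[order]} LIMIT $1`;
  return { sql, params: [limit] };
}
```

Because user input only selects a key and never reaches the SQL string, an unexpected value fails fast instead of being interpolated.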
…1700)

Develops stub pages at scale by consolidating scattered brain knowledge (search
+ backlinks + facts + raw_data) into one grounded gateway.chat call per page.
Resumable (op-checkpoint), budget-capped (best-effort under --workers), per-page
advisory lock, put_page write-through. CLI + thin-client refuse + Minion handler.
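
A minimal sketch of the budget cap with a post-hoc overage flag, assuming a tracker that exposes totalSpent and a read-only cap. The class and function names are hypothetical, and the swallowed throw is simulated with an empty catch; the real BudgetTracker and gateway behavior may differ.

```typescript
// Hypothetical tracker: accumulates spend and throws once the cap is passed.
class BudgetTrackerSketch {
  private spent: number;
  private capUsd: number;

  constructor(capUsd: number) {
    this.spent = 0;
    this.capUsd = capUsd;
  }

  get cap(): number {
    return this.capUsd; // read-only view of the configured cap
  }

  get totalSpent(): number {
    return this.spent;
  }

  record(costUsd: number): void {
    this.spent += costUsd;
    if (this.spent > this.capUsd) {
      throw new Error("budget exhausted");
    }
  }
}

// Process pages until the cap is crossed. Even when an inner layer
// swallows the throw, the final comparison still reports the overage.
function summarizeRun(
  tracker: BudgetTrackerSketch,
  pageCostsUsd: number[],
): { pages: number; budget_exhausted: boolean } {
  let pages = 0;
  for (const cost of pageCostsUsd) {
    try {
      tracker.record(cost);
    } catch (e) {
      // Simulates a layer that swallows the budget error.
    }
    pages++;
    if (tracker.totalSpent > tracker.cap) break;
  }
  return { pages, budget_exhausted: tracker.totalSpent > tracker.cap };
}
```

The design point is that the flag is derived from the tracker's state after the run, so it does not depend on the exception reaching the caller.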

Includes codex-review fixes: sanitizeContext neutralizes the <context> envelope
delimiters (P1 injection escape); background fan-out idempotency key carries the
run fingerprint (P1); post-hoc budget-overage flag via new BudgetTracker.cap
getter (P1); checkpoint flush on budget exhaustion (P2). Accepts the documented
best-effort in-flight-cancel limitation (D5) with an explicit code note.
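
The idempotency fix can be pictured as folding the run options into the job key. In the sketch below, the option set (model, limit, force), the key prefix, and both function names are assumptions; the PR names backgroundIdempotencyKey but does not show its format.

```typescript
// Hypothetical subset of run options that should distinguish two runs.
interface RunOptsSketch {
  model: string;
  limit: number;
  force: boolean;
}

// Canonical, order-stable string of the options that define a run.
function runFingerprintSketch(opts: RunOptsSketch): string {
  return `force=${opts.force}&limit=${opts.limit}&model=${opts.model}`;
}

// One key per (source, run fingerprint). A re-run with different options
// yields a different key, so it enqueues new work instead of matching a
// stale completed job.
function backgroundKeySketch(source: string, opts: RunOptsSketch): string {
  return `enrich-thin:${source}:${runFingerprintSketch(opts)}`;
}
```

A real implementation might hash the fingerprint to keep keys short; the plain string is used here so the behavior is easy to inspect.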
Default-OFF trickle around runEnrichCore: develops a few thin pages per source
per tick so the brain compounds over time. Per-source cap enforced as
min(per-source, brain-wide remaining) with brain-wide total + walltime caps
(P2 fix: per-source max_cost_usd was parsed but never enforced). Wired into
CyclePhase / ALL_PHASES / PHASE_SCOPE / NEEDS_LOCK + dispatch.
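
The min(per-source, brain-wide remaining) rule amounts to a few lines of arithmetic. The function and parameter names below are hypothetical; only the min rule itself comes from the commit message.

```typescript
// Effective spend allowed for one source in this cycle: never more than
// the per-source cap, and never more than what is left brain-wide.
function effectiveSourceCapUsd(
  perSourceCapUsd: number,
  brainWideCapUsd: number,
  brainWideSpentUsd: number,
): number {
  const remaining = Math.max(0, brainWideCapUsd - brainWideSpentUsd);
  return Math.min(perSourceCapUsd, remaining);
}
```

For example, with a per-source cap of 0.50 and a brain-wide cap of 2.00 of which 1.75 is already spent, the next source gets 0.25 rather than its full 0.50.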
…n-batch

# Conflicts:
#	src/cli.ts
#	src/core/cycle.ts
#	test/core/cycle.serial.test.ts
#	test/phase-scope-coverage.test.ts
gbrain enrich --thin batch enrichment (#1700) + codex-review fixes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…n-batch

# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json
garrytan merged commit 662a6e2 into master on Jun 2, 2026
21 checks passed
mgunnin added a commit to mgunnin/gbrain that referenced this pull request Jun 3, 2026
* upstream/master:
  v0.42.8.0 feat: content-quality gate on sync — quarantine junk + flag boilerplate (garrytan#1699) (garrytan#1756)
  v0.42.7.0 feat(extract): link/timeline extraction freshness watermark — gbrain extract --stale + doctor lag check (garrytan#1696) (garrytan#1755)
  v0.42.6.0 feat(enrich): gbrain enrich --thin — brain-internal grounded synthesis for stub pages (garrytan#1700) (garrytan#1757)
  v0.42.5.0 fix(minions): RSS watchdog opacity + pooler-reap self-heal + silent lens backlog + cycle lint DB-disconnect (garrytan#1678) (garrytan#1735)
  v0.42.4.0 fix: think --model fails loud — slash-form ids + never persist empty synthesis (garrytan#1698) (garrytan#1736)
  v0.42.3.0 feat(search): autocut — score-discontinuity result-sizing (garrytan#1663 wave 1) (garrytan#1682)
  v0.42.2.0 feat: gbrain connect — one-command Claude Code onboarding from a bearer token (garrytan#1683)