Symptom
On a 280K-page brain: only 2,330 of 36,391 person pages (6.4%) had more than 2 content chunks. The other ~93.6% were stubs — a name, maybe an email, a sentence. Same shape for companies (~18K thin). These pages have rich context scattered across the brain (meetings, emails, tweets, calendar, deals) but nothing pulls it onto the entity page.
The only tool that synthesizes scattered context is think --anchor, which:
- is Opus-default and heavy/expensive per call,
- is designed for one interactive question, not a batch sweep,
- has no built-in prioritization, concurrency, resumability, or 'only thin pages' targeting.
We ended up hand-rolling a SQL query (thin pages ranked by inbound-link count) plus a bash fan-out that called think three at a time. It worked (199 high-value pages enriched), but this is exactly the kind of thing that should be a first-class primitive, not operator glue.
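For illustration, the "thin pages ranked by inbound-link count" selection could look roughly like the sketch below. This is not the query we ran: the `pages`, `chunks`, and `links` tables and their columns are a made-up schema, and the real gbrain storage layout may differ. The threshold of 2 chunks mirrors the "more than 2 content chunks" cutoff above.

```python
import sqlite3

# Hypothetical schema, for illustration only; real gbrain tables may differ.
SCHEMA = """
CREATE TABLE pages  (id INTEGER PRIMARY KEY, slug TEXT, type TEXT);
CREATE TABLE chunks (page_id INTEGER);
CREATE TABLE links  (src_id INTEGER, dst_id INTEGER);
"""

# Thin pages of one type, most-linked-to first.
THIN_BY_INBOUND = """
SELECT p.slug,
       (SELECT COUNT(*) FROM chunks c WHERE c.page_id = p.id) AS n_chunks,
       (SELECT COUNT(*) FROM links  l WHERE l.dst_id  = p.id) AS inbound
FROM pages p
WHERE p.type = :type
  AND (SELECT COUNT(*) FROM chunks c WHERE c.page_id = p.id) <= :max_chunks
ORDER BY inbound DESC, n_chunks ASC, p.slug ASC
LIMIT :limit
"""

def thin_pages(conn, page_type="person", max_chunks=2, limit=50):
    """Return (slug, n_chunks, inbound) for thin pages, ranked by inbound links."""
    params = {"type": page_type, "max_chunks": max_chunks, "limit": limit}
    return [tuple(row) for row in conn.execute(THIN_BY_INBOUND, params)]

def demo():
    """Build a tiny in-memory brain and rank its thin person pages."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO pages VALUES (?, ?, ?)", [
        (1, "alice", "person"), (2, "bob", "person"),
        (3, "carol", "person"), (4, "acme", "company"),
    ])
    # alice has 1 chunk, bob has 3 (not thin), carol has 2.
    conn.executemany("INSERT INTO chunks VALUES (?)",
                     [(1,), (2,), (2,), (2,), (3,), (3,)])
    # alice has 2 inbound links, carol 1, bob 3.
    conn.executemany("INSERT INTO links VALUES (?, ?)",
                     [(4, 1), (2, 1), (4, 3), (1, 2), (3, 2), (4, 2)])
    return thin_pages(conn)
```

In the demo, bob is the most linked-to page but is skipped because he already has 3 chunks, which is the point of combining the thin filter with the inbound-link ordering.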
Proposed: gbrain enrich
A dedicated batch enrichment command:
gbrain enrich [--type person|company|...] [--thin] [--limit N] [--workers K]
[--order inbound-links|recency|degree] [--model <id>] [--dry-run]
- --thin: target only pages with ≤ N chunks (the stubs), so you don't re-burn tokens on already-rich pages.
- --order inbound-links: prioritize highest-signal/lowest-content pages first (most-connected stubs = biggest graph payoff per dollar). This was the heuristic that made our manual pass effective.
- --workers K: built-in bounded concurrency (we ran 3 in parallel by hand).
- Resumable: a watermark like embed --stale / edges_backfilled_at so an interrupted run resumes, e.g. an enriched_at column gated on an enricher version.
- --model: cost control (we used Sonnet, not the Opus default; see the related fail-silent model-override bug).
- Idempotent + non-destructive: append/merge the synthesized profile into the page; don't clobber human-authored content.
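A minimal sketch of how the resumable, bounded-concurrency, non-destructive pieces could fit together, under some loud assumptions: pages are plain dicts, `synthesize` is a stand-in for whatever model call the command would make, `enriched_version` and the `## Synthesized profile` marker are hypothetical names, and none of this reflects existing gbrain internals.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone

ENRICHER_VERSION = 1  # hypothetical: bump to force re-enrichment of old pages

def needs_enrich(page, version=ENRICHER_VERSION):
    """Pending if never enriched, or enriched by an older enricher version."""
    return page.get("enriched_version", 0) < version

def merge_profile(body, profile, marker="## Synthesized profile"):
    """Keep the text above the marker (human-authored) and replace only
    the generated section below it, so re-runs do not pile up copies."""
    human = body.split(marker)[0].rstrip()
    return f"{human}\n\n{marker}\n{profile.strip()}\n"

def enrich_batch(pages, synthesize, workers=3, version=ENRICHER_VERSION):
    """Enrich pending pages with at most `workers` concurrent calls.
    Returns the slugs that were enriched in this run."""
    pending = [p for p in pages if needs_enrich(p, version)]

    def work(page):
        profile = synthesize(page)  # hypothetical model call
        page["body"] = merge_profile(page["body"], profile)
        # The watermark is written only after a successful merge, so an
        # interrupted run leaves unfinished pages pending for the next run.
        page["enriched_version"] = version
        page["enriched_at"] = datetime.now(timezone.utc).isoformat()
        return page["slug"]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(work, pending))
```

Running `enrich_batch` a second time over the same pages does nothing, since every page already carries the current enricher version. A real implementation would persist the watermark in the database rather than on an in-memory dict.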
Stretch: make it part of autopilot
Once enrich --thin --stale exists, autopilot can run a slow trickle (e.g. top-50 thinnest-but-most-connected entities per cycle) so the brain gets smarter over time, not just bigger. Today the observed failure mode (from a network-intelligence digest) was literally: '1,516 new thin pages vs only 567 enriched — brain growing faster than getting smarter.' A trickle enricher in the maintenance loop directly closes that gap.
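To size the trickle, the digest numbers above give a rough break-even estimate. The sketch below assumes the batch size of 50 from the example and treats the digest's reporting window as one period; the function name is illustrative only.

```python
import math

def extra_cycles_to_break_even(new_thin, enriched, per_cycle=50):
    """Extra enrichment cycles per period needed so the thin-page backlog stops growing."""
    gap = max(0, new_thin - enriched)  # net backlog growth in the period
    return math.ceil(gap / per_cycle)

# Digest numbers: 1,516 new thin pages vs 567 enriched, so the backlog grew by 949.
cycles = extra_cycles_to_break_even(1516, 567)
```

With those numbers the backlog grows by 949 pages per period, so a top-50 trickle needs 19 additional cycles in the same window just to hold steady.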
Why this matters
This is the 'memory evolution' axis of the published evaluation. dream/autopilot exist for maintenance, but there's no primitive that turns the brain's own scattered knowledge into curated entity profiles at scale. The data is already there; it just needs a batch synthesizer with prioritization + cost control + resumability.