No batch/incremental entity enrichment primitive: thin person/company pages stay stubs forever (#1700)

## Symptom

On a 280K-page brain: **only 2,330 of 36,391 person pages (6.4%) had more than 2 content chunks.** The other ~93.6% were stubs — a name, maybe an email, a sentence. Same shape for companies (~18K thin). These pages have rich context *scattered across the brain* (meetings, emails, tweets, calendar, deals) but nothing pulls it onto the entity page.

The only tool that synthesizes scattered context is `think --anchor`, which:
- is Opus-default and heavy/expensive per call,
- is designed for *one* interactive question, not a batch sweep,
- has no built-in prioritization, concurrency, resumability, or 'only thin pages' targeting.

We ended up hand-rolling a SQL query (thin pages ranked by inbound-link count) plus a bash fan-out calling `think` 3 at a time. It worked (199 high-value pages enriched), but it is exactly the kind of thing that should be a first-class primitive, not operator glue.

## Proposed: `gbrain enrich`

A dedicated batch enrichment command:

```
gbrain enrich [--type person|company|...] [--thin] [--limit N] [--workers K]
              [--order inbound-links|recency|degree] [--model <id>] [--dry-run]
```
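The proposed flag surface, sketched with Python's `argparse` purely to make option types and defaults concrete. The defaults (3 workers, `inbound-links` ordering) are assumptions borrowed from the manual run described above, not settled design, and `--type` is left open-ended since the synopsis lists `person|company|...`.

```python
import argparse


def positive_int(text):
    """argparse type: reject zero or negative worker/limit counts."""
    value = int(text)
    if value < 1:
        raise argparse.ArgumentTypeError("must be >= 1")
    return value


def build_parser():
    p = argparse.ArgumentParser(prog="gbrain enrich",
                                description="Batch-enrich entity pages (sketch).")
    p.add_argument("--type", dest="page_type", help="entity type, e.g. person or company")
    p.add_argument("--thin", action="store_true",
                   help="only pages at or below the thin-chunk threshold")
    p.add_argument("--limit", type=positive_int, metavar="N",
                   help="max pages to enrich this run")
    # Assumed default of 3, matching the hand-run fan-out.
    p.add_argument("--workers", type=positive_int, default=3, metavar="K",
                   help="bounded concurrency")
    p.add_argument("--order", choices=["inbound-links", "recency", "degree"],
                   default="inbound-links")
    p.add_argument("--model", metavar="ID", help="model override for cost control")
    p.add_argument("--dry-run", action="store_true",
                   help="list target pages without calling the model")
    return p
```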

- **`--thin`**: target only stub pages, i.e. pages at or below a chunk-count threshold (≤ 2 chunks in the numbers above), so you don't re-burn tokens on already-rich pages.
- **`--order inbound-links`**: prioritize highest-signal/lowest-content pages first (most-connected stubs = biggest graph payoff per dollar). This was the heuristic that made our manual pass effective.
- **`--workers K`**: built-in bounded concurrency (we ran 3 in parallel by hand).
- **Resumable**: keep a watermark, like `embed --stale` / `edges_backfilled_at`, so an interrupted run resumes where it left off; for example, an `enriched_at` column gated on an enricher version.
- **`--model`**: cost control (we used Sonnet, not Opus default; see the related fail-silent model-override bug).
- **Idempotent + non-destructive**: append/merge synthesized profile into the page, don't clobber human-authored content.
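A minimal sketch of a run loop with these properties. It assumes a hypothetical in-memory `Page` record with `enriched_at` / `enricher_version` fields, hypothetical HTML-comment markers around the generated section, and a caller-supplied `synthesize` function standing in for the model call. It shows bounded concurrency via a thread pool, a version-gated watermark for resumability, and a merge that only ever rewrites its own marked block.

```python
import re
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Optional

ENRICHER_VERSION = 2  # hypothetical: bump to re-queue pages after prompt/logic changes
BEGIN, END = "<!-- enrich:begin -->", "<!-- enrich:end -->"  # hypothetical markers
_BLOCK = re.compile(re.escape(BEGIN) + r".*?" + re.escape(END), re.DOTALL)


@dataclass
class Page:
    slug: str
    body: str
    enriched_at: Optional[float] = None
    enricher_version: int = 0


def needs_enrichment(page: Page) -> bool:
    # Resumability: pages already stamped with the current version are skipped.
    return page.enriched_at is None or page.enricher_version < ENRICHER_VERSION


def merge_profile(body: str, profile: str) -> str:
    # Non-destructive: text outside the markers is never touched; an earlier
    # generated block is replaced in place instead of being duplicated.
    block = f"{BEGIN}\n{profile.strip()}\n{END}"
    if _BLOCK.search(body):
        # Function replacement so backslashes in `profile` are not treated as escapes.
        return _BLOCK.sub(lambda _m: block, body, count=1)
    return body.rstrip() + "\n\n" + block + "\n"


def enrich_all(pages, synthesize: Callable[[Page], str], workers: int = 3):
    todo = [p for p in pages if needs_enrichment(p)]

    def work(page: Page) -> str:
        profile = synthesize(page)  # the expensive model call
        page.body = merge_profile(page.body, profile)
        # Stamp the watermark only after the merged body is written.
        page.enriched_at = time.time()
        page.enricher_version = ENRICHER_VERSION
        return page.slug

    # Bounded concurrency: at most `workers` synthesize calls in flight.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(work, todo))
```

Because each page is stamped only after its own write, pages finished before a failure stay done; `pool.map` re-raises the first error, and a re-run touches only pages still missing the current version.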

## Stretch: make it part of autopilot

Once `enrich --thin --stale` exists, autopilot can run a slow trickle (e.g. top-50 thinnest-but-most-connected entities per cycle) so the brain *gets smarter over time*, not just *bigger*. Today the observed failure mode (from a network-intelligence digest) was literally: '1,516 new thin pages vs only 567 enriched — brain growing faster than getting smarter.' A trickle enricher in the maintenance loop directly closes that gap.
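For the autopilot trickle, a per-cycle picker could look like the sketch below. It assumes a hypothetical `Candidate` record, the ≤ 2 chunk threshold from the stats above, and the 50-per-cycle budget from the example. Already-enriched pages are skipped by comparing against the current enricher version, which is one possible reading of `--stale`, not a confirmed definition.

```python
import heapq
from typing import Iterable, List, NamedTuple


class Candidate(NamedTuple):
    slug: str
    n_chunks: int
    inbound: int
    enricher_version: int  # 0 = never enriched


def trickle_batch(cands: Iterable[Candidate], current_version: int,
                  max_chunks: int = 2, per_cycle: int = 50) -> List[Candidate]:
    """Pick this cycle's slice: thin, stale, most-connected first."""
    eligible = (c for c in cands
                if c.n_chunks <= max_chunks and c.enricher_version < current_version)
    # Most inbound links first; among equals, thinnest first; slug keeps order stable.
    return heapq.nsmallest(per_cycle, eligible,
                           key=lambda c: (-c.inbound, c.n_chunks, c.slug))
```

Using `heapq.nsmallest` keeps each cycle's cost proportional to the candidate count without fully sorting it, and the fixed `per_cycle` cap bounds model spend per maintenance pass.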

## Why this matters

This is the 'memory evolution' axis of the published evaluation. `dream`/`autopilot` exist for maintenance, but there's no primitive that turns the brain's *own scattered knowledge* into curated entity profiles at scale. The data is already there; it just needs a batch synthesizer with prioritization + cost control + resumability.