Summary
gbrain dream's orphans phase only detects orphan PAGES (pages with no inbound links). It does NOT detect orphan CHILD ROWS in FK-bound tables like content_chunks, page_versions, tags, takes, etc. Bulk page deletions can leave orphans in these tables that persist indefinitely.
Discovered after a bulk source removal on v0.35.0.0 left 234 orphan content_chunks rows. All FK constraints to pages.id are correctly declared as ON DELETE CASCADE (verified via pg_constraint against the live PGLite DB), so the orphans must have come from one of:
A. A historical schema version where the FK was missing (and never retro-cascaded)
B. A bulk-delete code path that bypassed cascade (manual SQL UPDATE without follow-up CASCADE)
C. A race condition during a large CASCADE batch
Regardless of origin, the existing orphans phase doesn't catch them. The find_orphans MCP op also only scans pages.
Repro / Detection
bun -e "
import { PGlite } from '@electric-sql/pglite';
import { vector } from '@electric-sql/pglite/vector';
const db = new PGlite('/Users/<you>/.gbrain/brain.pglite', { extensions: { vector } });
await db.waitReady;
for (const tbl of ['content_chunks', 'page_versions', 'tags', 'takes', 'links', 'raw_data', 'files', 'timeline_entries']) {
const col = tbl === 'links' ? 'from_page_id' : 'page_id';
const r = await db.query(\`SELECT COUNT(*) AS c FROM \${tbl} WHERE \${col} IS NOT NULL AND \${col} NOT IN (SELECT id FROM pages)\`);
console.log(\`\${tbl}: \${r.rows[0].c} orphans\`);
}
"
If any table reports non-zero orphans, the existing orphans phase didn't catch them.
Suggested fix shapes
A. Extend the existing orphans phase to scan all FK-to-pages tables. Could add to runPhaseOrphans in src/core/cycle.ts; would surface in autopilot output naturally.
B. New find_child_orphans MCP op (admin-scoped, localOnly) returning a per-table breakdown. Operators can run on-demand without waiting for the next dream cycle.
C. New doctor check (child_table_orphans) — fast, sampled, surfaces non-zero counts with paste-ready cleanup SQL. Cheapest to implement; matches the existing doctor health-check pattern.
I'd vote (C) — it's the smallest blast radius, gives operators immediate signal, and doesn't change autopilot semantics.
Context
- Live FK constraints are correct (
pg_get_constraintdef shows ON DELETE CASCADE on every page-referencing table including content_chunks).
- 234 orphans were cleaned with
DELETE FROM content_chunks WHERE page_id NOT IN (SELECT id FROM pages) — manual operator intervention.
- Pre-cleanup orphan check could have been auto-fired by the dream cycle but wasn't.
Severity: medium. Doesn't break anything visible (orphans don't poison search since the join eliminates them), but they bloat storage and embedding-index size proportionally to historical deletes. On a brain with months of churn, this can be hundreds of MB of dead vectors.
Summary
gbrain dream'sorphansphase only detects orphan PAGES (pages with no inbound links). It does NOT detect orphan CHILD ROWS in FK-bound tables likecontent_chunks,page_versions,tags,takes, etc. Bulk page deletions can leave orphans in these tables that persist indefinitely.Discovered after a bulk source removal on v0.35.0.0 left 234 orphan
content_chunksrows. All FK constraints topages.idare correctly declared asON DELETE CASCADE(verified viapg_constraintagainst the live PGLite DB), so the orphans must have come from one of:A. A historical schema version where the FK was missing (and never retro-cascaded)
B. A bulk-delete code path that bypassed cascade (manual SQL UPDATE without follow-up CASCADE)
C. A race condition during a large CASCADE batch
Regardless of origin, the existing
orphansphase doesn't catch them. Thefind_orphansMCP op also only scans pages.Repro / Detection
If any table reports non-zero orphans, the existing
orphansphase didn't catch them.Suggested fix shapes
A. Extend the existing
orphansphase to scan all FK-to-pages tables. Could add torunPhaseOrphansinsrc/core/cycle.ts; would surface in autopilot output naturally.B. New
find_child_orphansMCP op (admin-scoped, localOnly) returning a per-table breakdown. Operators can run on-demand without waiting for the next dream cycle.C. New
doctorcheck (child_table_orphans) — fast, sampled, surfaces non-zero counts with paste-ready cleanup SQL. Cheapest to implement; matches the existing doctor health-check pattern.I'd vote (C) — it's the smallest blast radius, gives operators immediate signal, and doesn't change autopilot semantics.
Context
pg_get_constraintdefshowsON DELETE CASCADEon every page-referencing table includingcontent_chunks).DELETE FROM content_chunks WHERE page_id NOT IN (SELECT id FROM pages)— manual operator intervention.Severity: medium. Doesn't break anything visible (orphans don't poison search since the join eliminates them), but they bloat storage and embedding-index size proportionally to historical deletes. On a brain with months of churn, this can be hundreds of MB of dead vectors.