extract_facts cycle phase hard-deletes conversation facts (deleteFactsForPage has no source filter; failed-sync fallback turns it into a full-brain wipe) #1928

@spiky02plateau

Summary

The extract_facts cycle phase hard-deletes non-fence facts (e.g. rows written by gbrain extract-conversation-facts) for every page it reconciles. Its own header comment claims these rows survive; the SQL says otherwise. Combined with a failed-sync fallback that turns the phase into a full-brain walk, one autopilot cycle deleted all 1829 facts in our brain (the entire conversation-facts backfill) with factsInserted: 0.

Observed on v0.42.8.0 (the version that ran the deleting cycle); the code path is byte-identical through v0.42.26.0 and current master.

The contradiction

src/core/cycle/extract-facts.ts header (lines ~18-22) says:

Pages with no fence go through delete-then-empty-insert — DB rows for that page coordinate are wiped; legacy NULL-source_markdown_slug rows survive because deleteFactsForPage targets source_markdown_slug = slug only.

That protects only rows with source_markdown_slug IS NULL. But extract-conversation-facts deliberately stamps source_markdown_slug with the transcript page slug (page-global row_num contract, src/commands/extract-conversation-facts.ts). So conversation facts live exactly on the coordinate the wipe targets:

// src/core/postgres-engine.ts
async deleteFactsForPage(slug: string, source_id: string): Promise<{ deleted: number }> {
  const result = await sql`
    DELETE FROM facts WHERE source_id = ${source_id} AND source_markdown_slug = ${slug}
  `;
  // ... (rest of the function elided from this excerpt)
}

No filter on the source column: the WHERE clause matches on source_id and source_markdown_slug only. Since conversation pages carry no ## Facts fence (by design: "the chat-log shape is the source-of-truth"), the reconcile for a transcript page is: delete every fact, reinsert nothing.

Notably, extract-conversation-facts itself already knows how to do this safely — its deleteOrphanFactsForPage scopes its delete with AND source LIKE 'cli:extract-conversation-facts%'. The cycle phase lacks the inverse scoping.

The amplifier: failed sync ⇒ full-brain walk

On an ordinary night the phase only reconciles syncPagesAffected, so transcript pages are rarely touched and the bug stays hidden. But in src/core/cycle.ts (~line 1608):

// If sync didn't run (phases exclude it) or failed, syncPagesAffected
// is undefined → extract falls back to full walk (safe default).

undefined → full walk is a safe default for link extraction; for a destructive wipe-and-reinsert it is the opposite. In our case a standalone sync job and the autopilot cycle started ~40 ms apart; the cycle's sync phase failed on the sync lock ("Another sync is in progress"), syncPagesAffected came back undefined, and extract_facts walked all 1587 pages:

{ "phase": "extract_facts", "status": "ok",
  "details": { "factsDeleted": 1829, "pagesScanned": 1587, "factsInserted": 0, "pagesWithFacts": 0 },
  "summary": "0 fact(s) reconciled across 1587 page(s)" }

Status ok, summary reads like a no-op — 1829 rows gone.

Repro sketch

  1. gbrain extract-conversation-facts over any conversation page → facts rows with source = 'cli:extract-conversation-facts' and source_markdown_slug = <page slug>.
  2. Run a cycle whose sync phase fails (e.g. hold the sync lock), or any full-walk extract_facts.
  3. The conversation facts are deleted; nothing is reinserted.
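The three steps can also be traced without a database. What follows is a minimal in-memory sketch, not gbrain code: the Fact shape, the slugs, the "default" source_id, the "legacy" source value, and the helper names pagesToReconcile and extractFactsPhase are all hypothetical. Only the delete predicate (source_id plus source_markdown_slug) and the "undefined means full walk" fallback are taken from the excerpts above.

```typescript
// Hypothetical in-memory stand-in for the facts table; column names follow the issue text.
interface Fact {
  source_id: string;
  source_markdown_slug: string | null;
  source: string;
  text: string;
}

// Mirrors the quoted WHERE clause: source_id and source_markdown_slug, nothing else.
function deleteFactsForPage(
  facts: Fact[],
  slug: string,
  sourceId: string,
): { kept: Fact[]; deleted: number } {
  const kept = facts.filter(
    (f) => !(f.source_id === sourceId && f.source_markdown_slug === slug),
  );
  return { kept, deleted: facts.length - kept.length };
}

// Assumed shape of the fallback: undefined syncPagesAffected means "walk every page".
function pagesToReconcile(
  allPages: string[],
  syncPagesAffected: string[] | undefined,
): string[] {
  return syncPagesAffected ?? allPages;
}

// Delete-then-reinsert per page. fenceFacts maps a slug to the facts parsed
// from its "## Facts" fence; transcript pages have no entry.
function extractFactsPhase(
  facts: Fact[],
  pages: string[],
  fenceFacts: Map<string, Fact[]>,
  sourceId: string,
): { facts: Fact[]; factsDeleted: number; factsInserted: number } {
  let current = facts;
  let factsDeleted = 0;
  let factsInserted = 0;
  for (const slug of pages) {
    const { kept, deleted } = deleteFactsForPage(current, slug, sourceId);
    const reinserted = fenceFacts.get(slug) ?? [];
    current = [...kept, ...reinserted];
    factsDeleted += deleted;
    factsInserted += reinserted.length;
  }
  return { facts: current, factsDeleted, factsInserted };
}

// Step 1: conversation facts stamped with the transcript page slug,
// plus one legacy row with a NULL slug.
const seeded: Fact[] = [
  { source_id: "default", source_markdown_slug: "chats/2024-01-01", source: "cli:extract-conversation-facts", text: "a" },
  { source_id: "default", source_markdown_slug: "chats/2024-01-01", source: "cli:extract-conversation-facts", text: "b" },
  { source_id: "default", source_markdown_slug: null, source: "legacy", text: "c" },
];

// Step 2: sync failed, so syncPagesAffected is undefined and every page is walked.
const walked = pagesToReconcile(["chats/2024-01-01", "notes/todo"], undefined);

// Step 3: both conversation facts are deleted and nothing is reinserted.
const outcome = extractFactsPhase(seeded, walked, new Map(), "default");
```

In this model only the NULL-slug legacy row survives, which matches the protection the header comment describes and shows why it does not extend to conversation facts.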

Suggested fix

Scope the cycle's wipe to fence-owned rows only — e.g. at the call-site in extract-facts.ts, replace deleteFactsForPage with a delete carrying AND source NOT LIKE 'cli:%' (mirroring deleteOrphanFactsForPage's scoping), or add a source-prefix exclusion parameter to deleteFactsForPage. We run this exact call-site guard locally and fence reconciliation (insert/edit/shrink, idempotent re-runs) still byte-matches while cli:% rows survive full walks.
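To make the intended semantics of that guard concrete, here is a small in-memory sketch of the scoped predicate. It is an illustration under assumptions, not the patch we run: FactRow, isFenceOwned, and deleteFenceFactsForPage are hypothetical names, "fence" is a made-up source value, and the nullable source field is an assumption about the schema. One SQL detail is worth modelling: source NOT LIKE 'cli:%' evaluates to NULL (not true) for a NULL source, so such rows would also be left alone by the scoped delete.

```typescript
// Hypothetical row shape; source is typed nullable to model SQL NULL handling.
interface FactRow {
  source_id: string;
  source_markdown_slug: string | null;
  source: string | null;
}

// Models `source NOT LIKE 'cli:%'`: true only for a non-NULL source without the prefix.
function isFenceOwned(row: FactRow): boolean {
  return row.source !== null && !row.source.startsWith("cli:");
}

// Same page coordinate as the original delete, plus the source exclusion.
function deleteFenceFactsForPage(
  rows: FactRow[],
  slug: string,
  sourceId: string,
): { kept: FactRow[]; deleted: number } {
  const kept = rows.filter(
    (r) =>
      !(
        r.source_id === sourceId &&
        r.source_markdown_slug === slug &&
        isFenceOwned(r)
      ),
  );
  return { kept, deleted: rows.length - kept.length };
}

// Three rows on the same page coordinate with different sources.
const rows: FactRow[] = [
  { source_id: "default", source_markdown_slug: "people/alice", source: "fence" },
  { source_id: "default", source_markdown_slug: "people/alice", source: "cli:extract-conversation-facts" },
  { source_id: "default", source_markdown_slug: "people/alice", source: null },
];

const scoped = deleteFenceFactsForPage(rows, "people/alice", "default");
```

Only the "fence" row is removed here. If rows with a NULL source should be treated as fence-owned, the SQL would need an explicit OR source IS NULL, which is a design decision for the maintainers.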

Secondary hardening worth considering: a destructive phase probably shouldn't inherit the "failed sync ⇒ full walk" fallback, and factsDeleted >> factsInserted on a full walk could warrant at least a warn status.
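Both hardening ideas could look roughly like the sketch below. Everything in it is hypothetical: the function names, the PhaseStatus type, and the ratio threshold of 10 are illustrative choices, not existing gbrain behavior. The detail field names are taken from the phase log quoted earlier.

```typescript
type PhaseStatus = "ok" | "warn";

// Subset of the detail fields seen in the extract_facts phase log.
interface ExtractFactsDetails {
  factsDeleted: number;
  factsInserted: number;
  pagesScanned: number;
}

// For a destructive phase, treat undefined (sync skipped or failed) as
// "nothing to reconcile" rather than "walk every page".
function pagesForDestructivePhase(
  syncPagesAffected: string[] | undefined,
): string[] {
  return syncPagesAffected ?? [];
}

// Flag a full walk whose deletions dwarf its insertions.
// The default ratio is an arbitrary illustrative threshold.
function classifyExtractFacts(
  d: ExtractFactsDetails,
  fullWalk: boolean,
  ratio = 10,
): PhaseStatus {
  const lopsided =
    d.factsDeleted > 0 &&
    d.factsDeleted >= ratio * Math.max(d.factsInserted, 1);
  return fullWalk && lopsided ? "warn" : "ok";
}

// The numbers from the cycle reported in this issue.
const observed = classifyExtractFacts(
  { factsDeleted: 1829, factsInserted: 0, pagesScanned: 1587 },
  true,
);

// A balanced reconcile for comparison.
const balanced = classifyExtractFacts(
  { factsDeleted: 5, factsInserted: 5, pagesScanned: 3 },
  true,
);
```

With these assumptions the reported cycle would have surfaced as warn instead of ok, and a failed sync would have left extract_facts with an empty page list.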
