Skip to content

fix(facts): cycle reconcile preserves cli-sourced facts (#1928)#1967

Open
TreyLawrence wants to merge 1 commit into
garrytan:masterfrom
TreyLawrence:trey/fix-cycle-deletes-cli-facts
Open

fix(facts): cycle reconcile preserves cli-sourced facts (#1928)#1967
TreyLawrence wants to merge 1 commit into
garrytan:masterfrom
TreyLawrence:trey/fix-cycle-deletes-cli-facts

Conversation

@TreyLawrence

Copy link
Copy Markdown

Summary

Closes #1928. Data-loss class bug per the issue (one cycle wiped 1829 conversation-facts rows in the reporter's brain).

The extract_facts cycle phase wiped every fact row on each reconciled page's (source_id, source_markdown_slug) coordinate. CLI deposits — notably gbrain extract-conversation-facts, which writes rows with source LIKE 'cli:extract-conversation-facts%' against transcript pages that carry no ## Facts fence by design — live on that exact coordinate. So a reconcile for a transcript page was: delete every fact, reinsert nothing.

Combined with the failed-sync ⇒ full-walk fallback in cycle.ts (syncPagesAffected = undefined → walk every page), one autopilot cycle could wipe an entire conversation-facts backfill while reporting status: 'ok' and 0 fact(s) reconciled across N page(s).

Fix

Add an optional excludeSourcePrefixes option to deleteFactsForPage. The cycle phase passes ['cli:'] so CLI-deposited rows survive the wipe.

Files

  • src/core/engine.ts — interface: optional opts?: { excludeSourcePrefixes?: string[] } on deleteFactsForPage. JSDoc updated.
  • src/core/postgres-engine.ts + src/core/pglite-engine.ts — engine parity. Both implementations short-circuit to the existing single-statement DELETE when no prefixes are passed (back-compat) and add AND NOT (source LIKE ANY($N::text[])) when prefixes are present.
  • src/core/cycle/extract-facts.ts — call site passes excludeSourcePrefixes: ['cli:']. File header updated to document the exclusion.

The reporter's suggested-fix paragraph from the issue mentions running this exact call-site guard locally and confirming fence reconciliation still byte-matches.

Non-goals (issue's "secondary hardening" — not in this PR)

The issue suggests two additional hardenings worth considering separately:

  1. Destructive phase shouldn't inherit the "failed-sync ⇒ full-walk" fallback. That's a cycle.ts change at a different layer and worth its own PR — the fix here makes the data loss impossible regardless of whether the fallback runs, so it's strictly additive.
  2. factsDeleted >> factsInserted should warn status. Useful canary, but signal-only — also strictly additive on top of this fix.

Kept the scope tight to the data-loss fix; happy to follow up on either if you'd like.

Test plan

test/extract-facts-phase.test.ts gains a new #1928 CLI-deposited facts survive the wipe describe with five tests:

Result:

18 pass / 0 fail in extract-facts-phase.test.ts
96 pass / 0 fail across extract-facts-phase, insert-facts-batch, phantom-redirect, extract-conversation-facts
  • bun run typecheck clean

Diff: 5 files, +211 / -8.

The extract_facts cycle phase wiped every fact row on each reconciled
page's (source_id, source_markdown_slug) coordinate. CLI deposits —
notably `gbrain extract-conversation-facts`, which writes rows with
`source LIKE 'cli:extract-conversation-facts%'` against transcript
pages that carry no `## Facts` fence by design — live on that exact
coordinate. So a reconcile for a transcript page was: delete every
fact, reinsert nothing.

Combined with the failed-sync ⇒ full-walk fallback in cycle.ts
(syncPagesAffected undefined → walk every page), one autopilot cycle
could wipe an entire conversation-facts backfill while reporting
status: 'ok' / `0 fact(s) reconciled across N page(s)`.

Fix: add an optional `excludeSourcePrefixes` option to
`deleteFactsForPage`. The cycle phase passes `['cli:']` so CLI-deposited
rows survive the wipe. Engine parity in both postgres-engine and
pglite-engine (`AND NOT (source LIKE ANY($N::text[]))`).

The header comment in src/core/cycle/extract-facts.ts is updated to
document the new exclusion.

Regression coverage in test/extract-facts-phase.test.ts under a new
`garrytan#1928 CLI-deposited facts survive the wipe` describe:
- CLI-sourced fact on a fenceless page survives slug-scoped reconcile
- Full-walk fallback (no slugs) does not wipe CLI deposits brain-wide
- Non-cli rows on the same coordinate ARE still wiped (reconcile
  semantics preserved for sync:import / mcp:put_page fence rows)
- Engine-level pin: deleteFactsForPage honors the option directly
- Back-compat: no opts deletes every coord row (pre-garrytan#1928 behavior)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

1 participant