fix(facts): cycle reconcile preserves cli-sourced facts (#1928)#1967
Open
TreyLawrence wants to merge 1 commit into
Open
fix(facts): cycle reconcile preserves cli-sourced facts (#1928)#1967TreyLawrence wants to merge 1 commit into
TreyLawrence wants to merge 1 commit into
Conversation
The extract_facts cycle phase wiped every fact row on each reconciled page's (source_id, source_markdown_slug) coordinate. CLI deposits — notably `gbrain extract-conversation-facts`, which writes rows with `source LIKE 'cli:extract-conversation-facts%'` against transcript pages that carry no `## Facts` fence by design — live on that exact coordinate. So a reconcile for a transcript page was: delete every fact, reinsert nothing. Combined with the failed-sync ⇒ full-walk fallback in cycle.ts (syncPagesAffected undefined → walk every page), one autopilot cycle could wipe an entire conversation-facts backfill while reporting status: 'ok' / `0 fact(s) reconciled across N page(s)`. Fix: add an optional `excludeSourcePrefixes` option to `deleteFactsForPage`. The cycle phase passes `['cli:']` so CLI-deposited rows survive the wipe. Engine parity in both postgres-engine and pglite-engine (`AND NOT (source LIKE ANY($N::text[]))`). The header comment in src/core/cycle/extract-facts.ts is updated to document the new exclusion. Regression coverage in test/extract-facts-phase.test.ts under a new `garrytan#1928 CLI-deposited facts survive the wipe` describe: - CLI-sourced fact on a fenceless page survives slug-scoped reconcile - Full-walk fallback (no slugs) does not wipe CLI deposits brain-wide - Non-cli rows on the same coordinate ARE still wiped (reconcile semantics preserved for sync:import / mcp:put_page fence rows) - Engine-level pin: deleteFactsForPage honors the option directly - Back-compat: no opts deletes every coord row (pre-garrytan#1928 behavior) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes #1928. Data-loss class bug per the issue (one cycle wiped 1829 conversation-facts rows in the reporter's brain).
The
extract_factscycle phase wiped every fact row on each reconciled page's(source_id, source_markdown_slug)coordinate. CLI deposits — notablygbrain extract-conversation-facts, which writes rows withsource LIKE 'cli:extract-conversation-facts%'against transcript pages that carry no## Factsfence by design — live on that exact coordinate. So a reconcile for a transcript page was: delete every fact, reinsert nothing.Combined with the failed-sync ⇒ full-walk fallback in
cycle.ts(syncPagesAffected = undefined→ walk every page), one autopilot cycle could wipe an entire conversation-facts backfill while reportingstatus: 'ok'and0 fact(s) reconciled across N page(s).Fix
Add an optional
excludeSourcePrefixesoption todeleteFactsForPage. The cycle phase passes['cli:']so CLI-deposited rows survive the wipe.Files
src/core/engine.ts— interface: optionalopts?: { excludeSourcePrefixes?: string[] }ondeleteFactsForPage. JSDoc updated.src/core/postgres-engine.ts+src/core/pglite-engine.ts— engine parity. Both implementations short-circuit to the existing single-statement DELETE when no prefixes are passed (back-compat) and addAND NOT (source LIKE ANY($N::text[]))when prefixes are present.src/core/cycle/extract-facts.ts— call site passesexcludeSourcePrefixes: ['cli:']. File header updated to document the exclusion.The reporter's suggested-fix paragraph from the issue mentions running this exact call-site guard locally and confirming fence reconciliation still byte-matches.
Non-goals (issue's "secondary hardening" — not in this PR)
The issue suggests two additional hardenings worth considering separately:
cycle.tschange at a different layer and worth its own PR — the fix here makes the data loss impossible regardless of whether the fallback runs, so it's strictly additive.factsDeleted >> factsInsertedshouldwarnstatus. Useful canary, but signal-only — also strictly additive on top of this fix.Kept the scope tight to the data-loss fix; happy to follow up on either if you'd like.
Test plan
test/extract-facts-phase.test.tsgains a new#1928 CLI-deposited facts survive the wipedescribe with five tests:factsDeleted: 0)slugsfilter) does NOT wipe CLI deposits brain-wide (the "amplifier" scenario from the issue)(source_id, source_markdown_slug)coordinate ARE still wiped (reconcile semantics preserved forsync:import/mcp:put_pagefence rows)deleteFactsForPagehonors the option directly (1 deleted, 1 cli row survives)Result:
bun run typecheckcleanDiff: 5 files, +211 / -8.