fix: preserve source id during sync imports#639
fix: preserve source id during sync imports#639100yenadmin wants to merge 2 commits intogarrytan:masterfrom
Conversation
Thread optional sourceId through sync/import/import-file and scope page, chunk, tag, and version writes by source identity instead of bare slug. Default callers still target source_id='default'. Adds regression coverage for named-source full sync and import. Validation: bun run verify; HOME=/var/folders/l7/tyvn8qfs50v57w__75s3k0f00000gp/T/tmp.UhjtVbjvxm bun run test (3831 pass, 0 fail); focused sync/import/multi-source tests.
There was a problem hiding this comment.
Pull request overview
This PR fixes multi-source sync/import behavior by threading an optional sourceId through the import/write pipeline and scoping page/chunk/tag/version writes and lookups by (source_id, slug) so named-source syncs don’t accidentally write into default.
Changes:
- Thread
sourceIdthroughperformSync→runImport()/importFile()and throughimportFromContent()/importCodeFile(). - Update both Postgres and PGLite engines to scope page CRUD, chunk CRUD, tag reconciliation, and version creation by
source_id. - Add sync/import regression tests to ensure named-source runs write pages (and associated chunks/tags/versions) into the requested source.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| test/sync.test.ts | Adds regression tests asserting named-source sync/import writes into the requested source, not default. |
| src/core/types.ts | Extends core types with PageInput.source_id and updates GetPageOpts documentation. |
| src/core/engine.ts | Extends the engine interface to accept optional sourceId on several write paths (notably createVersion). |
| src/core/import-file.ts | Threads sourceId through import logic for page/chunk/tag/version operations. |
| src/commands/sync.ts | Passes sourceId through sync deletes and import calls so sync is source-scoped. |
| src/commands/import.ts | Adds sourceId support to runImport() and forwards it into importFile(). |
| src/core/postgres-engine.ts | Scopes page/chunk/tag/version operations by source_id in the Postgres engine. |
| src/core/pglite-engine.ts | Scopes page/chunk/tag/version operations by source_id in the PGLite engine. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@copilot please re-review this PR with the latest changes. @codesmith please re-check this PR against the latest commits and refreshed PR description. |
|
Hi @100yenadmin! Codesmith requires write access to this repository. You currently have read-only access to |
Removed PR garrytan#438 write-through patch. Using upstream put_page (DB-only) + dream cycle for link/timeline materialization per Garry's intended architecture. The write-through created double-write conflicts with dream's reverseWriteSlugs(). Fork now carries only PR garrytan#639 (source_id routing for sync/import). Updated PATCHES.md and validate-patches.sh accordingly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixes #540.
Supersedes #541, which was based on the old
garrytan/embedding-providersbranch and now conflicts with currentmaster.Plain-English Summary
GBrain now supports multiple named sources. That only works if every page, chunk, tag, and version is written back to the same source it came from.
Before this fix,
gbrain sync --source docscould correctly remember that it was syncingdocs, but the import/write path could still reconcile content by slug alone. If two sources both hadREADME.md, the named source could accidentally touchdefault. That is data bleed, and it makes multi-source brains unsafe for real teams.This PR makes source identity part of the write contract.
Maintainer Notes
sourceIdfrom full sync and incremental sync intorunImport()/importFile().sourceIdthroughimportFromContent()and code-file import.(source_id, slug)in both PGLite and Postgres engines.sourceIdstill targetsource_id='default'.sourceId !== undefined, matching the existing regression expectations.getVersions()andrevertToVersion()can target the same source-aware identity as writes.Human Walkthrough
If your company has two knowledge folders with the same file names, GBrain should not guess which one you meant. This change makes the folder/source name travel with the document all the way into storage, history, tags, and search chunks.
Review-Fix Pass
Latest review-fix commit:
415ee45.Fixed Copilot review findings around:
sourceIdhandling;GetPageOptsdocs;Validation
git diff --checkbun run verifybun test test/sync.test.ts test/import-file.test.ts test/multi-source-integration.test.tsbun test test/sync.test.ts test/pglite-engine.test.ts->128 pass, 0 failHOME=$(mktemp -d) bun run test->3831 pass, 0 failNotes
The first full local test run without HOME isolation hit this machine's real
~/.gbrain, causing an unrelatedBrainRegistrytest to resolve the host brain instead of failing as the test expects. Re-running with a clean HOME matches CI/test isolation and passes.Need help on this PR? Tag
@codesmithwith what you need.