Skip to content

fix: preserve source id during sync imports#639

Closed
100yenadmin wants to merge 2 commits intogarrytan:masterfrom
electricsheephq:codex/upstream-source-aware-sync
Closed

fix: preserve source id during sync imports#639
100yenadmin wants to merge 2 commits intogarrytan:masterfrom
electricsheephq:codex/upstream-source-aware-sync

Conversation

@100yenadmin
Copy link
Copy Markdown

@100yenadmin 100yenadmin commented May 5, 2026

Fixes #540.
Supersedes #541, which was based on the old garrytan/embedding-providers branch and now conflicts with current master.

Plain-English Summary

GBrain now supports multiple named sources. That only works if every page, chunk, tag, and version is written back to the same source it came from.

Before this fix, gbrain sync --source docs could correctly remember that it was syncing docs, but the import/write path could still reconcile content by slug alone. If two sources both had README.md, the named source could accidentally touch default. That is data bleed, and it makes multi-source brains unsafe for real teams.

This PR makes source identity part of the write contract.

flowchart LR
  A["source: docs"] --> B["sync/import"]
  C["source: support-kb"] --> B
  B --> D["page key: source_id + slug"]
  D --> E["pages"]
  D --> F["chunks"]
  D --> G["tags"]
  D --> H["versions"]
  E --> I["search stays scoped to the right source"]
Loading

Maintainer Notes

  • Threads optional sourceId from full sync and incremental sync into runImport() / importFile().
  • Threads optional sourceId through importFromContent() and code-file import.
  • Scopes page writes, deletes, chunk writes, tag reconciliation, and page-version creation by (source_id, slug) in both PGLite and Postgres engines.
  • Preserves back-compat: callers that omit sourceId still target source_id='default'.
  • Keeps empty-string source ids explicit by checking sourceId !== undefined, matching the existing regression expectations.
  • Extends version APIs so getVersions() and revertToVersion() can target the same source-aware identity as writes.

Human Walkthrough

If your company has two knowledge folders with the same file names, GBrain should not guess which one you meant. This change makes the folder/source name travel with the document all the way into storage, history, tags, and search chunks.

Review-Fix Pass

Latest review-fix commit: 415ee45.

Fixed Copilot review findings around:

  • empty-string sourceId handling;
  • source-scoped version history and revert;
  • over-broad GetPageOpts docs;
  • source-scoped import sync anchors.

Validation

  • git diff --check
  • bun run verify
  • bun test test/sync.test.ts test/import-file.test.ts test/multi-source-integration.test.ts
  • focused review-fix rerun: bun test test/sync.test.ts test/pglite-engine.test.ts -> 128 pass, 0 fail
  • prior full isolated run: HOME=$(mktemp -d) bun run test -> 3831 pass, 0 fail

Notes

The first full local test run without HOME isolation hit this machine's real ~/.gbrain, causing an unrelated BrainRegistry test to resolve the host brain instead of failing as the test expects. Re-running with a clean HOME matches CI/test isolation and passes.


View in Codesmith
Need help on this PR? Tag @codesmith with what you need.

  • Let Codesmith autofix CI failures and bot reviews

Thread optional sourceId through sync/import/import-file and scope page, chunk, tag, and version writes by source identity instead of bare slug.

Default callers still target source_id='default'. Adds regression coverage for named-source full sync and import.

Validation: bun run verify; HOME=/var/folders/l7/tyvn8qfs50v57w__75s3k0f00000gp/T/tmp.UhjtVbjvxm bun run test (3831 pass, 0 fail); focused sync/import/multi-source tests.
Copilot AI review requested due to automatic review settings May 5, 2026 10:08
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes multi-source sync/import behavior by threading an optional sourceId through the import/write pipeline and scoping page/chunk/tag/version writes and lookups by (source_id, slug) so named-source syncs don’t accidentally write into default.

Changes:

  • Thread sourceId through performSyncrunImport() / importFile() and through importFromContent() / importCodeFile().
  • Update both Postgres and PGLite engines to scope page CRUD, chunk CRUD, tag reconciliation, and version creation by source_id.
  • Add sync/import regression tests to ensure named-source runs write pages (and associated chunks/tags/versions) into the requested source.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
test/sync.test.ts Adds regression tests asserting named-source sync/import writes into the requested source, not default.
src/core/types.ts Extends core types with PageInput.source_id and updates GetPageOpts documentation.
src/core/engine.ts Extends the engine interface to accept optional sourceId on several write paths (notably createVersion).
src/core/import-file.ts Threads sourceId through import logic for page/chunk/tag/version operations.
src/commands/sync.ts Passes sourceId through sync deletes and import calls so sync is source-scoped.
src/commands/import.ts Adds sourceId support to runImport() and forwards it into importFile().
src/core/postgres-engine.ts Scopes page/chunk/tag/version operations by source_id in the Postgres engine.
src/core/pglite-engine.ts Scopes page/chunk/tag/version operations by source_id in the PGLite engine.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/commands/sync.ts Outdated
Comment thread src/commands/sync.ts Outdated
Comment thread src/core/import-file.ts Outdated
Comment thread src/core/import-file.ts Outdated
Comment thread src/core/engine.ts Outdated
Comment thread src/core/types.ts Outdated
@100yenadmin
Copy link
Copy Markdown
Author

@copilot please re-review this PR with the latest changes. @codesmith please re-check this PR against the latest commits and refreshed PR description.

@blacksmith-sh
Copy link
Copy Markdown

blacksmith-sh Bot commented May 5, 2026

Hi @100yenadmin! Codesmith requires write access to this repository. You currently have read-only access to garrytan/gbrain.

milehighfry405 added a commit to milehighfry405/gbrain that referenced this pull request May 7, 2026
Removed PR garrytan#438 write-through patch. Using upstream put_page (DB-only)
+ dream cycle for link/timeline materialization per Garry's intended
architecture. The write-through created double-write conflicts with
dream's reverseWriteSlugs().

Fork now carries only PR garrytan#639 (source_id routing for sync/import).
Updated PATCHES.md and validate-patches.sh accordingly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@garrytan
Copy link
Copy Markdown
Owner

garrytan commented May 9, 2026

Thanks for the contribution! This is being closed because it was superseded by #757 in #776 (full superset including the performFullSync gap). Real fix shipped; your name will appear in the v0.30.3 release notes attribution where applicable. Thank you again.

@garrytan garrytan closed this May 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: named-source sync/import writes pages to default source

3 participants