Summary
When using gbrain sync --repo <path> --source <id> to ingest content from a named source, the resulting pages table rows are written with source_id = 'default' rather than the source ID passed via --source. The sync command does correctly write the source-scoped tracking state (sources.last_commit, sources.last_sync_at), but the page rows themselves end up unattributed.
Net effect: gbrain sources list reports 0 pages for every named source, even though the content has been ingested successfully and is fully searchable via the default bucket.
Repro
Tested on a fresh multi-source setup with three federated sources (bh-brain, bh-vault, bh-studio-general):
# Register + federate a new source
gbrain sources add my-source --path /path/to/my-source
gbrain sources federate my-source
# Sync it
gbrain sync --repo /path/to/my-source --source my-source
# → imports N pages, last_sync_at updates, last_commit advances
# List sources
gbrain sources list
# → my-source shows "0 pages, last sync <timestamp>"
# → default shows aggregate count of everything
# Confirm content is actually in the DB
psql -c "SELECT source_id, COUNT(*) FROM pages GROUP BY source_id;"
# → default | <N+existing>
# → no row for my-source
# Search works (so content is real, just mis-attributed)
gbrain search "<some unique phrase from my-source content>"
# → returns the new pages, ranked correctly
Expected vs. actual
| | Expected | Actual |
|---|---|---|
| pages.source_id for content ingested via sync --source X | X | default |
| gbrain sources list page_count for named source | accurate | always 0 |
| Search across federated sources | works | works (unaffected) |
| Source-scoped delete / re-sync | works | doesn't (everything's in default) |
Diagnosis
The migration at src/core/migrate.ts:519 adds pages.source_id TEXT DEFAULT 'default' (correct).
performSync in src/commands/sync.ts correctly reads/writes per-source state via readSyncAnchor / writeSyncAnchor when opts.sourceId is set.
But the actual page-insert path runs through importFile (in src/core/import-file.ts) — and that function doesn't appear to receive or propagate the sourceId from the sync caller. Pages are inserted without an explicit source_id, so the column DEFAULT ('default') kicks in.
The fix is likely:
- Thread sourceId from performSync → importFile → page INSERT statements
- Add a one-time migration or a gbrain sources reconcile <id> command to retro-attribute pages that have already landed under default for a given local_path
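A minimal sketch of the first bullet, assuming the page insert is a parameterized SQL statement. `ImportOptions`, `PageRow`, `buildPageInsert`, and the `slug`/`title` columns are hypothetical names for illustration, not taken from the gbrain codebase; only `pages.source_id` and the `'default'` fallback come from the diagnosis above.

```typescript
// Hypothetical shapes: gbrain's real importFile signature may differ.
interface ImportOptions {
  sourceId?: string; // would be set by performSync when --source <id> is passed
}

interface PageRow {
  slug: string;
  title: string;
}

interface SqlStatement {
  text: string;
  values: string[];
}

// Build the page INSERT with an explicit source_id, so 'default' is only
// used when the caller did not name a source.
function buildPageInsert(page: PageRow, opts: ImportOptions = {}): SqlStatement {
  const sourceId = opts.sourceId ?? "default";
  return {
    text: "INSERT INTO pages (slug, title, source_id) VALUES ($1, $2, $3)",
    values: [page.slug, page.title, sourceId],
  };
}

// Usage: the sync caller passes its sourceId down instead of dropping it.
const stmt = buildPageInsert(
  { slug: "notes/a", title: "A" },
  { sourceId: "my-source" },
);
console.log(stmt.text, stmt.values);
```

Keeping the fallback in one place means callers that never pass a source still land in default, which matches the current column DEFAULT.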
Environment
- kyledeanjackson/gbrain (fork of garrytan/gbrain), fix branch from PR #775 (fix: embed --stale pulls all chunks every cycle (3 TB egress regression))
- v0.18.0 Step 5 source-scoped sync state ✓, v0.18.0 Step 7 files.source_id ✓
Impact
Mild: content lands and is queryable, so the core functionality works. The visible symptom is the misleading 0 pages count on every named source row, which makes it hard to:
- Verify a new source ingested what you expected
- Use source-scoped operations (delete-all-for-source, etc.)
- Filter searches to a specific source's content
Workaround
For now: query the pages table directly with a LIKE filter on slug to confirm a source's content is present, and rely on semantic search returning the right hits regardless of attribution.
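The workaround query could be built along these lines. This is a sketch under one assumption that the issue does not confirm: that all pages from a given source share a recognizable slug prefix. `escapeLike` and `countBySlugPrefix` are hypothetical helpers, and the backslash escape relies on PostgreSQL's default LIKE escape character.

```typescript
interface SqlStatement {
  text: string;
  values: string[];
}

// Escape LIKE wildcards so a literal prefix such as "my_source/" is not
// interpreted as a pattern. Backslash is PostgreSQL's default LIKE escape.
function escapeLike(literal: string): string {
  return literal.replace(/[\\%_]/g, (ch) => "\\" + ch);
}

// Count pages sitting in the default bucket whose slug starts with `prefix`.
// Assumption: a source's pages can be identified by a shared slug prefix.
function countBySlugPrefix(prefix: string): SqlStatement {
  return {
    text: "SELECT COUNT(*) FROM pages WHERE source_id = 'default' AND slug LIKE $1",
    values: [escapeLike(prefix) + "%"],
  };
}

const query = countBySlugPrefix("my_source/");
console.log(query.text, query.values);
```

Passing the pattern as a bound parameter rather than interpolating it keeps the check safe to run against prefixes that contain quotes or wildcard characters.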