sync --source X attributes pages to source_id='default' instead of X #978

@kyledeanjackson

Summary

When using gbrain sync --repo <path> --source <id> to ingest content from a named source, the resulting pages table rows are written with source_id = 'default' rather than the source ID passed via --source. The sync command does correctly write the source-scoped tracking state (sources.last_commit, sources.last_sync_at), but the page rows themselves end up unattributed.

Net effect: gbrain sources list reports 0 pages for every named source, even though the content has been ingested successfully and is fully searchable via the default bucket.

Repro

Tested on a fresh multi-source setup with three federated sources (bh-brain, bh-vault, bh-studio-general):

# Register + federate a new source
gbrain sources add my-source --path /path/to/my-source
gbrain sources federate my-source

# Sync it
gbrain sync --repo /path/to/my-source --source my-source
# → imports N pages, last_sync_at updates, last_commit advances

# List sources
gbrain sources list
# → my-source shows "0 pages, last sync <timestamp>"
# → default shows aggregate count of everything

# Confirm content is actually in the DB
psql -c "SELECT source_id, COUNT(*) FROM pages GROUP BY source_id;"
# →  default | <N+existing>
# → no row for my-source

# Search works (so content is real, just mis-attributed)
gbrain search "<some unique phrase from my-source content>"
# → returns the new pages, ranked correctly

Expected vs. actual

| | Expected | Actual |
|---|---|---|
| pages.source_id for content ingested via sync --source X | X | default |
| gbrain sources list page_count for named source | accurate | always 0 |
| Search across federated sources | works | works (unaffected) |
| Source-scoped delete / re-sync | works | doesn't (everything's in default) |

Diagnosis

The migration at src/core/migrate.ts:519 adds pages.source_id TEXT DEFAULT 'default' (correct).

performSync in src/commands/sync.ts correctly reads/writes per-source state via readSyncAnchor / writeSyncAnchor when opts.sourceId is set.

But the actual page-insert path runs through importFile (in src/core/import-file.ts) — and that function doesn't appear to receive or propagate the sourceId from the sync caller. Pages are inserted without an explicit source_id, so the column DEFAULT ('default') kicks in.

The fix is likely:

  1. Thread sourceId from performSync → importFile → page INSERT statements
  2. Add a one-time migration or a gbrain sources reconcile <id> command to retroactively attribute pages that have already landed under default for a given local_path
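
A minimal sketch of step 1, assuming hypothetical shapes: I have not confirmed the real signatures of performSync or importFile, so the function names ending in Sketch, the ImportOptions type, and the body column are stand-ins for illustration only. The idea is that the INSERT always names source_id explicitly, so the column DEFAULT never decides attribution.

```typescript
// Hypothetical option bag; mirrors the opts.sourceId that performSync already reads.
interface ImportOptions {
  sourceId?: string;
}

// A parameterized statement, kept as data so the sketch needs no database.
interface PageInsert {
  sql: string;
  params: string[];
}

// Build the page INSERT with source_id always listed explicitly.
// The "body" column is an assumption; only slug and source_id are named in this issue.
function buildPageInsert(slug: string, body: string, opts: ImportOptions = {}): PageInsert {
  // Fall back to 'default' only when the caller truly passed no source.
  const sourceId = opts.sourceId ?? "default";
  return {
    sql: "INSERT INTO pages (slug, body, source_id) VALUES ($1, $2, $3)",
    params: [slug, body, sourceId],
  };
}

// Stand-in for importFile: accepts the caller's options and forwards them unchanged.
function importFileSketch(slug: string, body: string, opts: ImportOptions = {}): PageInsert {
  return buildPageInsert(slug, body, opts);
}

// Stand-in for performSync: hands the same options to every file it imports.
function performSyncSketch(
  files: Array<{ slug: string; body: string }>,
  opts: ImportOptions = {},
): PageInsert[] {
  return files.map((f) => importFileSketch(f.slug, f.body, opts));
}
```

Calling performSyncSketch(files, { sourceId: "my-source" }) yields inserts whose third parameter is "my-source". Omitting the option keeps today's behavior of landing in default, so single-source setups would be unaffected.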

Environment

Impact

Mild — content lands and is queryable, so the core functionality works. The visible symptom is the misleading 0 pages count on every named source row, which makes it hard to:

  • Verify a new source ingested what you expected
  • Use source-scoped operations (delete-all-for-source, etc.)
  • Filter searches to a specific source's content

Workaround

For now: query the pages table directly with a LIKE filter on slug to confirm a source's content is present, and rely on semantic search returning the right hits regardless of attribution.
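
A rough sketch of the LIKE check above, assuming a source's pages share a recognizable slug prefix (that prefix convention is my assumption, not something gbrain guarantees). The helper names are hypothetical. It escapes LIKE wildcards so a prefix containing _ or % matches literally, and returns a parameterized query that can be passed to whatever PostgreSQL client is in use.

```typescript
// A parameterized statement, kept as data so the sketch needs no database.
interface Query {
  sql: string;
  params: string[];
}

// Escape the LIKE metacharacters (backslash, %, _) so the prefix matches literally.
function escapeLike(s: string): string {
  return s.replace(/[\\%_]/g, (ch) => "\\" + ch);
}

// Count pages whose slug starts with the given prefix, grouped by source_id.
// With this bug present, the matching rows are expected to show up under 'default'.
function countPagesBySlugPrefix(prefix: string): Query {
  return {
    sql:
      "SELECT source_id, COUNT(*) FROM pages " +
      "WHERE slug LIKE $1 ESCAPE '\\' GROUP BY source_id",
    params: [escapeLike(prefix) + "%"],
  };
}
```

For example, countPagesBySlugPrefix("my_source/") produces the pattern my\_source/% so the underscore is not treated as a single-character wildcard.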
