feat: sync should support subdir-of-git-repo as a source (atlas-style monorepo with N logical sources) #753

@jeremyknows

Summary

For monorepo-style brains where a single git repo contains multiple logical sources (each in its own subdirectory), gbrain sync currently can't operate per-source — sync requires local_path to be a git root. This forces an architectural choice: either restructure the repo into N separate repos (operationally painful) OR fall back to gbrain import which is deliberately default-only (loses per-source isolation).

Concrete repro / use case

I'm running gbrain on an "atlas" knowledge brain: one git repo with 9 agent-specific subdirectories, each holding that agent's diary entries. Each agent should be its own source (per-source isolation for source-aware ranking). There is also a separate repo (external-kb) for VeeFriends KB content.

~/brain/                            # ← one git repo
├── agents/
│   ├── watson/memory/              # ← logical source: watson-diary
│   ├── librarian/memory/           # ← logical source: librarian-diary
│   ├── terminal/memory/            # ← logical source: terminal-diary
│   ├── augur/memory/               # ← logical source: augur-diary
│   └── ... (5 more)
└── shared/wiki/                    # ← logical source: wiki

~/projects/external-kb/           # ← separate git repo (works fine with sync today)

gbrain sources add watson-diary --path ~/brain/agents/watson/memory --no-federated
gbrain sync --source watson-diary --full
# Error: Not a git repository: ~/brain/agents/watson/memory.
#        GBrain sync requires a git-initialized repo.

commands/sync.ts:374 enforces:

if (!existsSync(join(repoPath, '.git'))) {
  throw new Error(`Not a git repository: ${repoPath}. GBrain sync requires a git-initialized repo.`);
}

For this case, the git repo is at ~/brain (the parent), the source is at ~/brain/agents/watson/memory (a subdir).

Proposed fix

Add a --src-subpath <path> flag (or rename --repo to --root, with --source-path semantics) so sync can operate on a subdirectory while reading git history from the repo root. Concretely:

gbrain sync --source watson-diary --root ~/brain --src-subpath agents/watson/memory --full

Or, more elegantly: when a source's local_path is registered, sync walks UP from local_path to find the nearest .git directory (nearest-ancestor git-root discovery), then uses THAT as the git context but only syncs files under local_path. No new flag needed; just smarter resolution.
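A rough sketch of what that walk-up resolution could look like, to make the idea concrete. `findGitRoot` and `resolveSyncScope` are hypothetical names, not existing gbrain functions, and the injectable `exists` parameter is only there so the logic can be exercised without touching the disk. It assumes local_path has already been expanded to a real path (no `~`).

```typescript
import { existsSync } from "node:fs";
import { dirname, join, relative, resolve } from "node:path";

// Predicate used to probe for a `.git` entry; defaults to the real filesystem.
type Exists = (path: string) => boolean;

// Hypothetical helper: walk up from startPath and return the nearest ancestor
// (or startPath itself) that contains a `.git` entry, or null if none is found.
// `.git` may be a directory or a file (worktrees, submodules); both count.
export function findGitRoot(startPath: string, exists: Exists = existsSync): string | null {
  let dir = resolve(startPath);
  for (;;) {
    if (exists(join(dir, ".git"))) return dir;
    const parent = dirname(dir);
    if (parent === dir) return null; // reached the filesystem root
    dir = parent;
  }
}

// Hypothetical helper: split a source's local_path into the git context
// (gitRoot) and the sync scope (subpath relative to gitRoot).
export function resolveSyncScope(localPath: string, exists: Exists = existsSync) {
  const gitRoot = findGitRoot(localPath, exists);
  if (gitRoot === null) {
    throw new Error(`Not inside a git repository: ${localPath}`);
  }
  return { gitRoot, subpath: relative(gitRoot, resolve(localPath)) };
}
```

With something like this, the existing `.git` check in sync could become "no git root found anywhere above local_path" rather than "local_path is not itself a git root". A source that is already its own repo resolves to an empty subpath, so current behavior would be unchanged for that case.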

isSyncable() filtering already runs per-file via walkSyncableFiles(), so the sync-scope-vs-git-scope split is a small lift.
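To illustrate the scope split, here is a minimal sketch of narrowing git's repo-root-relative paths down to one source's subdirectory. I haven't checked how sync actually collects changed files, so this assumes it ends up with a list of root-relative, forward-slash paths (the form `git diff --name-only` prints by default). `scopeToSubpath` is a hypothetical name.

```typescript
import { posix } from "node:path";

// Hypothetical helper: keep only the paths that live under `subpath` and
// return them relative to that subpath. An empty subpath (source is the repo
// root) keeps everything unchanged.
export function scopeToSubpath(repoRelativePaths: string[], subpath: string): string[] {
  // Normalize and drop trailing slashes so "a/b" and "a/b/" behave the same.
  const norm = posix.normalize(subpath).replace(/\/+$/, "");
  const prefix = norm === "." || norm === "" ? "" : norm + "/";
  return repoRelativePaths
    // The trailing "/" in the prefix stops "memory" from matching "memoryX".
    .filter((p) => p.startsWith(prefix))
    .map((p) => p.slice(prefix.length));
}
```

Whether the scoped paths should be stored relative to the source or relative to the repo root is a design choice I'd leave to you; the sketch strips the prefix only to show the filter.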

Workaround today

For my use case I fell back to gbrain import <subdir> for the 10 atlas-internal sources (which writes to default since import is deliberately default-only) + gbrain sync --source external-kb --full for the 1 source that IS its own git repo. This loses per-source isolation for 91% of my content.

After PR #707 lands (which I tested locally + adopted via a rebase — see comment on that PR), only the sync-vs-import surface still has this gap.

Why now

The combination of:

makes this the natural next gap to close. The 1-repo-per-source assumption seems to be the only remaining structural barrier to wider adoption for monorepo-style brains.

What I'm NOT requesting

Happy to take a stab at this if it sounds right to you. Wanted to surface the use case first.
