sync --strategy code enhancement: honor .gitignore (opt-in flag) #1073

@BorniteServices

Request

Add a flag to gbrain sync --strategy code that filters the file walker through .gitignore (and any other rules git check-ignore would honor — .git/info/exclude, global excludes). The flag can default off so existing behavior is preserved; users with gitignored build/output directories can opt in.

Today collectSyncableFiles in src/commands/import.ts excludes only hidden directories, node_modules, and ops. Repos that put generated artifacts under a gitignored path (out/, dist/, build/, __pycache__/, coverage/, data/, etc.) end up with every file in there admitted as "code", because CODE_EXTENSIONS in src/core/sync.ts covers .json, .yaml, .toml, .html, and .css along with the actual code extensions.

For repos with archived run outputs (ML training, backtests, build artifacts, log captures) this can mean the walker admits tens of thousands of files the operator never intended to index. The DB bloats; embedding cost goes up; search results get polluted by stale fixture data; and if one of those files happens to be pathological (e.g. a single-line giant JSON), it can wedge the chunker.

Suggested shape

gbrain sync --strategy code --respect-gitignore   # opt in
gbrain sync --strategy code --ignore-from FILE    # arbitrary ignore file

Or as a config knob:

sync:
  respect_gitignore: true

Mechanism: use git ls-files --cached --others --exclude-standard when the source has a .git/ (the same shape sync.ts:248 already uses for manifest building), and fall back to the existing walker for non-git source paths.
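
A minimal sketch of that mechanism, to make the proposal concrete. It is an illustration under assumptions, not the actual gbrain code: listSyncCandidates, parseNulSeparated, the Walker type, and the respectGitignore parameter are hypothetical names, and the existing walker is passed in as a plain function because its real signature is not shown in this issue. The -z flag is an addition here so that paths with unusual characters survive parsing.

```typescript
import { execFileSync } from "node:child_process";
import { existsSync } from "node:fs";
import { join } from "node:path";

// Hypothetical stand-in for the existing walker (collectSyncableFiles in the
// issue text); its real signature may differ.
type Walker = (root: string) => string[];

// Split NUL-delimited `git ls-files -z` output into relative paths.
export function parseNulSeparated(output: string): string[] {
  return output.split("\0").filter((p) => p.length > 0);
}

// Return candidate files for sync. When the flag is on and `root` contains a
// .git entry, ask git for tracked + untracked-but-not-ignored files;
// otherwise fall back to the existing walker.
export function listSyncCandidates(
  root: string,
  respectGitignore: boolean,
  fallbackWalker: Walker,
): string[] {
  // .git may be a directory or a file (worktrees, submodules); existsSync covers both.
  const isGitRepo = existsSync(join(root, ".git"));
  if (!respectGitignore || !isGitRepo) {
    return fallbackWalker(root);
  }
  const out = execFileSync(
    "git",
    ["ls-files", "--cached", "--others", "--exclude-standard", "-z"],
    { cwd: root, encoding: "utf8", maxBuffer: 256 * 1024 * 1024 },
  );
  // git prints paths relative to cwd; make them comparable to walker output.
  return parseNulSeparated(out).map((rel) => join(root, rel));
}
```

Two caveats with this shape: --cached also lists tracked files that have been deleted from the working tree, so a caller would likely still need an existence check, and the extension filter (CODE_EXTENSIONS) would still have to be applied to the returned paths.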

Why not just always default-on

Some users intentionally want gitignored content indexed (e.g. dotfile brains, sensitive notes that live under .gitignore to keep them out of git but should still be brain-searchable). Keeping it opt-in respects both modes. A future major release could flip the default once users have had time to migrate.

Related

  • Feature: path exclusion for gbrain sync (.gbrainignore or sync.exclude) #449 asks for a different mechanism that serves the same underlying need ("let me opt files out of sync"). The two can coexist: .gitignore integration is the easy win for repos that already express the right intent in their existing gitignore, and the .gbrainignore / sync.exclude route covers cases where the two intents diverge.

Environment

  • gbrain v0.35.0.0 (commit baf1a47)
  • Filing on behalf of a user whose --strategy code sync stalled on tens of thousands of files it shouldn't have touched.
