Add a flag to gbrain sync --strategy code that filters the file walker through .gitignore (and any other rules git check-ignore would honor — .git/info/exclude, global excludes). The flag can default off so existing behavior is preserved; users with gitignored build/output directories can opt in.
Today collectSyncableFiles in src/commands/import.ts only excludes hidden dirs, node_modules, and ops. In repos that put generated artifacts under a gitignored path (out/, dist/, build/, __pycache__/, coverage/, data/, etc.), the walker admits every file in there as "code", because CODE_EXTENSIONS in src/core/sync.ts covers .json, .yaml, .toml, .html, .css along with the actual code extensions.
For repos with archived run outputs (ML training, backtests, build artifacts, log captures) this can mean the walker admits tens of thousands of files the operator never intended to index. The DB bloats; embedding cost goes up; search results get polluted by stale fixture data; and if one of those files happens to be pathological (e.g. a single-line giant JSON), it can wedge the chunker.
Mechanism: use git ls-files --cached --others --exclude-standard when the source has a .git/ (the same shape sync.ts:248 already uses for manifest building), and fall back to the existing walker for non-git source paths.
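A minimal sketch of that mechanism, assuming Node's built-in child_process and fs modules. The names listCandidateFiles, walkAllFiles, parseNulSeparated, and the respectGitignore parameter are hypothetical and are not gbrain's actual API; walkAllFiles only stands in for the existing walker and mirrors the exclusions described above.

```typescript
import { execFileSync } from "node:child_process";
import { existsSync, readdirSync } from "node:fs";
import { join, relative } from "node:path";

// Hypothetical stand-in for the existing walker (not gbrain's collectSyncableFiles).
// Skips hidden directories, node_modules, and ops, as the issue describes.
function walkAllFiles(root: string, dir: string = root): string[] {
  const found: string[] = [];
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const full = join(dir, entry.name);
    if (entry.isDirectory()) {
      const skipped =
        entry.name.startsWith(".") ||
        entry.name === "node_modules" ||
        entry.name === "ops";
      if (!skipped) found.push(...walkAllFiles(root, full));
    } else if (entry.isFile()) {
      found.push(relative(root, full));
    }
  }
  return found;
}

// Split the NUL-delimited output of `git ls-files -z` into individual paths.
function parseNulSeparated(output: string): string[] {
  return output.split("\0").filter((p) => p.length > 0);
}

// respectGitignore is a hypothetical opt-in switch; it defaults to off at the
// call site so existing behavior is preserved.
function listCandidateFiles(root: string, respectGitignore: boolean): string[] {
  if (respectGitignore && existsSync(join(root, ".git"))) {
    // Tracked files plus untracked files that no standard exclude rule matches
    // (.gitignore, .git/info/exclude, global excludes).
    const out = execFileSync(
      "git",
      ["ls-files", "--cached", "--others", "--exclude-standard", "-z"],
      { cwd: root, encoding: "utf8", maxBuffer: 64 * 1024 * 1024 },
    );
    return parseNulSeparated(out);
  }
  // Non-git source path, or flag off: fall back to the plain walker.
  return walkAllFiles(root);
}
```

The -z flag keeps paths with spaces or unusual characters intact. Note that git ls-files does not apply the walker's own exclusions (hidden dirs, node_modules, ops), so a real implementation would probably still need to filter its output through them, and through CODE_EXTENSIONS, afterwards.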
Why not just always default-on
Some users intentionally want gitignored content indexed (e.g. dotfile brains, sensitive notes that live under .gitignore to keep them out of git but should still be brain-searchable). Keeping it opt-in respects both modes. A future major release could flip the default once users have had time to migrate.
Related
Issue #449, "Feature: path exclusion for gbrain sync (.gbrainignore or sync.exclude)", asks for .gbrainignore / sync.exclude, a different mechanism for the same underlying need ("let me opt files out of sync"). The two can coexist: .gitignore integration is the easy win for repos that already express the right intent in their existing gitignore, and the .gbrainignore / sync.exclude route covers cases where the two intents diverge.
Environment
gbrain v0.35.0.0 (commit baf1a47)
Filing on behalf of a user whose --strategy code sync stalled on tens of thousands of files it shouldn't have touched.