fix(frontmatter walkers): skip .git, node_modules, .obsidian in collectFiles + scanBrainSources walkDir (closes #799) #922

Closed
jeremyknows wants to merge 2 commits into garrytan:master from jeremyknows:fix/frontmatter-validate-skip-node-modules

Conversation

@jeremyknows jeremyknows commented May 12, 2026

Contributor

Summary

Two walkers used by the frontmatter surface were recursing into every subdirectory without filtering: collectFiles (used by gbrain frontmatter validate) and walkDir (used by scanBrainSources → doctor's frontmatter_integrity check). Two other walkers in the same codebase already skip these directories, which made these two the outliers:

  • walkDir() at src/commands/frontmatter.ts:421 (used by gbrain frontmatter generate): if (entry === '.git' || entry === 'node_modules' || entry === '.obsidian') continue;
  • src/core/disk-walk.ts:56 (canonical brain walker): if (entry.name.startsWith('.') || entry.name === 'node_modules') continue;

Result: gbrain frontmatter validate AND gbrain doctor on any project with installed dependencies flood the report with MISSING_OPEN false-positives on third-party HISTORY.md, CHANGELOG.md, and readme.md files inside node_modules/.

Closes #799 (filed by @metaWuming, 2026-05-10) — reporter saw 869 doctor warnings on three federated code sources, all entirely from node_modules/.

Diff scope

src/commands/frontmatter.ts | 5 +++++
src/core/brain-writer.ts    | 8 ++++++++
2 files changed, 13 insertions(+)

Two commits, both applying the same skip-list pattern at the existing per-entry filter point. No API change. Both follow the existing patterns at frontmatter.ts:421 and disk-walk.ts:56 for consistency.
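For reviewers who want to see the shape of the change without opening the diff, here is a minimal sketch of a recursive walker with a skip list applied at the per-entry filter point. It is not the code from either commit: `collectMarkdown`, `SKIP_DIRS`, and `demoTree` are hypothetical names, and only the three directory names and the symlink skip come from this PR.

```typescript
import { lstatSync, mkdirSync, mkdtempSync, readdirSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical skip list mirroring the three names quoted from frontmatter.ts:421.
const SKIP_DIRS = new Set([".git", "node_modules", ".obsidian"]);

// Recursively collect .md files, filtering each entry before descending into it.
function collectMarkdown(dir: string, out: string[] = []): string[] {
  for (const entry of readdirSync(dir).sort()) {
    if (SKIP_DIRS.has(entry)) continue; // per-entry filter point
    const full = join(dir, entry);
    const stat = lstatSync(full);
    if (stat.isSymbolicLink()) continue; // stand-in for the existing symlink skip
    if (stat.isDirectory()) collectMarkdown(full, out);
    else if (entry.endsWith(".md")) out.push(full);
  }
  return out;
}

// Build a throwaway tree with one real note and one dependency changelog.
function demoTree(): string {
  const root = mkdtempSync(join(tmpdir(), "walker-sketch-"));
  mkdirSync(join(root, "notes"));
  mkdirSync(join(root, "node_modules", "router"), { recursive: true });
  writeFileSync(join(root, "notes", "a.md"), "---\ntitle: a\n---\n");
  writeFileSync(join(root, "node_modules", "router", "HISTORY.md"), "1.0.0\n");
  return root;
}
```

Calling `collectMarkdown(demoTree())` should return only `notes/a.md`. Filtering by name before the `lstatSync` call also means the walker never stats anything inside an excluded directory.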

Real-behavior proof

Environment: macOS 14, bun 1.3.8, gbrain on master (post-rebase) → branched as fix/frontmatter-validate-skip-node-modules. Test source: ~/projects/veefriends-kb (Next.js project, real node_modules/ with 71 third-party MD files).

Before (origin/master)

gbrain frontmatter validate (collectFiles walker):

$ gbrain frontmatter validate /Users/watson/projects/veefriends-kb
Found 71 issue(s) across 71 file(s) (scanned 126)

/Users/watson/projects/veefriends-kb/node_modules/router/HISTORY.md
  [MISSING_OPEN]:1 Frontmatter must start with --- on the first non-empty line
/Users/watson/projects/veefriends-kb/node_modules/es-object-atoms/CHANGELOG.md
  [MISSING_OPEN]:1 Frontmatter must start with --- on the first non-empty line
... (all 71 are node_modules artifacts)
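To illustrate why every dependency changelog trips this rule, here is a minimal sketch of a check matching the message above ("Frontmatter must start with --- on the first non-empty line"). The function name and its trimming behavior are assumptions; the real validator in gbrain may differ in its details.

```typescript
// Hypothetical check: true when the first non-empty line is exactly "---".
function hasOpeningDelimiter(text: string): boolean {
  const first = text.split(/\r?\n/).find((line) => line.trim() !== "");
  return first !== undefined && first.trim() === "---";
}

// A brain page with frontmatter versus a typical third-party changelog.
const note = "---\ntitle: Example\n---\nBody\n";
const changelog = "# Changelog\n\n## 1.0.0\n- initial release\n";

console.log(hasOpeningDelimiter(note));      // true
console.log(hasOpeningDelimiter(changelog)); // false
```

Third-party HISTORY.md and CHANGELOG.md files are plain markdown, so under a rule like this each one is reported once, which is consistent with 71 issues across 71 files.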

gbrain doctor (scanBrainSources walkDir):

2170 frontmatter issue(s) across 11 source(s)

(Per #799 reporter on a different project: 869 doctor warnings, all node_modules MISSING_OPEN.)

After (this PR)

gbrain frontmatter validate:

$ gbrain frontmatter validate /Users/watson/projects/veefriends-kb
OK — 55 file(s) scanned, no frontmatter issues

55 files scanned instead of 126 — node_modules/ correctly skipped.

gbrain doctor:

2099 frontmatter issue(s) across 10 source(s)

A drop of 71 (2170 − 2099), matching the 71 vfkb node_modules false-positives that are now excluded. The remaining 2099 are real MISSING_OPEN cases in registered diary paths, not node_modules.

Regression check — non-node_modules source still reports real issues

$ gbrain frontmatter validate ~/atlas/agents/terminal/memory
Found 7 issue(s) across 7 file(s) (scanned 135)

Same count before and after on a source without node_modules/.

Verification

$ bun run typecheck
$ tsc --noEmit
(exit 0)

$ bun run verify
OK: no JSON.stringify(x)::jsonb interpolation pattern in src/
OK: max_stalled defaults are 5 in all schema sources
OK: all rowToPage feeder projections include source_id
check-progress-to-stdout: OK (no banned stdout \r patterns)
check-test-isolation: OK (367 non-serial unit files scanned)
[check-wasm-embedded] OK — compiled binary produced real semantic chunks.
[check-admin-scope-drift] ok: 5 scopes match
OK: src/cli.ts is git-tracked as executable (100755)
OK: no direct derived-table writes outside the reconcile layer in src/ + scripts/
(all 11 checks pass)

What was NOT tested

  • .obsidian exclusion wasn't exercised against a real Obsidian vault in this session. Followed the existing pattern from walkDir():421 which includes .obsidian — behavior is identical to that walker.
  • No unit test added — the existing test/frontmatter-*.test.ts files don't have a fixture for "walker recurses into excluded directories." Adding one would require a fixture project with a node_modules/ shim. Happy to add one if you'd prefer.
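A rough sketch of what such a fixture could look like, in case it helps scope the request. Everything here is hypothetical: `makeFixture`, `leakedPaths`, and the stand-in `walk` are illustrative names, and a real test would pass the project's own walker instead of the stand-in.

```typescript
import { mkdirSync, mkdtempSync, readdirSync, rmSync, statSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join, relative, sep } from "node:path";

type Walker = (root: string) => string[];

const EXCLUDED = [".git", "node_modules", ".obsidian"];

// Create a temp project: one real note plus a markdown file in each excluded dir.
function makeFixture(): string {
  const root = mkdtempSync(join(tmpdir(), "frontmatter-fixture-"));
  mkdirSync(join(root, "docs"));
  writeFileSync(join(root, "docs", "real.md"), "---\ntitle: real\n---\n");
  for (const dir of EXCLUDED) {
    mkdirSync(join(root, dir, "pkg"), { recursive: true });
    writeFileSync(join(root, dir, "pkg", "CHANGELOG.md"), "no frontmatter\n");
  }
  return root;
}

// Return the relative paths a walker visited that sit under an excluded directory.
function leakedPaths(walker: Walker): string[] {
  const root = makeFixture();
  try {
    return walker(root)
      .map((file) => relative(root, file))
      .filter((rel) => EXCLUDED.includes(rel.split(sep)[0]));
  } finally {
    rmSync(root, { recursive: true, force: true });
  }
}

// Stand-in walker: an empty skip list reproduces the bug, EXCLUDED models the fix.
function walk(dir: string, skip: string[]): string[] {
  return readdirSync(dir).flatMap((name): string[] => {
    if (skip.includes(name)) return [];
    const full = join(dir, name);
    return statSync(full).isDirectory() ? walk(full, skip) : [full];
  });
}
```

With the stand-in, `leakedPaths((root) => walk(root, []))` returns three paths and `leakedPaths((root) => walk(root, EXCLUDED))` returns none. Building the tree at test time avoids committing a `node_modules/` shim to the repository.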

Test plan

  • CI green
  • Reviewer runs gbrain frontmatter validate <any-node-project> pre and post fix; node_modules entries should disappear post-fix
  • Reviewer runs gbrain doctor on the same project; frontmatter_integrity count should drop by the number of node_modules .md files
  • gbrain frontmatter validate <non-node-project> reports the same count before and after (regression check)

Commits

  1. 75151d9 fix(frontmatter validate): skip .git, node_modules, .obsidian in collectFiles
  2. 5cd7c12 fix(doctor): scanBrainSources walkDir also skips .git / node_modules / .obsidian

Closes #799.

…ectFiles

`collectFiles` in src/commands/frontmatter.ts (the walker used by
`gbrain frontmatter validate`) recurses into every subdirectory without
filtering. Two other walkers in the same codebase already skip these
directories:

- `walkDir` at frontmatter.ts:421 (the `generate` subcommand): skips
  `.git`, `node_modules`, `.obsidian`
- `disk-walk.ts:56` (canonical brain walker): skips dot-directories
  and `node_modules`

`collectFiles` was the outlier. Result: `gbrain frontmatter validate
<path>` on any project with installed dependencies floods the report
with dozens of MISSING_OPEN false-positives on third-party
HISTORY.md / CHANGELOG.md / readme.md files in `node_modules/`.

Observed on a 59-file vfkb source: 126 reported issues, 71 of which
were in `node_modules/` after fixing the 55 real content files. With
this fix, the same source reports 0 issues — matching the `generate`
subcommand's pre-existing behavior.

The fix is one block of three skip-cases, applied at the same place
in the loop as the existing symlink-skip. No public API change.
…/ .obsidian

Companion to the first commit's `collectFiles` fix. Doctor's
`frontmatter_integrity` check (src/commands/doctor.ts:1561) calls
`scanBrainSources` (src/core/brain-writer.ts:249) → `scanOneSource` →
`walkDir(src/core/brain-writer.ts:350)`. The walker had the same gap as
`collectFiles`: no skip-list, recurses into `node_modules/`, flags every
third-party dependency's HISTORY.md / CHANGELOG.md / readme.md as
MISSING_OPEN.

This is the walker filed in garrytan#799 — issue body shows 869 doctor warnings
on three federated code sources, all entirely from `node_modules/` files
that `sync` deliberately skipped.

Fix matches the existing skip-list pattern from the other walkers
(`src/commands/frontmatter.ts:421` and `src/core/disk-walk.ts:56`).

Together with the first commit (`collectFiles`), both walkers used by
the frontmatter surface now agree on what counts as source content.
@jeremyknows jeremyknows changed the title from "fix(frontmatter validate): skip .git, node_modules, .obsidian in collectFiles" to "fix(frontmatter walkers): skip .git, node_modules, .obsidian in collectFiles + scanBrainSources walkDir (closes #799)" May 12, 2026
@garrytan

Owner

Thanks @jeremyknows — already shipped in master as of v0.35.5.0. The new pruneDir(name) helper in src/core/sync.ts is the single source of truth for descent-time directory exclusion across walkers: blocks node_modules (no leading dot, the case your PR catches), dot-prefix dirs, ops/, and *.raw sidecars. Both walkMarkdownFiles in src/commands/extract.ts and listTextFiles in src/core/cycle/transcript-discovery.ts consult it BEFORE recursing. Closes #799 + #923 + #202.

If your install is still seeing MISSING_OPEN warnings from node_modules after upgrade, please reopen with gbrain doctor --json.
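For readers comparing the two approaches: the body of pruneDir is not shown in this thread, so the following is only a sketch of a helper with the exclusions listed above. The name `pruneDirSketch` and the exact matching rules (exact name for `ops`, suffix match for `.raw`) are assumptions, and the real helper in src/core/sync.ts may differ.

```typescript
// Hypothetical reconstruction; the real pruneDir in src/core/sync.ts may differ.
// Returns true when a walker should NOT descend into a directory with this name.
function pruneDirSketch(name: string): boolean {
  if (name === "node_modules") return true; // no leading dot, so it needs its own case
  if (name.startsWith(".")) return true;    // .git, .obsidian, and other dot-dirs
  if (name === "ops") return true;          // ops/ directory
  if (name.endsWith(".raw")) return true;   // *.raw sidecars
  return false;
}

console.log(pruneDirSketch("node_modules")); // true
console.log(pruneDirSketch(".obsidian"));    // true
console.log(pruneDirSketch("notes"));        // false
```

Compared with the three-name list in this PR, a dot-prefix rule covers `.git` and `.obsidian` without naming them, which is why `node_modules` is the case that has to be listed explicitly.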

@garrytan garrytan closed this May 18, 2026
Development

Successfully merging this pull request may close these issues.

doctor frontmatter walker doesn't exclude node_modules — false 800+ MISSING_OPEN warnings