Skip to content

v0.31.2 fix: gbrain sync --strategy code no longer hangs on big symlink-rich repos#773

Merged
garrytan merged 5 commits into
masterfrom
garrytan/puebla-v2
May 10, 2026
Merged

v0.31.2 fix: gbrain sync --strategy code no longer hangs on big symlink-rich repos#773
garrytan merged 5 commits into
masterfrom
garrytan/puebla-v2

Conversation

@garrytan

@garrytan garrytan commented May 9, 2026

Copy link
Copy Markdown
Owner

Summary

Closes the bug class where gbrain sync --strategy code against a 1500-file gstack repo could pin one thread at 99% CPU for hours with zero disk writes and a page_count that stayed at 0. Three real defects, fixed together in one bisectable commit, plus version bump and CHANGELOG.

Bug fix (commit 1):

  • Tree-sitter chunker had no wall-clock cap. A single pathological file could wedge the whole sync inside WASM. New parseWithTimeout helper in src/core/chunkers/code.ts wraps parser.parse() with setTimeoutMicros(timeoutMs * 1000), throws ChunkerTimeoutError on null, and the caller's try/finally reaps parser+tree (closes the leak Codex flagged where the catch block returned without delete()). Default 30s, override via GBRAIN_CHUNKER_TIMEOUT_MS. Falls back to fallbackChunks (recursive text chunker) on timeout — degrades search quality on that one file, doesn't wedge sync.
  • Code-strategy first-sync silently no-op'd on code files. performFullSync called runImport(repoPath) with no strategy; runImport only ever walked .md/.mdx. Now opts.strategy threads end-to-end (full-sync write path AND dry-run). Code files actually reach the dispatcher, which already routes them to importCodeFile.
  • Walker was thrice-redundant. collectMarkdownFiles (lstat-safe) and walkSyncableFiles (statSync, weaker for no good reason) collapsed into one hardened collectSyncableFiles in src/commands/import.ts: lstat + symlink-skip with canonical log line; inode-cycle Map keyed on ${st_dev}:${st_ino} (defense-in-depth for non-symlink loops); MAX_WALK_DEPTH=32 structural backstop with GBRAIN_MAX_WALK_DEPTH override; .sort() output (Codex C8: runImport's checkpoint resume is index-based against a sorted list). Walker-context multimodal carve-out preserved at one site (Codex C5).
  • Per-phase diagnostic logging. Structured [gbrain phase] <name> start/done <ms> stderr lines on sync.git_pull, sync.fullsync.import, import.collect_files, and per-file slow path (>5s). The next hang reproduction names the phase that wedged.

Version + docs (commit 2): VERSION 0.30.1 → 0.31.2, package.json synced, CHANGELOG entry under [0.31.2] (preserved master's [0.30.2] entry below), TODOS.md filed gbrain query <common-keyword> 7-day-zombie investigation (PIDs 39429, 46624) and the deferred amarillo-shape PGLite + Postgres E2E as v0.31.3 follow-ups.

Test Coverage

CODE PATHS                                                    USER FLOWS
[+] src/core/chunkers/code.ts
  ├── parseWithTimeout()                                      [+] gbrain sync --strategy code on amarillo-shape repo
  │   ├── [★★★ TESTED] timeout fires → fallback (stub seam)    └── [DEFERRED to v0.31.3]
  │   ├── [★★ TESTED]  default-timeout on real code                       (PGLite + Postgres E2E for the user's
  │   ├── [★★ TESTED]  env override works                                  exact 1500-file self-symlink topology)
  │   ├── [★★★ TESTED] cleanup-on-throw smoke
  │   ├── [★★★ TESTED] setTimeoutMicros API absent → loud
  │   └── [★★★ TESTED] ChunkerTimeoutError shape (filePath/ms)
[+] src/commands/import.ts
  ├── collectSyncableFiles()
  │   ├── [★★★ TESTED] symlink-skip + canonical log
  │   ├── [★★★ TESTED] inode-cycle detection
  │   ├── [★★★ TESTED] max-depth bailout (40-level tree)
  │   ├── [★★★ TESTED] strategy filter (code/markdown/auto)
  │   ├── [★★★ TESTED] dot-dir + node_modules skip
  │   ├── [★★★ TESTED] multimodal preservation under markdown
  │   └── [★★★ TESTED] deterministic ordering (codex C8)
  └── runImport()
      └── opts.strategy threaded (verified via existing import-resume + extract tests)

COVERAGE: 14/14 new code paths tested (100%)  |  GAPS: 0
QUALITY: ★★★:11 ★★:3 ★:0

bun test (unit suite, 109 targeted cases across changed surface): 109 pass / 0 fail. Typecheck clean.

Tests: existing test count + 14 new (test/sync-walker-symlink.test.ts × 7, test/chunker-timeout.test.ts × 7).

Pre-Landing Review

Codex outside-voice review caught 5 bug-grade findings during plan stage (parser cleanup leak, multimodal contradiction, missed dry-run plumbing, deterministic-order requirement, machine-speed-flaky tests). All five folded into the shipped fix without re-litigating the bundle decision. See ~/.claude/plans/system-instruction-you-are-working-jolly-stallman.md for the full decision history (D1-D5 + Codex C1-C11).

No new findings on the diff itself.

Plan Completion

Item Status
Probe web-tree-sitter for setTimeoutMicros (D2=A) DONE — 0.22.6 confirmed has API
Chunker timeout (prong 1) DONE — parseWithTimeout + try/finally cleanup
Walker hardening (prong 3) DONE — lstat + inode + max-depth + sort
Strategy plumbing (prong 2) DONE — performFullSync + dry-run paths
Per-phase logging (D5=A) DONE — minimal inline log lines
Unit tests DONE — 14 cases passing
PGLite + Postgres amarillo-shape E2E (D4=C) DEFERRED to v0.31.3 (filed in TODOS.md)
VERSION + CHANGELOG + TODOS DONE

7/8 plan items DONE. Deferred E2E item is filed as P2 follow-up; unit tests already prove walker hardening and chunker timeout behaviors.

TODOS

Filed two v0.31.2 follow-ups:

  • P1: Investigate gbrain query <common-keyword> infinite loop (separate bug class, 7-day zombies at 6+GB RES)
  • P2: PGLite + Postgres E2E for amarillo-shape topology (deferred from this PR)

Test plan

  • bun run typecheck clean
  • bun test test/sync-walker-symlink.test.ts test/chunker-timeout.test.ts — 14 pass
  • Targeted suite (sync, extract, import-file, import-resume, sync-strategy) — 109 pass / 0 fail
  • Manual repro on ~/conductor/workspaces/gstack/amarillo-v2 against the user's actual source — user to verify post-merge (gbrain upgrade + gbrain sync --strategy code --source <id> should complete in ~25-35 min and produce non-zero page_count)

🤖 Generated with Claude Code


View in Codesmith
Need help on this PR? Tag @codesmith with what you need.

  • Let Codesmith autofix CI failures and bot reviews

garrytan and others added 5 commits May 8, 2026 18:41
`gbrain sync --strategy code` against a 1500-file repo could pin one
thread at 99% CPU for hours with zero disk writes and a `page_count`
that stayed at 0. Three real defects, all closed in one commit:

1. **Tree-sitter chunker had no wall-clock cap.** A single pathological
   file could wedge the whole sync inside WASM. New `parseWithTimeout`
   helper in src/core/chunkers/code.ts wraps `parser.parse()` with
   `setTimeoutMicros(timeoutMs * 1000)`, throws `ChunkerTimeoutError`
   on null, and the caller's try/finally reaps parser+tree (closes the
   leak codex flagged where the catch block returned without delete()).
   Default 30s, override via `GBRAIN_CHUNKER_TIMEOUT_MS`. Falls back to
   recursive chunks on timeout — degrades search quality on that one
   file, doesn't wedge sync.

2. **Code-strategy first-sync silently no-op'd on code files.**
   `performFullSync` called `runImport(repoPath)` with no strategy;
   `runImport` only ever walked `.md`/`.mdx`. Now `opts.strategy`
   threads end-to-end (full-sync write path AND dry-run). Code files
   actually reach the dispatcher, which already routes them to
   `importCodeFile` correctly.

3. **Walker was thrice-redundant.** `collectMarkdownFiles` (lstat-safe,
   import path) and `walkSyncableFiles` (statSync, cost-preview path,
   weaker for no good reason) collapsed into one hardened
   `collectSyncableFiles` in src/commands/import.ts: lstat + symlink-
   skip with canonical log line; inode-cycle Map keyed on
   `${st_dev}:${st_ino}` (defense-in-depth for non-symlink loops);
   `MAX_WALK_DEPTH=32` structural backstop with `GBRAIN_MAX_WALK_DEPTH`
   override; `.sort()` output (codex C8: `runImport`'s checkpoint
   resume is index-based against a sorted list). Walker-context
   multimodal carve-out preserved at one site (codex C5).

Plus structured `[gbrain phase] <name> start/done` stderr lines on
git_pull, fullsync.import, collect_files, and per-file slow path
(>5s). When the next hang lands, log says which phase wedged.

Tests:
- `test/sync-walker-symlink.test.ts` — 7 cases (self-symlink loop,
  symlink-chain inode cycle, max-depth bailout, strategy filter,
  dot-dir skip, multimodal preservation, deterministic ordering)
- `test/chunker-timeout.test.ts` — 7 cases (parser-stub seam,
  ChunkerTimeoutError shape, env wiring, fallback behavior, fail-loud
  if setTimeoutMicros API missing, cleanup contract under exception)

Smoke against the user's actual amarillo-v2 repo: 494 code files
walked in 22ms, 2 symlinks skipped with the canonical log line.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
VERSION 0.31.2, package.json synced. CHANGELOG entry under [0.31.2]
with full release-summary + numbers + upgrader-cost note + To take
advantage block. v0.30.2 entry preserved below from master. TODOS.md
files the gbrain query <common-keyword> 7-day-zombie investigation
(PIDs 39429, 46624) and the deferred amarillo-shape PGLite + Postgres
E2E as v0.31.3 follow-ups.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI's check-test-isolation lint (rule R1) flagged the two new test files
for mutating process.env directly. The repo-wide convention is to wrap
env mutations in withEnv() (test/helpers/with-env.ts), which saves +
restores prior values via try/finally even when the callback throws.
Direct process.env writes leak across files in the same bun test
process (parallel runner loads multiple files into one shard process).

Both files refactored:
- test/sync-walker-symlink.test.ts (GBRAIN_EMBEDDING_MULTIMODAL)
- test/chunker-timeout.test.ts (GBRAIN_CHUNKER_TIMEOUT_MS)

All 14 cases still pass. `bun run verify` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Merging v0.31.0 (hot memory facts), v0.31.1 (thin-client mode), and
v0.31.1.1-fixwave (22 community fixes incl. the auth-code P0) into
this branch ahead of v0.31.2's land.

Conflict resolution:
- VERSION + package.json: my v0.31.2 wins (queue slot is mine).
- src/commands/import.ts: keep my opts.strategy plumbing AND master's
  opts.sourceId plumbing — both runImport callers now thread both.
  Per-file slow-log preserved from my branch; sourceId routing through
  importImageFile/importFile preserved from master.
- src/commands/sync.ts: keep my [gbrain phase] log lines AND master's
  detached-head pull skip + sourceId pass-through to runImport.
  performFullSync now passes commit + strategy + sourceId.
- CHANGELOG.md: my [0.31.2] entry on top, then master's [0.31.1.1-fixwave],
  [0.31.1], [0.31.0] entries below in version order.
- TODOS.md: keep both my v0.31.2 follow-ups (gbrain query zombie + amarillo
  shape E2E) and master's thin-client mode follow-ups (Issue #734).

Verified: bun run typecheck clean, bun run verify chain green
(privacy/jsonb/progress/test-isolation/wasm/admin-build/scope-drift/
cli-exec/typecheck), 118 targeted tests pass on the changed surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@garrytan garrytan merged commit eec2d2b into master May 10, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant