v0.31.2 fix: gbrain sync --strategy code no longer hangs on big symlink-rich repos#773
Merged
Conversation
`gbrain sync --strategy code` against a 1500-file repo could pin one
thread at 99% CPU for hours with zero disk writes and a `page_count`
that stayed at 0. Three real defects, all closed in one commit:
1. **Tree-sitter chunker had no wall-clock cap.** A single pathological
file could wedge the whole sync inside WASM. New `parseWithTimeout`
helper in src/core/chunkers/code.ts wraps `parser.parse()` with
`setTimeoutMicros(timeoutMs * 1000)`, throws `ChunkerTimeoutError`
on null, and the caller's try/finally reaps parser+tree (closes the
leak codex flagged where the catch block returned without delete()).
Default 30s, override via `GBRAIN_CHUNKER_TIMEOUT_MS`. Falls back to
recursive chunks on timeout — degrades search quality on that one
file, doesn't wedge sync.
2. **Code-strategy first-sync silently no-op'd on code files.**
`performFullSync` called `runImport(repoPath)` with no strategy;
`runImport` only ever walked `.md`/`.mdx`. Now `opts.strategy`
threads end-to-end (full-sync write path AND dry-run). Code files
actually reach the dispatcher, which already routes them to
`importCodeFile` correctly.
3. **Walker was thrice-redundant.** `collectMarkdownFiles` (lstat-safe,
import path) and `walkSyncableFiles` (statSync, cost-preview path,
weaker for no good reason) collapsed into one hardened
`collectSyncableFiles` in src/commands/import.ts: lstat + symlink-
skip with canonical log line; inode-cycle Map keyed on
`${st_dev}:${st_ino}` (defense-in-depth for non-symlink loops);
`MAX_WALK_DEPTH=32` structural backstop with `GBRAIN_MAX_WALK_DEPTH`
override; `.sort()` output (codex C8: `runImport`'s checkpoint
resume is index-based against a sorted list). Walker-context
multimodal carve-out preserved at one site (codex C5).
Plus structured `[gbrain phase] <name> start/done` stderr lines on
git_pull, fullsync.import, collect_files, and per-file slow path
(>5s). When the next hang lands, log says which phase wedged.
Tests:
- `test/sync-walker-symlink.test.ts` — 7 cases (self-symlink loop,
symlink-chain inode cycle, max-depth bailout, strategy filter,
dot-dir skip, multimodal preservation, deterministic ordering)
- `test/chunker-timeout.test.ts` — 7 cases (parser-stub seam,
ChunkerTimeoutError shape, env wiring, fallback behavior, fail-loud
if setTimeoutMicros API missing, cleanup contract under exception)
Smoke against the user's actual amarillo-v2 repo: 494 code files
walked in 22ms, 2 symlinks skipped with the canonical log line.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
VERSION 0.31.2, package.json synced. CHANGELOG entry under [0.31.2] with full release-summary + numbers + upgrader-cost note + To take advantage block. v0.30.2 entry preserved below from master. TODOS.md files the gbrain query <common-keyword> 7-day-zombie investigation (PIDs 39429, 46624) and the deferred amarillo-shape PGLite + Postgres E2E as v0.31.3 follow-ups. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI's check-test-isolation lint (rule R1) flagged the two new test files for mutating process.env directly. The repo-wide convention is to wrap env mutations in withEnv() (test/helpers/with-env.ts), which saves + restores prior values via try/finally even when the callback throws. Direct process.env writes leak across files in the same bun test process (parallel runner loads multiple files into one shard process). Both files refactored: - test/sync-walker-symlink.test.ts (GBRAIN_EMBEDDING_MULTIMODAL) - test/chunker-timeout.test.ts (GBRAIN_CHUNKER_TIMEOUT_MS) All 14 cases still pass. `bun run verify` clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Merging v0.31.0 (hot memory facts), v0.31.1 (thin-client mode), and v0.31.1.1-fixwave (22 community fixes incl. the auth-code P0) into this branch ahead of v0.31.2's land. Conflict resolution: - VERSION + package.json: my v0.31.2 wins (queue slot is mine). - src/commands/import.ts: keep my opts.strategy plumbing AND master's opts.sourceId plumbing — both runImport callers now thread both. Per-file slow-log preserved from my branch; sourceId routing through importImageFile/importFile preserved from master. - src/commands/sync.ts: keep my [gbrain phase] log lines AND master's detached-head pull skip + sourceId pass-through to runImport. performFullSync now passes commit + strategy + sourceId. - CHANGELOG.md: my [0.31.2] entry on top, then master's [0.31.1.1-fixwave], [0.31.1], [0.31.0] entries below in version order. - TODOS.md: keep both my v0.31.2 follow-ups (gbrain query zombie + amarillo shape E2E) and master's thin-client mode follow-ups (Issue #734). Verified: bun run typecheck clean, bun run verify chain green (privacy/jsonb/progress/test-isolation/wasm/admin-build/scope-drift/ cli-exec/typecheck), 118 targeted tests pass on the changed surface. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 task
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes the bug class where
gbrain sync --strategy codeagainst a 1500-file gstack repo could pin one thread at 99% CPU for hours with zero disk writes and apage_countthat stayed at 0. Three real defects, fixed together in one bisectable commit, plus version bump and CHANGELOG.Bug fix (commit 1):
parseWithTimeouthelper insrc/core/chunkers/code.tswrapsparser.parse()withsetTimeoutMicros(timeoutMs * 1000), throwsChunkerTimeoutErroron null, and the caller's try/finally reaps parser+tree (closes the leak Codex flagged where the catch block returned withoutdelete()). Default 30s, override viaGBRAIN_CHUNKER_TIMEOUT_MS. Falls back tofallbackChunks(recursive text chunker) on timeout — degrades search quality on that one file, doesn't wedge sync.performFullSynccalledrunImport(repoPath)with no strategy;runImportonly ever walked.md/.mdx. Nowopts.strategythreads end-to-end (full-sync write path AND dry-run). Code files actually reach the dispatcher, which already routes them toimportCodeFile.collectMarkdownFiles(lstat-safe) andwalkSyncableFiles(statSync, weaker for no good reason) collapsed into one hardenedcollectSyncableFilesinsrc/commands/import.ts: lstat + symlink-skip with canonical log line; inode-cycle Map keyed on${st_dev}:${st_ino}(defense-in-depth for non-symlink loops);MAX_WALK_DEPTH=32structural backstop withGBRAIN_MAX_WALK_DEPTHoverride;.sort()output (Codex C8:runImport's checkpoint resume is index-based against a sorted list). Walker-context multimodal carve-out preserved at one site (Codex C5).[gbrain phase] <name> start/done <ms>stderr lines onsync.git_pull,sync.fullsync.import,import.collect_files, and per-file slow path (>5s). The next hang reproduction names the phase that wedged.Version + docs (commit 2): VERSION 0.30.1 → 0.31.2, package.json synced, CHANGELOG entry under
[0.31.2](preserved master's [0.30.2] entry below), TODOS.md filedgbrain query <common-keyword>7-day-zombie investigation (PIDs 39429, 46624) and the deferred amarillo-shape PGLite + Postgres E2E as v0.31.3 follow-ups.Test Coverage
bun test(unit suite, 109 targeted cases across changed surface): 109 pass / 0 fail. Typecheck clean.Tests: existing test count + 14 new (test/sync-walker-symlink.test.ts × 7, test/chunker-timeout.test.ts × 7).
Pre-Landing Review
Codex outside-voice review caught 5 bug-grade findings during plan stage (parser cleanup leak, multimodal contradiction, missed dry-run plumbing, deterministic-order requirement, machine-speed-flaky tests). All five folded into the shipped fix without re-litigating the bundle decision. See
~/.claude/plans/system-instruction-you-are-working-jolly-stallman.mdfor the full decision history (D1-D5 + Codex C1-C11).No new findings on the diff itself.
Plan Completion
parseWithTimeout+ try/finally cleanup7/8 plan items DONE. Deferred E2E item is filed as P2 follow-up; unit tests already prove walker hardening and chunker timeout behaviors.
TODOS
Filed two v0.31.2 follow-ups:
gbrain query <common-keyword>infinite loop (separate bug class, 7-day zombies at 6+GB RES)Test plan
bun run typecheckcleanbun test test/sync-walker-symlink.test.ts test/chunker-timeout.test.ts— 14 pass~/conductor/workspaces/gstack/amarillo-v2against the user's actual source — user to verify post-merge (gbrain upgrade+gbrain sync --strategy code --source <id>should complete in ~25-35 min and produce non-zeropage_count)🤖 Generated with Claude Code
Need help on this PR? Tag
@codesmithwith what you need.