v0.34.2.0 fix(import): path-based checkpoint resume — kills parallel-drop + failed-file-skip + sort-flip bugs #988

Merged
garrytan merged 7 commits into master from garrytan/bangalore-v4 on May 15, 2026

garrytan commented on May 14, 2026

Summary

Path-based import checkpoint replaces the pre-v0.34.2 positional checkpoint that silently lost files in three scenarios. This PR started as a cherry-pick of @garrytan-agents's PR #964 (sort files newest-first) and expanded after /plan-eng-review and the codex outside-voice pass surfaced two pre-existing P1 bugs in the checkpoint model that plan review had missed.

Three bug classes closed:

  • Parallel + slow worker: processed++ fires on completion not dispatch, so 4 workers + slow files[0] + 3 fast completions wrote processedIndex=3 and crash-resume sliced files.slice(3), dropping the slow file silently.
  • Failed-file no-retry: errors++; processed++ both fire, so a failure pushed the checkpoint past the bad file. Next run skipped it. The "clear on clean exit" guard at line 268 only triggers when errors === 0, so a single failure preserved the bad checkpoint indefinitely.
  • Sort-flip drop: v0.33.x's newest-first sort applied before checkpoint resume meant cross-version slice(processedIndex) dropped the N newest files on the upgrade run.
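
The first two failure modes can be reproduced in a few lines. This is a minimal sketch, not the project's code: `positionalResume` and the `Completion` type are hypothetical, and the only behavior taken from the description above is that the counter advances once per completion (success or failure) and that resume slices the file list from that counter.

```typescript
// Hypothetical model of a positional checkpoint; all names here are illustrative.
type Completion = { path: string; ok: boolean };

/** Files a crash-resume would still process under the positional model. */
function positionalResume(files: string[], completions: Completion[]): string[] {
  // The counter advanced once per completion, regardless of which file
  // finished or whether it succeeded.
  const processedIndex = completions.length;
  return files.slice(processedIndex);
}

const importOrder = ["slow.md", "a.md", "b.md", "c.md", "d.md"];

// Parallel + slow worker: four workers take the first four files, the three
// fast ones finish, and the process crashes while slow.md is still in flight.
const afterCrash = positionalResume(importOrder, [
  { path: "a.md", ok: true },
  { path: "b.md", ok: true },
  { path: "c.md", ok: true },
]);
console.log(afterCrash); // slow.md is not in the resumed list

// Failed file: the failure still advances the counter past bad.md.
const afterFailure = positionalResume(
  ["bad.md", "x.md"],
  [{ path: "bad.md", ok: false }],
);
console.log(afterFailure); // bad.md is not retried
```

In both cases the resumed list is a suffix of the original order, which is why a completion count cannot say which files actually finished.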

Fix: src/core/import-checkpoint.ts stores a set of completed relative paths. A file is "done" only when processFile returns success. Failed files never enter the set. Sort order is irrelevant to checkpoint correctness.
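
A minimal sketch of the path-set model described above, assuming an in-memory `Set` of relative paths. The helper names `recordOutcomes` and `remainingFiles` are hypothetical stand-ins and do not reflect the actual signatures in src/core/import-checkpoint.ts.

```typescript
// Hypothetical in-memory model of a path-set checkpoint; names are illustrative.
type Outcome = { path: string; ok: boolean };

/** Only successful files enter the completed set; failures are left out. */
function recordOutcomes(completed: Set<string>, outcomes: Outcome[]): void {
  for (const outcome of outcomes) {
    if (outcome.ok) completed.add(outcome.path);
  }
}

/** Keep the files whose relative path is not yet recorded as completed. */
function remainingFiles(files: string[], completed: ReadonlySet<string>): string[] {
  return files.filter((file) => !completed.has(file));
}

const completedPaths = new Set<string>();
recordOutcomes(completedPaths, [
  { path: "a.md", ok: true },
  { path: "b.md", ok: true },
  { path: "c.md", ok: true },
  { path: "bad.md", ok: false }, // failed, so it stays eligible for retry
]);

const oldOrder = ["slow.md", "a.md", "b.md", "c.md", "bad.md"];
const newOrder = [...oldOrder].reverse(); // a different sort on the next run

const resumeOld = remainingFiles(oldOrder, completedPaths);
const resumeNew = remainingFiles(newOrder, completedPaths);
console.log(resumeOld, resumeNew);
```

Both orderings resume the same two files, the in-flight one and the failed one, which is the sense in which sort order stops mattering.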

Commits:

Test Coverage

[+] src/core/sort-newest-first.ts
  └── sortNewestFirst()
      └── [★★★ TESTED] date-prefixed + mixed + empty + single + in-place — test/sort-newest-first.test.ts (5 cases)

[+] src/core/import-checkpoint.ts
  ├── loadCheckpoint()
  │   ├── [★★★ TESTED] missing + malformed + dir-mismatch + old-positional-with-stderr-log + bad-shape + valid v2 — test/import-checkpoint.test.ts (7 cases)
  ├── saveCheckpoint()
  │   ├── [★★★ TESTED] round-trip + deterministic-sorted + atomic-rename + non-fatal-on-EACCES (4 cases)
  ├── resumeFilter()
  │   ├── [★★★ TESTED] empty + partial + absolute-normalization + mixed + full-match (5 cases)
  └── clearCheckpoint()
      └── [★★  TESTED] present + missing (2 cases)

[M] src/commands/import.ts (path-based integration)
  └── runImport()
      ├── [★★★ TESTED] old positional → discard + stderr — test/import-resume.test.ts
      ├── [★★★ TESTED] v0.34.2 checkpoint → resume honored (skips known-completed)
      ├── [★★★ TESTED] clean completion → checkpoint cleared
      ├── [★★★ TESTED] failed file (SLUG_MISMATCH) → NOT in completed → next run retries
      └── [★★★ TESTED] dir mismatch → silent discard (no migration log)

[M] src/commands/sync.ts
  └── performSyncInner() — sortNewestFirst swap, no checkpoint logic affected

COVERAGE: 23/23 new code paths tested (100%)
QUALITY: ★★★:22 ★★:1 ★:0 | GAPS: 0

Tests: 6378 → 6384 (net +6). The 28 new cases break down as 5 sort-helper + 18 checkpoint-helper + 5 integration; some merged into existing files.

Parallel-import regression test deferred to E2E (filed as TODO): real-Postgres parallel-mode requires DATABASE_URL setup; the unit-level runImport integration tests pin the path-based contract under serial execution. Plan calls this out explicitly.

Pre-Landing Review

3 issues found via /plan-eng-review — all resolved through implementation, not skipped:

  • D1 (P1, conf 9/10): Checkpoint cross-version compatibility (the original sortVersion band-aid concern). Superseded by path-based design — sort order no longer matters to checkpoint correctness.
  • D2 (P2, conf 8/10): Zero test coverage. Closed — 28 new tests + integration coverage of runImport.
  • D3 (P1, conf 9/10): Codex pivot — parallel-import + failed-file pre-existing bugs that plan-eng-review missed. Closed by path-based design.

Outside Voice (Codex)

Ran codex exec against the draft plan. 5 substantive findings, all absorbed:

  • P1: sortVersion only fixes one bug — parallel-import positional model is broken regardless → adopted path-based
  • P1: failed files never retry under positional checkpoint → fixed by success-only completed.add
  • P2: completedFiles was a count, not a list → dropped that field; new field is completedPaths: string[]
  • P2: existing test duplicated predicate logic instead of testing production code → refactored to drive loadCheckpoint/saveCheckpoint/resumeFilter directly
  • P2: existing test wrote to real ~/.gbrain → wrapped every case in withEnv({ GBRAIN_HOME: tmpdir() })
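
The isolation pattern in the last finding can be sketched as a scoped environment override. This is a hypothetical re-implementation written for illustration; the project's own `withEnv` helper may differ in signature and is probably async-aware, whereas this version is synchronous to keep it short.

```typescript
// Hypothetical helper; not the project's withEnv.
function withEnvSketch<T>(
  overrides: Record<string, string | undefined>,
  fn: () => T,
): T {
  const saved: Record<string, string | undefined> = {};
  for (const key of Object.keys(overrides)) {
    saved[key] = process.env[key]; // remember the previous value (may be undefined)
    const value = overrides[key];
    if (value === undefined) delete process.env[key];
    else process.env[key] = value;
  }
  try {
    return fn();
  } finally {
    // Restore even if fn throws, so one test cannot leak into the next.
    for (const key of Object.keys(saved)) {
      const value = saved[key];
      if (value === undefined) delete process.env[key];
      else process.env[key] = value;
    }
  }
}

const homeBefore = process.env["GBRAIN_HOME"];
const homeInside = withEnvSketch(
  { GBRAIN_HOME: "/tmp/gbrain-sketch-home" },
  () => process.env["GBRAIN_HOME"],
);
const homeAfter = process.env["GBRAIN_HOME"];
console.log(homeInside, homeAfter === homeBefore);
```

An async variant would await `fn()` inside the `try` so the restore runs only after the wrapped test body settles.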

Cross-model agreement: yes — codex strictly expanded coverage. No unresolved disagreement.

Plan Completion

Plan file: ~/.claude/plans/velvety-sparking-treehouse.md. All actionable items DONE:

  • src/core/sort-newest-first.ts created
  • src/core/import-checkpoint.ts created with full API
  • src/commands/import.ts refactored to path-based
  • src/commands/sync.ts swapped for helper
  • test/sort-newest-first.test.ts created
  • test/import-checkpoint.test.ts created
  • test/import-resume.test.ts refactored with GBRAIN_HOME isolation
  • All verification gates: bun run verify clean, bun run test 6384/0/0

Deferred (filed in plan as TODOs):

  • Append-only NDJSON sidecar checkpoint for >100K-file brains (perf optimization, not correctness — typical brain sizes are <50K)
  • Real-Postgres E2E case for parallel-import + mid-stream crash + resume

TODOS

No items in TODOS.md were completed by this PR. (Existing TODOs are all v0.33.x functional-area-resolver follow-ups, unrelated to import checkpoints.)

Documentation

  • CLAUDE.md — added Key Files entries for src/core/import-checkpoint.ts (new module: loadCheckpoint / saveCheckpoint / resumeFilter / clearCheckpoint, path-set checkpoint format, atomic .tmp+rename writes, GBRAIN_HOME-aware) and src/core/sort-newest-first.ts (single source of truth for the descending-lex sort shared by gbrain import and gbrain sync); new entry for src/commands/import.ts covering the v0.34.2.0 positional→path-set checkpoint refactor and the three bug classes it closes (parallel-slow-worker drop, failed-file-skip, sort-flip drop); updated src/commands/sync.ts entry to note the inline .sort() swap for sortNewestFirst(addsAndMods).
  • llms-full.txt — regenerated via bun run build:llms (mandatory chaser per CLAUDE.md auto-derived rule; CI shard 1 test/build-llms.test.ts enforces freshness).
  • README.md, TODOS.md — no updates needed (v0.34.2.0 is an internal bug fix with no new user-facing surface or TODO impact).
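
The "atomic .tmp+rename" and "deterministic-sorted" properties mentioned for the checkpoint module can be illustrated as follows. This is a sketch under assumptions: the on-disk shape (a `dir` field next to `completedPaths`), the file name, and `saveCheckpointSketch` itself are hypothetical; only `completedPaths: string[]` is named in this PR.

```typescript
import { existsSync, mkdtempSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical on-disk shape; only `completedPaths` is named in the PR.
interface CheckpointSketch {
  dir: string;
  completedPaths: string[];
}

/** Write to a sibling .tmp file, then rename over the target. Never throws. */
function saveCheckpointSketch(file: string, dir: string, completed: Iterable<string>): boolean {
  // Sorting makes the output independent of completion order.
  const body: CheckpointSketch = { dir, completedPaths: Array.from(completed).sort() };
  const tmp = `${file}.tmp`;
  try {
    writeFileSync(tmp, JSON.stringify(body, null, 2) + "\n");
    renameSync(tmp, file); // readers see the old file or the new one, not a partial write
    return true;
  } catch (err) {
    // A failed save is logged and ignored so it cannot abort the import.
    console.error(`checkpoint save failed (continuing): ${String(err)}`);
    return false;
  }
}

const scratchDir = mkdtempSync(join(tmpdir(), "checkpoint-sketch-"));
const checkpointFile = join(scratchDir, "import-checkpoint.json");
const saved = saveCheckpointSketch(
  checkpointFile,
  "/tmp/example-brain",
  new Set(["notes/b.md", "notes/a.md"]),
);
const onDisk = JSON.parse(readFileSync(checkpointFile, "utf8")) as CheckpointSketch;
const tmpLeftBehind = existsSync(`${checkpointFile}.tmp`);
console.log(saved, onDisk.completedPaths, tmpLeftBehind);
```

Keeping the temporary file in the same directory as the target matters, since a rename across filesystems is not a single-step replacement.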

Test plan

  • bun run verify clean (typecheck + 12 shell guards, post-merge)
  • bun run test — 6384 pass / 0 fail / 0 skip, ~393s wall time (post-merge)
  • All 28 new unit cases pass
  • test/import-resume.test.ts PGLite integration cases pass (5/5)
  • No ~/.gbrain writes in tests (test-isolation lint clean)
  • E2E suite (bun run test:e2e) — skips gracefully without DATABASE_URL; existing sync E2E expected to pass since the sort change is positional, not semantic. Run pre-merge if shipping today.
  • Manual smoke: plant an old-positional checkpoint, run gbrain import, confirm "Older checkpoint format detected" stderr log fires.

To take advantage of v0.34.2.0

gbrain upgrade handles this automatically; no manual step is needed. The first gbrain import after upgrade against a pre-existing checkpoint logs "Older checkpoint format detected — re-walking (cheap via content_hash)" and walks the brain from scratch. The re-walk is cheap because content_hash short-circuits unchanged files.
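
The upgrade behavior can be pictured as a small classifier over the checkpoint file's contents. This is a sketch under assumptions, not the real `loadCheckpoint`: the field names `dir` and `processedIndex` as on-disk keys and the function `classifyCheckpoint` are hypothetical. The ordering follows the test list in this PR, where a directory mismatch is discarded silently and only an old positional checkpoint for the same directory produces the log line.

```typescript
// Hypothetical classifier; field names other than completedPaths are assumptions.
type CheckpointDecision =
  | { kind: "resume"; completed: Set<string> }
  | { kind: "discard"; log?: string };

function classifyCheckpoint(raw: string, importDir: string): CheckpointDecision {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { kind: "discard" }; // malformed JSON
  }
  if (typeof data !== "object" || data === null) return { kind: "discard" };
  const record = data as Record<string, unknown>;

  // A checkpoint for another directory is dropped without a migration log.
  if (record["dir"] !== importDir) return { kind: "discard" };

  const paths = record["completedPaths"];
  if (Array.isArray(paths) && paths.every((p) => typeof p === "string")) {
    return { kind: "resume", completed: new Set(paths as string[]) };
  }
  if (typeof record["processedIndex"] === "number") {
    return { kind: "discard", log: "Older checkpoint format detected" };
  }
  return { kind: "discard" }; // unrecognized shape
}

const brainDir = "/tmp/example-brain";
const fromNew = classifyCheckpoint(
  JSON.stringify({ dir: brainDir, completedPaths: ["a.md"] }),
  brainDir,
);
const fromOld = classifyCheckpoint(
  JSON.stringify({ dir: brainDir, processedIndex: 3 }),
  brainDir,
);
const fromOtherDir = classifyCheckpoint(
  JSON.stringify({ dir: "/tmp/other-brain", processedIndex: 3 }),
  brainDir,
);
console.log(fromNew.kind, fromOld.kind, fromOtherDir.kind);
```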

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

🤖 Generated with Claude Code

…tent

Problem: sync processes files in git-diff order (alphabetical), so
meetings/2020-* embeds before meetings/2026-*. After a burst of writes,
new pages can be invisible to search for hours while older pages process first.

Fix: sort addsAndMods descending in both incremental sync and full import.
Brain paths are date-prefixed by convention, so lexicographic descending
naturally prioritizes recent content.

This ensures the most relevant pages become searchable first.
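
The effect of the descending sort on date-prefixed paths can be seen in a short sketch. `sortNewestFirstSketch` is a hypothetical stand-in for the helper this PR extracts; it assumes a plain code-unit comparison and an in-place sort, and the example paths are made up.

```typescript
/** Sort paths in place, lexicographically descending, and return the same array. */
function sortNewestFirstSketch(paths: string[]): string[] {
  return paths.sort((a, b) => (a < b ? 1 : a > b ? -1 : 0));
}

const changedPaths = [
  "meetings/2020-01-05-kickoff.md",
  "meetings/2026-05-14-review.md",
  "meetings/2023-09-30-offsite.md",
];
const sortedPaths = sortNewestFirstSketch(changedPaths);
console.log(sortedPaths[0]); // the 2026 page is now first
```

The ordering only tracks recency where paths follow the date-prefix convention; paths without a date prefix are still sorted, just not by age.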
garrytan and others added 4 commits May 14, 2026 19:15
Replace gbrain import's positional `processedIndex` checkpoint with a
path-set checkpoint via `src/core/import-checkpoint.ts`. A file is only
"done" when its processFile returns success — failed files never enter
the set, parallel workers can't lose slow files, and sort-order changes
don't drop the newest N files on resume.

Three bug classes fixed:
- Parallel import + slow worker = silent file drop on crash-resume
- Failed file = checkpoint advanced past it, never retried until manual clear
- Sort-order flip (v0.33.x) = cross-version resume drops newest N files

Old positional checkpoints are detected on first resume and discarded
with a stderr log line. Re-walking is cheap because content_hash
short-circuits unchanged files.

Also extracts the descending-lex sort into src/core/sort-newest-first.ts
so import.ts and sync.ts share a single source of truth.

Tests:
- test/sort-newest-first.test.ts (5 hermetic cases)
- test/import-checkpoint.test.ts (18 unit cases over the helpers)
- test/import-resume.test.ts (refactored — GBRAIN_HOME isolation,
  drives runImport against PGLite, 5 integration cases including
  SLUG_MISMATCH retry regression)

Includes the original sort-newest-first contribution from
@garrytan-agents's PR #964 (commit 8dbcf6a).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Add CLAUDE.md Key Files entries for the path-based import checkpoint
work: new entries for src/core/import-checkpoint.ts and
src/core/sort-newest-first.ts, plus a dedicated src/commands/import.ts
entry covering the v0.34.2.0 refactor. Update src/commands/sync.ts
entry to reference sortNewestFirst. Regenerate llms-full.txt.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
garrytan changed the title from "feat(sync): sort files newest-first for faster salience on recent content" to "v0.34.2.0 fix(import): path-based checkpoint resume — kills parallel-drop + failed-file-skip + sort-flip bugs" on May 15, 2026
garrytan added 2 commits May 14, 2026 19:58
scripts/check-privacy.sh banlist includes /data/brain/ (legacy private
OpenClaw fork layout). New test files must not use it — CI privacy
guard caught this on PR #988's first push.

No behavior change. test/import-checkpoint.test.ts is unit-level with
no fs access; the dir string is just an identity marker for the
loadCheckpoint dir-mismatch guard.

# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json
garrytan merged commit 3325b40 into master on May 15, 2026
7 checks passed
garrytan added a commit that referenced this pull request May 15, 2026
Master shipped v0.34.2.0 (PR #988, path-based import checkpoint). v0.34.2.0
adds no new migrations (only new helper modules src/core/import-checkpoint.ts
and src/core/sort-newest-first.ts), so no migration renumber needed this
wave — our embed_stale_partial_index stays at v66.

Conflict resolutions:
- VERSION: take 0.34.4.0
- package.json: take 0.34.4.0
- CHANGELOG.md: collapse the interleaved auto-merge (master's v0.34.2.0
  entry got woven into mine on shared "### Itemized changes" / "#### Added"
  headers). Reconstructed both entries cleanly back-to-back: v0.34.4.0
  topmost, then v0.34.2.0, then v0.34.1.0.

Updated CHANGELOG references to migration v66 (was inadvertently still
"v60" in three spots from the prior wave's prose — corrected here).

Typecheck clean. lockfile unchanged (no dep delta).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
brandonlipman added a commit to brandonlipman/gbrain that referenced this pull request May 29, 2026
* upstream/master:
  v0.35.1.0: embedder shootout prereqs (pricing + gateway export + --resume-from) (garrytan#1055)
  v0.35.0.0 feat: ZeroEntropy zembed-1 + zerank-2 reranker (garrytan#1008)
  v0.34.4.0 fix(embed): cursor-paginated --stale hardening wave (D2/D3/D4/D6/D7/D8 + regression test) (garrytan#991)
  v0.34.3.0 fix: supervisor treats code=0 watchdog exits as crashes (garrytan#1003)
  v0.34.2.0 fix(import): path-based checkpoint resume — kills parallel-drop + failed-file-skip + sort-flip bugs (garrytan#988)
  v0.34.1.0 fix(mcp): MCP fix wave — source-isolation P0 + PKCE DCR + federated_read + 3 more (garrytan#996)
  v0.34.0.0 feat: Cathedral III — recursive code intelligence + Leiden clusters + eval gate (garrytan#994)
  v0.33.3.0 feat(v0.33.3): code intelligence MCP foundation (v0.34 W0a-c + W3) (garrytan#934)
  v0.33.2.1 docs: fork-PR workflow for garrytan-agents (garrytan#992)
  fix(sync): raise maxBuffer to 100 MiB to prevent silent ENOBUFS crash (garrytan#982)
  v0.33.2.0 feat(search-lite): token budget + semantic query cache + intent weighting (garrytan#897)
  v0.33.1.1 fix: Voyage output_dimension + flexible-dim guard + OOM-cap rethrow (garrytan#962)

2 participants