feat(sync): sort files newest-first for faster salience on recent content #964
Closed
garrytan-agents wants to merge 1 commit into
…tent

Problem: sync processes files in git-diff order (alphabetical), so meetings/2020-* embeds before meetings/2026-*. After a burst of writes, new pages can be invisible to search for hours while older pages process first.

Fix: sort addsAndMods descending in both incremental sync and full import. Brain paths are date-prefixed by convention, so lexicographic descending naturally prioritizes recent content. This ensures the most relevant pages become searchable first.
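The fix described above can be sketched as follows. This is a minimal illustration rather than the PR's actual code: the function signature and the sample paths are assumptions, and only the idea (descending lexicographic order over date-prefixed paths) comes from the commit message.

```typescript
// Hypothetical signature: the real helper's shape is not shown in this PR page.
// Returns a new array sorted in descending lexicographic (code-unit) order.
function sortNewestFirst(paths: readonly string[]): string[] {
  // Copy first so the caller's array is not mutated. Plain < and > compare
  // by UTF-16 code units, which avoids locale-dependent ordering.
  return [...paths].sort((a, b) => (a < b ? 1 : a > b ? -1 : 0));
}

// Illustrative paths following the date-prefix convention.
const addsAndMods = [
  "meetings/2020-01-15-kickoff.md",
  "daily/2026-05-13.md",
  "meetings/2026-05-13-review.md",
  "daily/2024-11-02.md",
];

const ordered = sortNewestFirst(addsAndMods);
console.log(ordered);
```

Because the comparison covers the whole path, the result is newest-first within each directory prefix: here every `meetings/` path sorts ahead of every `daily/` path, regardless of date.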
15 tasks
Owner
Cherry-picked into #988 on the garrytan/bangalore-v4 branch so CI runs fully (garrytan-agents PRs don't get full CI). Closing this PR; continuing review/merge there.
garrytan added a commit that referenced this pull request on May 15, 2026
Replace gbrain import's positional `processedIndex` checkpoint with a path-set checkpoint via `src/core/import-checkpoint.ts`. A file is only "done" when its processFile returns success: failed files never enter the set, parallel workers can't lose slow files, and sort-order changes don't drop the newest N files on resume.

Three bug classes fixed:
- Parallel import + slow worker = silent file drop on crash-resume
- Failed file = checkpoint advanced past it, never retried until manual clear
- Sort-order flip (v0.33.x) = cross-version resume drops newest N files

Old positional checkpoints are detected on first resume and discarded with a stderr log line. Re-walking is cheap because content_hash short-circuits unchanged files.

Also extracts the descending-lex sort into src/core/sort-newest-first.ts so import.ts and sync.ts share a single source of truth.

Tests:
- test/sort-newest-first.test.ts (5 hermetic cases)
- test/import-checkpoint.test.ts (18 unit cases over the helpers)
- test/import-resume.test.ts (refactored: GBRAIN_HOME isolation, drives runImport against PGLite, 5 integration cases including SLUG_MISMATCH retry regression)

Includes the original sort-newest-first contribution from @garrytan-agents's PR #964 (commit 8dbcf6a).
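The path-set idea in this commit message can be sketched roughly as below. Everything here is a simplified, hypothetical model: the real `src/core/import-checkpoint.ts` API is not shown on this page, the in-memory `Set` stands in for whatever is persisted, and `processFile` is made synchronous to keep the example short.

```typescript
// Hypothetical callback type: returns true when the file imported successfully.
type ProcessFile = (path: string) => boolean;

// Processes every file not already in the `done` set. A path is added to the
// set only after processFile reports success, so failures are retried on the
// next run. Returns the paths that failed during this run.
function importWithCheckpoint(
  files: readonly string[],
  done: Set<string>,
  processFile: ProcessFile,
): string[] {
  const failed: string[] = [];
  for (const file of files) {
    if (done.has(file)) continue; // completed in an earlier run
    if (processFile(file)) done.add(file);
    else failed.push(file);
  }
  return failed;
}

// Illustrative paths only.
const files = ["daily/2026-05-13.md", "daily/2026-05-12.md", "daily/2026-05-11.md"];
const done = new Set<string>();

// First run: the middle file fails, so it never enters the set.
const firstFailed = importWithCheckpoint(files, done, (p) => p !== "daily/2026-05-12.md");
const doneAfterFirst = done.size;

// Resume: only the previously failed file is attempted again.
const attempted: string[] = [];
const secondFailed = importWithCheckpoint(files, done, (p) => {
  attempted.push(p);
  return true;
});

console.log({ firstFailed, doneAfterFirst, attempted, secondFailed });
```

Since membership in the set does not depend on position, the order in which files finish (or the order in which they are listed) should not affect what a resume picks up.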
garrytan added a commit that referenced this pull request on May 15, 2026
…drop + failed-file-skip + sort-flip bugs (#988)

* feat(sync): sort files newest-first for faster salience on recent content

Problem: sync processes files in git-diff order (alphabetical), so meetings/2020-* embeds before meetings/2026-*. After a burst of writes, new pages can be invisible to search for hours while older pages process first.

Fix: sort addsAndMods descending in both incremental sync and full import. Brain paths are date-prefixed by convention, so lexicographic descending naturally prioritizes recent content. This ensures the most relevant pages become searchable first.

* feat(import): path-based checkpoint resume + sort-newest-first helper

Replace gbrain import's positional `processedIndex` checkpoint with a path-set checkpoint via `src/core/import-checkpoint.ts`. A file is only "done" when its processFile returns success: failed files never enter the set, parallel workers can't lose slow files, and sort-order changes don't drop the newest N files on resume.

Three bug classes fixed:
- Parallel import + slow worker = silent file drop on crash-resume
- Failed file = checkpoint advanced past it, never retried until manual clear
- Sort-order flip (v0.33.x) = cross-version resume drops newest N files

Old positional checkpoints are detected on first resume and discarded with a stderr log line. Re-walking is cheap because content_hash short-circuits unchanged files.

Also extracts the descending-lex sort into src/core/sort-newest-first.ts so import.ts and sync.ts share a single source of truth.

Tests:
- test/sort-newest-first.test.ts (5 hermetic cases)
- test/import-checkpoint.test.ts (18 unit cases over the helpers)
- test/import-resume.test.ts (refactored: GBRAIN_HOME isolation, drives runImport against PGLite, 5 integration cases including SLUG_MISMATCH retry regression)

Includes the original sort-newest-first contribution from @garrytan-agents's PR #964 (commit 8dbcf6a).

* chore: bump version and changelog (v0.34.2.0)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: update project documentation for v0.34.2.0

Add CLAUDE.md Key Files entries for the path-based import checkpoint work: new entries for src/core/import-checkpoint.ts and src/core/sort-newest-first.ts, plus a dedicated src/commands/import.ts entry covering the v0.34.2.0 refactor. Update src/commands/sync.ts entry to reference sortNewestFirst. Regenerate llms-full.txt.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(tests): swap banned /data/brain placeholder for /tmp/example-brain

scripts/check-privacy.sh banlist includes /data/brain/ (legacy private OpenClaw fork layout). New test files must not use it; the CI privacy guard caught this on PR #988's first push.

No behavior change. test/import-checkpoint.test.ts is unit-level with no fs access; the dir string is just an identity marker for the loadCheckpoint dir-mismatch guard.

---------

Co-authored-by: garrytan-agents <garrytan-agents@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
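To make the "sort-order flip" bug class in the commit message above concrete, here is a small worked example. It assumes, hypothetically, that a positional checkpoint stores only the count of files completed in list order; the file names are invented and the real checkpoint format is not shown on this page.

```typescript
// Illustrative file list in the old (ascending) processing order.
const ascending = ["2020.md", "2021.md", "2022.md", "2023.md", "2024.md"];

// Assumed scenario: the old version finished the first two files, then crashed.
const processedIndex = 2;
const actuallyDone = new Set(ascending.slice(0, processedIndex)); // 2020.md, 2021.md

// The new version walks the same files newest-first.
const descending = [...ascending].sort().reverse();

// Positional resume skips the first `processedIndex` entries of the NEW order,
// which are the newest files and were never processed.
const positionalResume = descending.slice(processedIndex);
const wronglySkipped = descending
  .slice(0, processedIndex)
  .filter((p) => !actuallyDone.has(p));

// Path-set resume skips exactly the files that were completed, in any order.
const pathSetResume = descending.filter((p) => !actuallyDone.has(p));

console.log({ positionalResume, wronglySkipped, pathSetResume });
```

Under these assumptions the positional resume reprocesses `2021.md` and `2020.md` while never touching `2024.md` and `2023.md`, whereas the path-set resume picks up exactly the three unprocessed files.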
Problem
Sync processes files in `git diff` order (alphabetical), so `meetings/2020-*` gets embedded before `meetings/2026-*`. After a burst of writes, new pages can be invisible to search for hours while older pages that sort earlier alphabetically process first.

Real-world impact: divorce attorney consultation pages written on May 11 were never found by gbrain search because the sync lock got stuck, and when sync finally ran, it would have processed them last (alphabetically after thousands of older pages).

Fix
Sort `addsAndMods` descending in both:
- Incremental sync (`sync.ts` line 690): the git-diff path
- Full import (`import.ts`): the `runImport` walker

Brain paths are date-prefixed by convention (`meetings/2026-05-13-*`, `daily/2026-05-13.md`), so lexicographic descending naturally prioritizes recent content.

Impact
… `.sort()` calls), rest is comments

Testing