v0.22.9 feat: structured error code summary for sync --skip-failed by garrytan · Pull Request #501 · garrytan/gbrain

garrytan · 2026-04-28T15:41:49Z

Summary

Sync failures now tell you why, not just how many. gbrain sync --skip-failed and gbrain doctor group failures by error code instead of just printing a count.

Before: Sync blocked: 2688 file(s) failed to parse.
After: Sync blocked: 2688 file(s) failed to parse:\n SLUG_MISMATCH: 2685\n YAML_DUPLICATE_KEY: 3

On a real 81K-page brain, 2,685 files silently failed with SLUG_MISMATCH from a posterous import. The breakdown turns "panic" into "fix one root cause."

This PR ships the original feature (commit 4b5d29d) PLUS eng-review fixes (commit 52a2678) addressing 7 findings from /plan-eng-review + /codex outside-voice review.

Eng-review fixes (in this PR, post-original-commit)

P1 fixed: /duplicate.*key/i no longer mislabels Postgres duplicate key value violates unique constraint as YAML_DUPLICATE_KEY. New DB_DUPLICATE_KEY code, checked before YAML patterns.
P2 fixed: MISSING_OPEN / MISSING_CLOSE / EMPTY_FRONTMATTER regexes rewritten to match the canonical messages emitted by collectValidationErrors() in src/core/markdown.ts. The previous patterns (missing.*open, missing.*close) never fired in practice — the upstream throw site emits prose, not code names.
P2 documented: acknowledgeSyncFailures() shape change is a deliberate breaking change — CHANGELOG carries a ### For contributors migration block.
DRY: formatCodeBreakdown() helper extracted; collapses 3 inline builders in src/commands/sync.ts.
Test gap closed: legacy-entry backfill branch in acknowledgeSyncFailures() now has dedicated coverage.

Test Coverage

NEW CODE PATHS                                              TESTS
─────────────────────────────────────────────────────────────────────────────────────
classifyErrorCode() — 7 valid codes                         [★★★ TESTED]
classifyErrorCode() — UNKNOWN fallback                      [★★  TESTED]
classifyErrorCode() — DB_DUPLICATE_KEY (NEW, P1 fix)        [★★★ TESTED] (3 cases)
classifyErrorCode() — canonical-message coverage (P2 fix)   [★★★ TESTED] (7 cases)
summarizeFailuresByCode() — group/sort/empty/pre-classified [★★★ TESTED]
recordSyncFailures() — code field stored at write           [★★  TESTED]
acknowledgeSyncFailures() — returns {count, summary}        [★★  TESTED]
acknowledgeSyncFailures() — legacy-entry backfill (NEW)     [★★★ TESTED] (2 cases)
formatCodeBreakdown() — dual-input shape (NEW)              [★★★ TESTED] (3 cases)

COVERAGE: 100% of new/modified code paths tested

37/37 tests pass on test/sync-failures.test.ts. Full bun test shows same baseline as master pre-PR (133 fails / 99 errors are pre-existing DB-required E2E tests — none touch this PR's code).

Tests added: 22 → 37 (+15 net).

Pre-Landing Review

/plan-eng-review ran with /codex outside-voice. 7 findings, 7 addressed in this PR:

Finding	Source	Resolution
`/duplicate.*key/i` Postgres false-positive	Both reviewers	Fixed (DB_DUPLICATE_KEY checked first)
Dead regex patterns	Both reviewers	Fixed (canonical-message coverage)
`AcknowledgeResult` is breaking change	Both reviewers	Documented in CHANGELOG
Backfill branch untested	Both reviewers	Test added
Wrong-layer architecture	Both reviewers	TODO filed (P2 follow-up)
Concurrent-safe ack	Both reviewers	TODO filed (P3, pre-existing)
Most codes dead on sync path	Both reviewers	Trimmed via canonical-message rewrite

Plan file: ~/.claude/plans/then-codex-synchronous-toucan.md. No cross-model tension — both reviewers identified the same issues with the same severity ordering.

TODOS

3 follow-ups filed in TODOS.md under new ## sync error-code classification (PR #501 follow-ups) section:

P2 — Plumb structured ParseValidationCode through ImportResult (the architectural fix that deletes 50% of classifyErrorCode).
P0 at ship time — CHANGELOG ### For contributors migration note for the acknowledgeSyncFailures() shape change. Done in this PR's CHANGELOG.
P3 — Concurrent-safe ack of ~/.gbrain/sync-failures.jsonl (pre-existing).

Documentation

CHANGELOG entry added with full release-summary block (lead, "what this means," ### For contributors, itemized changes). VERSION + package.json bumped to v0.22.9. CLAUDE.md key-files annotations not updated — the new helpers (classifyErrorCode, summarizeFailuresByCode, formatCodeBreakdown) are well-localized in src/core/sync.ts; recommend running /document-release post-merge for a comprehensive sweep.

Notes for the maintainer

PR v0.22.12 feat: structured error code summary for sync --skip-failed (closes #500) #518 is a duplicate of this PR (also closes sync --skip-failed should emit structured summary by error code and persist skip manifest #500, claims v0.22.12). Recommend closing v0.22.12 feat: structured error code summary for sync --skip-failed (closes #500) #518 in favor of v0.22.9 feat: structured error code summary for sync --skip-failed #501.
Co-authored by Wintermute <wintermute@garrytan.com> (original commit) and Claude Opus 4.7 (1M context) (eng-review fixes commit).

Test plan

bun test test/sync-failures.test.ts — 37/37 pass
Full bun test — same baseline as master (no regressions from this PR)
E2E sync flow (manual): trigger SLUG_MISMATCH and confirm grouping renders
Verify Postgres duplicate key value violates unique constraint does NOT classify as YAML_DUPLICATE_KEY in production logs

🤖 Generated with Claude Code

When sync encounters per-file failures, the blocked/skip-failed messages now include a breakdown by error code (SLUG_MISMATCH, YAML_PARSE, etc.) instead of just a raw count. This makes it immediately obvious *why* files failed without requiring manual investigation. Changes: - Add classifyErrorCode() — maps error messages to ParseValidationCode - Add summarizeFailuresByCode() — groups failures into sorted code summary - SyncFailure now carries a 'code' field (backfilled on acknowledge) - acknowledgeSyncFailures() returns AcknowledgeResult {count, summary} - sync blocked + skip-failed messages show code breakdown - doctor sync_failures check shows code breakdown for both unacked and historical - 12 new tests for classifyErrorCode, summarizeFailuresByCode, and structured returns Before: Sync blocked: 2688 file(s) failed to parse. After: Sync blocked: 2688 file(s) failed to parse: SLUG_MISMATCH: 2685 YAML_DUPLICATE_KEY: 3 Closes #500

- Reorder classifyErrorCode() so DB-layer errors (DB_DUPLICATE_KEY, STATEMENT_TIMEOUT) check BEFORE YAML patterns. Postgres "duplicate key value violates unique constraint" no longer mislabels as YAML_DUPLICATE_KEY. - Rewrite MISSING_OPEN/MISSING_CLOSE/EMPTY_FRONTMATTER/NULL_BYTES/NESTED_QUOTES regexes to match the canonical messages emitted by collectValidationErrors() in src/core/markdown.ts. Previous patterns (e.g. /missing.*open/i) never fired because the upstream throw site emits prose ("File is empty...", "No closing --- delimiter found"), not the code name. - Extract formatCodeBreakdown() helper that accepts either raw failures or pre-summarized {code, count}[] input. Replaces 3 duplicate inline builders in src/commands/sync.ts. - 15 new tests (37/37 pass on test/sync-failures.test.ts): - DB vs YAML duplicate-key disambiguation (3 cases) - Canonical-message coverage for the 5 frontmatter codes (7 cases) - acknowledgeSyncFailures() legacy-entry backfill branch (2 cases) - formatCodeBreakdown() dual-input shape (3 cases) - TODOS.md: file 3 follow-ups (P2 plumb structured ParseValidationCode; P0-at-ship CHANGELOG migration note for AcknowledgeResult; P3 concurrent- safe ack of sync-failures.jsonl). Eng-review plan: ~/.claude/plans/then-codex-synchronous-toucan.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ructured-summary

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The unit test suite ran 22m17s on ubuntu-latest (2-core/7GB) because: - 187 test files run with bun test parallelism bounded by core count - 23 of those files spin up a fresh PGLiteEngine + initSchema in beforeEach, paying ~22s WASM cold-start per test on the small runner This commit fixes the runner side: - runs-on: ubuntu-latest-16-cores (16 vCPU / 64 GB RAM) - strategy.matrix.shard splits 4 parallel jobs, each running ~40 of 158 unit test files. Single-file wall-time floor is ~3 min after the test refactor, so 4 shards × 16 cores hits the floor quickly without wasting cores past it. - pre-test gates (typecheck, check-jsonb, check-progress, check-wasm) only run on shard 1 — they're not test files and don't benefit from sharding. scripts/test-shard.sh partitions test files by stable FNV-1a hash mod N. Same file always lands in the same shard, so retries are reproducible. Pure shell, portable to bash 3.2 (macOS) and bash 5.x (CI). Excludes test/e2e/ which runs via bun run test:e2e separately and needs DATABASE_URL. Also: ignore .claude/ harness state files (scheduled_tasks.lock etc) instead of just .claude/skills/. Cost: ~$0.19/run vs $0 (public repo, default runner is free). At 50 PRs/month that's ~$10/month for ~5x faster CI. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three test files were spinning up a fresh PGLiteEngine + connect + initSchema in beforeEach. PGLite WASM cold-start is ~22s on the small CI runner; doing this per test multiplied wall-time across the suite. The 3 files alone accounted for ~6.5 min of the 22m CI run (177s + 132s + 87s). Refactor: move PGLite setup to beforeAll (one engine per file), wipe data in beforeEach via the new test/helpers/reset-pglite.ts helper. The reset helper: - TRUNCATEs every public table CASCADE, including sources (so tests that register their own sources don't leak rows into the next test). - Re-seeds the default source row that pages.source_id's DEFAULT FKs against. Without this, the next page insert would fail FK validation. - Preserves schema_version so migration helpers don't think the brain is on v0. Files refactored: - test/extract-incremental.test.ts (8 tests, was 177s on CI) - test/brain-writer.test.ts (16 tests; only the scanBrainSources block uses PGLite, was 132s on CI) - test/sync.test.ts (37 tests; only the performSync dry-run block uses PGLite, was 87s on CI) All 61 tests still pass locally. The remaining 20 PGLite-heavy files use the same beforeEach anti-pattern; this commit only refactors the proven worst offenders. Sweep the rest in a follow-up if CI numbers indicate it's worth it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The ubuntu-latest-16-cores label requires a provisioned larger-runner pool in repo/org settings. Without that setup, jobs queue indefinitely waiting for a runner that doesn't exist (verified: 4 shards stuck in 'queued' status with empty runner_name for 5+ min). Drop back to the default 2-core ubuntu-latest. The 4-way matrix shard still delivers ~5-6x speedup via parallelism alone — 4 jobs running in parallel, each handling ~40 of 158 unit test files. Cost stays $0 (default runner is free for public repos). If we ever provision a larger-runner pool, flip this label back to ubuntu-latest-16-cores. The matrix + sharder will use the bigger boxes unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…arrytan#501) * feat: structured error code summary for sync --skip-failed (garrytan#500) When sync encounters per-file failures, the blocked/skip-failed messages now include a breakdown by error code (SLUG_MISMATCH, YAML_PARSE, etc.) instead of just a raw count. This makes it immediately obvious *why* files failed without requiring manual investigation. Changes: - Add classifyErrorCode() — maps error messages to ParseValidationCode - Add summarizeFailuresByCode() — groups failures into sorted code summary - SyncFailure now carries a 'code' field (backfilled on acknowledge) - acknowledgeSyncFailures() returns AcknowledgeResult {count, summary} - sync blocked + skip-failed messages show code breakdown - doctor sync_failures check shows code breakdown for both unacked and historical - 12 new tests for classifyErrorCode, summarizeFailuresByCode, and structured returns Before: Sync blocked: 2688 file(s) failed to parse. After: Sync blocked: 2688 file(s) failed to parse: SLUG_MISMATCH: 2685 YAML_DUPLICATE_KEY: 3 Closes garrytan#500 * fix: eng-review fixes for sync error-code classification - Reorder classifyErrorCode() so DB-layer errors (DB_DUPLICATE_KEY, STATEMENT_TIMEOUT) check BEFORE YAML patterns. Postgres "duplicate key value violates unique constraint" no longer mislabels as YAML_DUPLICATE_KEY. - Rewrite MISSING_OPEN/MISSING_CLOSE/EMPTY_FRONTMATTER/NULL_BYTES/NESTED_QUOTES regexes to match the canonical messages emitted by collectValidationErrors() in src/core/markdown.ts. Previous patterns (e.g. /missing.*open/i) never fired because the upstream throw site emits prose ("File is empty...", "No closing --- delimiter found"), not the code name. - Extract formatCodeBreakdown() helper that accepts either raw failures or pre-summarized {code, count}[] input. Replaces 3 duplicate inline builders in src/commands/sync.ts. - 15 new tests (37/37 pass on test/sync-failures.test.ts): - DB vs YAML duplicate-key disambiguation (3 cases) - Canonical-message coverage for the 5 frontmatter codes (7 cases) - acknowledgeSyncFailures() legacy-entry backfill branch (2 cases) - formatCodeBreakdown() dual-input shape (3 cases) - TODOS.md: file 3 follow-ups (P2 plumb structured ParseValidationCode; P0-at-ship CHANGELOG migration note for AcknowledgeResult; P3 concurrent- safe ack of sync-failures.jsonl). Eng-review plan: ~/.claude/plans/then-codex-synchronous-toucan.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.22.9) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci: 16-core runner + 4-way matrix shard for test job The unit test suite ran 22m17s on ubuntu-latest (2-core/7GB) because: - 187 test files run with bun test parallelism bounded by core count - 23 of those files spin up a fresh PGLiteEngine + initSchema in beforeEach, paying ~22s WASM cold-start per test on the small runner This commit fixes the runner side: - runs-on: ubuntu-latest-16-cores (16 vCPU / 64 GB RAM) - strategy.matrix.shard splits 4 parallel jobs, each running ~40 of 158 unit test files. Single-file wall-time floor is ~3 min after the test refactor, so 4 shards × 16 cores hits the floor quickly without wasting cores past it. - pre-test gates (typecheck, check-jsonb, check-progress, check-wasm) only run on shard 1 — they're not test files and don't benefit from sharding. scripts/test-shard.sh partitions test files by stable FNV-1a hash mod N. Same file always lands in the same shard, so retries are reproducible. Pure shell, portable to bash 3.2 (macOS) and bash 5.x (CI). Excludes test/e2e/ which runs via bun run test:e2e separately and needs DATABASE_URL. Also: ignore .claude/ harness state files (scheduled_tasks.lock etc) instead of just .claude/skills/. Cost: ~$0.19/run vs $0 (public repo, default runner is free). At 50 PRs/month that's ~$10/month for ~5x faster CI. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: refactor top-3 PGLite-heavy files to share one engine per file Three test files were spinning up a fresh PGLiteEngine + connect + initSchema in beforeEach. PGLite WASM cold-start is ~22s on the small CI runner; doing this per test multiplied wall-time across the suite. The 3 files alone accounted for ~6.5 min of the 22m CI run (177s + 132s + 87s). Refactor: move PGLite setup to beforeAll (one engine per file), wipe data in beforeEach via the new test/helpers/reset-pglite.ts helper. The reset helper: - TRUNCATEs every public table CASCADE, including sources (so tests that register their own sources don't leak rows into the next test). - Re-seeds the default source row that pages.source_id's DEFAULT FKs against. Without this, the next page insert would fail FK validation. - Preserves schema_version so migration helpers don't think the brain is on v0. Files refactored: - test/extract-incremental.test.ts (8 tests, was 177s on CI) - test/brain-writer.test.ts (16 tests; only the scanBrainSources block uses PGLite, was 132s on CI) - test/sync.test.ts (37 tests; only the performSync dry-run block uses PGLite, was 87s on CI) All 61 tests still pass locally. The remaining 20 PGLite-heavy files use the same beforeEach anti-pattern; this commit only refactors the proven worst offenders. Sweep the rest in a follow-up if CI numbers indicate it's worth it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci: fall back to ubuntu-latest for matrix shard The ubuntu-latest-16-cores label requires a provisioned larger-runner pool in repo/org settings. Without that setup, jobs queue indefinitely waiting for a runner that doesn't exist (verified: 4 shards stuck in 'queued' status with empty runner_name for 5+ min). Drop back to the default 2-core ubuntu-latest. The 4-way matrix shard still delivers ~5-6x speedup via parallelism alone — 4 jobs running in parallel, each handling ~40 of 158 unit test files. Cost stays $0 (default runner is free for public repos). If we ever provision a larger-runner pool, flip this label back to ubuntu-latest-16-cores. The matrix + sharder will use the bigger boxes unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Wintermute <wintermute@garrytan.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@WinterMute

…loses #500) (#518) * feat: structured error code summary for sync --skip-failed (#500) When sync encounters per-file failures, the blocked/skip-failed messages now include a breakdown by error code (SLUG_MISMATCH, YAML_PARSE, etc.) instead of just a raw count. This makes it immediately obvious *why* files failed without requiring manual investigation. Changes: - Add classifyErrorCode() — maps error messages to ParseValidationCode - Add summarizeFailuresByCode() — groups failures into sorted code summary - SyncFailure now carries a 'code' field (backfilled on acknowledge) - acknowledgeSyncFailures() returns AcknowledgeResult {count, summary} - sync blocked + skip-failed messages show code breakdown - doctor sync_failures check shows code breakdown for both unacked and historical - 12 new tests for classifyErrorCode, summarizeFailuresByCode, and structured returns Before: Sync blocked: 2688 file(s) failed to parse. After: Sync blocked: 2688 file(s) failed to parse: SLUG_MISMATCH: 2685 YAML_DUPLICATE_KEY: 3 Closes #500 * test(sync): broaden classifier regexes and pin coverage with 6 new unit tests Eng review of PR #501 found two ship-blocking gaps in the classifier: 1. Four real production error sites in src/core/import-file.ts emit strings that bucketed to UNKNOWN — exactly the silent-systemic-failure pattern that motivated #500 in the first place. Add two regex lines: FILE_TOO_LARGE — covers import-file.ts:199, 352, 401 SYMLINK_NOT_ALLOWED — covers import-file.ts:347 2. Three existing classifier regexes (MISSING_OPEN, MISSING_CLOSE, EMPTY_FRONTMATTER) only matched the literal code-name prefix. The actual message strings emitted by markdown.ts:159-244 (e.g. "Frontmatter must start with --- on the first non-empty line") wouldn't match. Broaden each to match production message text. NESTED_QUOTES already worked. Add 6 unit tests pinning the contract between markdown.ts/import-file.ts strings and the classifier regex set. If anyone reworks a validator message, both sides have to move together — the test fails loudly otherwise. Test count: 22 → 28 in test/sync-failures.test.ts, all green. * test(e2e): add failure-loop E2E for sync --skip-failed (issue #500 ship-blocker) The full code path (record → classify → block → skip → doctor render → second cycle) had only mocked-JSONL unit coverage. For a hotfix that changes user-visible CLI output and the doctor surface, that's thin. One comprehensive E2E test covers the loop: 1. First sync of clean repo — succeeds, bookmark advances 2. Add file with bad slug — sync returns 'blocked_by_failures', bookmark stays put, JSONL has 1 unacked entry coded SLUG_MISMATCH 3. --skip-failed — bookmark advances past the bad commit, entry transitions to acknowledged, AcknowledgeResult.summary aggregates 4. Second broken file (different path, same code) — sync blocks again, 1 acked + 1 unacked, dedup honors path identity 5. --skip-failed again — both acked, summary correctly counts 2 Hermetic on a developer machine: saves ~/.gbrain/sync-failures.jsonl before the test, restores it after. Doctor rendering verified by calling the same primitives doctor.ts uses (loadSyncFailures + summarizeFailuresByCode) rather than runDoctor() — runDoctor is a CLI entrypoint with stdout/exit side effects that truncate the test mid-flow. E2E count: 13 → 14 in test/e2e/sync.test.ts. All 14 pass under real Postgres + pgvector (gbrain-test-pg/pgvector:pg16). * v0.22.12: structured error code summary for sync --skip-failed Closes issue #500. PR #501 by @WinterMute is the foundation (cherry-picked as c356ea4 — classifier, doctor breakdown, AcknowledgeResult shape, 12 unit tests). This release adds: - Classifier coverage for FILE_TOO_LARGE + SYMLINK_NOT_ALLOWED (the four size/symlink rejection sites in import-file.ts that bucketed to UNKNOWN). - Three regex breadths (MISSING_OPEN, MISSING_CLOSE, EMPTY_FRONTMATTER) matching actual markdown.ts validator messages, not just the literal code-name prefix. - 6 new unit tests pinning literal production strings. - 1 comprehensive E2E test exercising the full failure loop. Total v0.22.12 diff: ~340 lines on top of PR #501. Backward-compatible — pre-v0.22.12 JSONL entries get classified at acknowledge time. * chore: regenerate llms-full.txt for v0.22.12 CLAUDE.md changes CI regen-drift guard caught that llms-full.txt was stale after the v0.22.12 CLAUDE.md annotation updates (sync.ts, doctor.ts, sync-failures.test.ts, e2e/sync.test.ts entries). Per CLAUDE.md "Auto-derived" rule: run `bun run build:llms` after any release ship that touches Key Files annotations. The bundle reflects current docs state. llms.txt unchanged (curated index doesn't index those entries). llms-full.txt: 308192 bytes. test/build-llms.test.ts now passes 7/7 (was 6/7 in CI). --------- Co-authored-by: Wintermute <wintermute@garrytan.com>

garrytan mentioned this pull request Apr 29, 2026

v0.22.12 feat: structured error code summary for sync --skip-failed (closes #500) #518

Merged

13 tasks

garrytan and others added 3 commits April 29, 2026 08:48

Merge remote-tracking branch 'origin/master' into feat/skip-failed-st…

a5a98f0

…ructured-summary

chore: bump version and changelog (v0.22.9)

aa8fedd

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

garrytan changed the title ~~feat: structured error code summary for sync --skip-failed~~ v0.22.9 feat: structured error code summary for sync --skip-failed Apr 29, 2026

garrytan and others added 3 commits April 29, 2026 09:51

garrytan merged commit 08746b0 into master Apr 29, 2026
7 checks passed

vinsew mentioned this pull request May 3, 2026

fix(sync): reconcile sync-failures.jsonl with reality #479

Closed

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v0.22.9 feat: structured error code summary for sync --skip-failed#501

v0.22.9 feat: structured error code summary for sync --skip-failed#501
garrytan merged 7 commits into
masterfrom
feat/skip-failed-structured-summary

garrytan commented Apr 28, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

garrytan commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Eng-review fixes (in this PR, post-original-commit)

Test Coverage

Pre-Landing Review

TODOS

Documentation

Notes for the maintainer

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

garrytan commented Apr 28, 2026 •

edited

Loading