Skip to content

v0.22.9 feat: structured error code summary for sync --skip-failed#501

Merged
garrytan merged 7 commits into
masterfrom
feat/skip-failed-structured-summary
Apr 29, 2026
Merged

v0.22.9 feat: structured error code summary for sync --skip-failed#501
garrytan merged 7 commits into
masterfrom
feat/skip-failed-structured-summary

Conversation

@garrytan

@garrytan garrytan commented Apr 28, 2026

Copy link
Copy Markdown
Owner

Summary

Sync failures now tell you why, not just how many. gbrain sync --skip-failed and gbrain doctor group failures by error code instead of just printing a count.

Before: Sync blocked: 2688 file(s) failed to parse.
After: Sync blocked: 2688 file(s) failed to parse:\n SLUG_MISMATCH: 2685\n YAML_DUPLICATE_KEY: 3

On a real 81K-page brain, 2,685 files silently failed with SLUG_MISMATCH from a posterous import. The breakdown turns "panic" into "fix one root cause."

This PR ships the original feature (commit 4b5d29d) PLUS eng-review fixes (commit 52a2678) addressing 7 findings from /plan-eng-review + /codex outside-voice review.

Eng-review fixes (in this PR, post-original-commit)

  • P1 fixed: /duplicate.*key/i no longer mislabels Postgres duplicate key value violates unique constraint as YAML_DUPLICATE_KEY. New DB_DUPLICATE_KEY code, checked before YAML patterns.
  • P2 fixed: MISSING_OPEN / MISSING_CLOSE / EMPTY_FRONTMATTER regexes rewritten to match the canonical messages emitted by collectValidationErrors() in src/core/markdown.ts. The previous patterns (missing.*open, missing.*close) never fired in practice — the upstream throw site emits prose, not code names.
  • P2 documented: acknowledgeSyncFailures() shape change is a deliberate breaking change — CHANGELOG carries a ### For contributors migration block.
  • DRY: formatCodeBreakdown() helper extracted; collapses 3 inline builders in src/commands/sync.ts.
  • Test gap closed: legacy-entry backfill branch in acknowledgeSyncFailures() now has dedicated coverage.

Test Coverage

NEW CODE PATHS                                              TESTS
─────────────────────────────────────────────────────────────────────────────────────
classifyErrorCode() — 7 valid codes                         [★★★ TESTED]
classifyErrorCode() — UNKNOWN fallback                      [★★  TESTED]
classifyErrorCode() — DB_DUPLICATE_KEY (NEW, P1 fix)        [★★★ TESTED] (3 cases)
classifyErrorCode() — canonical-message coverage (P2 fix)   [★★★ TESTED] (7 cases)
summarizeFailuresByCode() — group/sort/empty/pre-classified [★★★ TESTED]
recordSyncFailures() — code field stored at write           [★★  TESTED]
acknowledgeSyncFailures() — returns {count, summary}        [★★  TESTED]
acknowledgeSyncFailures() — legacy-entry backfill (NEW)     [★★★ TESTED] (2 cases)
formatCodeBreakdown() — dual-input shape (NEW)              [★★★ TESTED] (3 cases)

COVERAGE: 100% of new/modified code paths tested

37/37 tests pass on test/sync-failures.test.ts. Full bun test shows same baseline as master pre-PR (133 fails / 99 errors are pre-existing DB-required E2E tests — none touch this PR's code).

Tests added: 22 → 37 (+15 net).

Pre-Landing Review

/plan-eng-review ran with /codex outside-voice. 7 findings, 7 addressed in this PR:

Finding Source Resolution
/duplicate.*key/i Postgres false-positive Both reviewers Fixed (DB_DUPLICATE_KEY checked first)
Dead regex patterns Both reviewers Fixed (canonical-message coverage)
AcknowledgeResult is breaking change Both reviewers Documented in CHANGELOG
Backfill branch untested Both reviewers Test added
Wrong-layer architecture Both reviewers TODO filed (P2 follow-up)
Concurrent-safe ack Both reviewers TODO filed (P3, pre-existing)
Most codes dead on sync path Both reviewers Trimmed via canonical-message rewrite

Plan file: ~/.claude/plans/then-codex-synchronous-toucan.md. No cross-model tension — both reviewers identified the same issues with the same severity ordering.

TODOS

3 follow-ups filed in TODOS.md under new ## sync error-code classification (PR #501 follow-ups) section:

  1. P2 — Plumb structured ParseValidationCode through ImportResult (the architectural fix that deletes 50% of classifyErrorCode).
  2. P0 at ship time — CHANGELOG ### For contributors migration note for the acknowledgeSyncFailures() shape change. Done in this PR's CHANGELOG.
  3. P3 — Concurrent-safe ack of ~/.gbrain/sync-failures.jsonl (pre-existing).

Documentation

CHANGELOG entry added with full release-summary block (lead, "what this means," ### For contributors, itemized changes). VERSION + package.json bumped to v0.22.9. CLAUDE.md key-files annotations not updated — the new helpers (classifyErrorCode, summarizeFailuresByCode, formatCodeBreakdown) are well-localized in src/core/sync.ts; recommend running /document-release post-merge for a comprehensive sweep.

Notes for the maintainer

Test plan

  • bun test test/sync-failures.test.ts — 37/37 pass
  • Full bun test — same baseline as master (no regressions from this PR)
  • E2E sync flow (manual): trigger SLUG_MISMATCH and confirm grouping renders
  • Verify Postgres duplicate key value violates unique constraint does NOT classify as YAML_DUPLICATE_KEY in production logs

🤖 Generated with Claude Code

When sync encounters per-file failures, the blocked/skip-failed messages
now include a breakdown by error code (SLUG_MISMATCH, YAML_PARSE, etc.)
instead of just a raw count. This makes it immediately obvious *why*
files failed without requiring manual investigation.

Changes:
- Add classifyErrorCode() — maps error messages to ParseValidationCode
- Add summarizeFailuresByCode() — groups failures into sorted code summary
- SyncFailure now carries a 'code' field (backfilled on acknowledge)
- acknowledgeSyncFailures() returns AcknowledgeResult {count, summary}
- sync blocked + skip-failed messages show code breakdown
- doctor sync_failures check shows code breakdown for both unacked and historical
- 12 new tests for classifyErrorCode, summarizeFailuresByCode, and structured returns

Before:
  Sync blocked: 2688 file(s) failed to parse.

After:
  Sync blocked: 2688 file(s) failed to parse:
    SLUG_MISMATCH: 2685
    YAML_DUPLICATE_KEY: 3

Closes #500
garrytan and others added 3 commits April 29, 2026 08:48
- Reorder classifyErrorCode() so DB-layer errors (DB_DUPLICATE_KEY,
  STATEMENT_TIMEOUT) check BEFORE YAML patterns. Postgres "duplicate key
  value violates unique constraint" no longer mislabels as YAML_DUPLICATE_KEY.
- Rewrite MISSING_OPEN/MISSING_CLOSE/EMPTY_FRONTMATTER/NULL_BYTES/NESTED_QUOTES
  regexes to match the canonical messages emitted by collectValidationErrors()
  in src/core/markdown.ts. Previous patterns (e.g. /missing.*open/i) never
  fired because the upstream throw site emits prose ("File is empty...",
  "No closing --- delimiter found"), not the code name.
- Extract formatCodeBreakdown() helper that accepts either raw failures or
  pre-summarized {code, count}[] input. Replaces 3 duplicate inline builders
  in src/commands/sync.ts.
- 15 new tests (37/37 pass on test/sync-failures.test.ts):
  - DB vs YAML duplicate-key disambiguation (3 cases)
  - Canonical-message coverage for the 5 frontmatter codes (7 cases)
  - acknowledgeSyncFailures() legacy-entry backfill branch (2 cases)
  - formatCodeBreakdown() dual-input shape (3 cases)
- TODOS.md: file 3 follow-ups (P2 plumb structured ParseValidationCode;
  P0-at-ship CHANGELOG migration note for AcknowledgeResult; P3 concurrent-
  safe ack of sync-failures.jsonl).

Eng-review plan: ~/.claude/plans/then-codex-synchronous-toucan.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@garrytan garrytan changed the title feat: structured error code summary for sync --skip-failed v0.22.9 feat: structured error code summary for sync --skip-failed Apr 29, 2026
garrytan and others added 3 commits April 29, 2026 09:51
The unit test suite ran 22m17s on ubuntu-latest (2-core/7GB) because:
- 187 test files run with bun test parallelism bounded by core count
- 23 of those files spin up a fresh PGLiteEngine + initSchema in beforeEach,
  paying ~22s WASM cold-start per test on the small runner

This commit fixes the runner side:
- runs-on: ubuntu-latest-16-cores (16 vCPU / 64 GB RAM)
- strategy.matrix.shard splits 4 parallel jobs, each running ~40 of 158 unit
  test files. Single-file wall-time floor is ~3 min after the test refactor,
  so 4 shards × 16 cores hits the floor quickly without wasting cores past it.
- pre-test gates (typecheck, check-jsonb, check-progress, check-wasm) only run
  on shard 1 — they're not test files and don't benefit from sharding.

scripts/test-shard.sh partitions test files by stable FNV-1a hash mod N. Same
file always lands in the same shard, so retries are reproducible. Pure shell,
portable to bash 3.2 (macOS) and bash 5.x (CI). Excludes test/e2e/ which runs
via bun run test:e2e separately and needs DATABASE_URL.

Also: ignore .claude/ harness state files (scheduled_tasks.lock etc) instead
of just .claude/skills/.

Cost: ~$0.19/run vs $0 (public repo, default runner is free). At 50 PRs/month
that's ~$10/month for ~5x faster CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three test files were spinning up a fresh PGLiteEngine + connect + initSchema
in beforeEach. PGLite WASM cold-start is ~22s on the small CI runner; doing
this per test multiplied wall-time across the suite. The 3 files alone
accounted for ~6.5 min of the 22m CI run (177s + 132s + 87s).

Refactor: move PGLite setup to beforeAll (one engine per file), wipe data
in beforeEach via the new test/helpers/reset-pglite.ts helper.

The reset helper:
- TRUNCATEs every public table CASCADE, including sources (so tests that
  register their own sources don't leak rows into the next test).
- Re-seeds the default source row that pages.source_id's DEFAULT FKs against.
  Without this, the next page insert would fail FK validation.
- Preserves schema_version so migration helpers don't think the brain is on v0.

Files refactored:
- test/extract-incremental.test.ts (8 tests, was 177s on CI)
- test/brain-writer.test.ts (16 tests; only the scanBrainSources block uses
  PGLite, was 132s on CI)
- test/sync.test.ts (37 tests; only the performSync dry-run block uses PGLite,
  was 87s on CI)

All 61 tests still pass locally. The remaining 20 PGLite-heavy files use the
same beforeEach anti-pattern; this commit only refactors the proven worst
offenders. Sweep the rest in a follow-up if CI numbers indicate it's worth it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The ubuntu-latest-16-cores label requires a provisioned larger-runner pool in
repo/org settings. Without that setup, jobs queue indefinitely waiting for a
runner that doesn't exist (verified: 4 shards stuck in 'queued' status with
empty runner_name for 5+ min).

Drop back to the default 2-core ubuntu-latest. The 4-way matrix shard still
delivers ~5-6x speedup via parallelism alone — 4 jobs running in parallel,
each handling ~40 of 158 unit test files. Cost stays $0 (default runner is
free for public repos).

If we ever provision a larger-runner pool, flip this label back to
ubuntu-latest-16-cores. The matrix + sharder will use the bigger boxes
unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@garrytan garrytan merged commit 08746b0 into master Apr 29, 2026
7 checks passed
ChenyqThu pushed a commit to ChenyqThu/jarvis-knowledge-os-v2 that referenced this pull request Apr 30, 2026
…arrytan#501)

* feat: structured error code summary for sync --skip-failed (garrytan#500)

When sync encounters per-file failures, the blocked/skip-failed messages
now include a breakdown by error code (SLUG_MISMATCH, YAML_PARSE, etc.)
instead of just a raw count. This makes it immediately obvious *why*
files failed without requiring manual investigation.

Changes:
- Add classifyErrorCode() — maps error messages to ParseValidationCode
- Add summarizeFailuresByCode() — groups failures into sorted code summary
- SyncFailure now carries a 'code' field (backfilled on acknowledge)
- acknowledgeSyncFailures() returns AcknowledgeResult {count, summary}
- sync blocked + skip-failed messages show code breakdown
- doctor sync_failures check shows code breakdown for both unacked and historical
- 12 new tests for classifyErrorCode, summarizeFailuresByCode, and structured returns

Before:
  Sync blocked: 2688 file(s) failed to parse.

After:
  Sync blocked: 2688 file(s) failed to parse:
    SLUG_MISMATCH: 2685
    YAML_DUPLICATE_KEY: 3

Closes garrytan#500

* fix: eng-review fixes for sync error-code classification

- Reorder classifyErrorCode() so DB-layer errors (DB_DUPLICATE_KEY,
  STATEMENT_TIMEOUT) check BEFORE YAML patterns. Postgres "duplicate key
  value violates unique constraint" no longer mislabels as YAML_DUPLICATE_KEY.
- Rewrite MISSING_OPEN/MISSING_CLOSE/EMPTY_FRONTMATTER/NULL_BYTES/NESTED_QUOTES
  regexes to match the canonical messages emitted by collectValidationErrors()
  in src/core/markdown.ts. Previous patterns (e.g. /missing.*open/i) never
  fired because the upstream throw site emits prose ("File is empty...",
  "No closing --- delimiter found"), not the code name.
- Extract formatCodeBreakdown() helper that accepts either raw failures or
  pre-summarized {code, count}[] input. Replaces 3 duplicate inline builders
  in src/commands/sync.ts.
- 15 new tests (37/37 pass on test/sync-failures.test.ts):
  - DB vs YAML duplicate-key disambiguation (3 cases)
  - Canonical-message coverage for the 5 frontmatter codes (7 cases)
  - acknowledgeSyncFailures() legacy-entry backfill branch (2 cases)
  - formatCodeBreakdown() dual-input shape (3 cases)
- TODOS.md: file 3 follow-ups (P2 plumb structured ParseValidationCode;
  P0-at-ship CHANGELOG migration note for AcknowledgeResult; P3 concurrent-
  safe ack of sync-failures.jsonl).

Eng-review plan: ~/.claude/plans/then-codex-synchronous-toucan.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.22.9)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: 16-core runner + 4-way matrix shard for test job

The unit test suite ran 22m17s on ubuntu-latest (2-core/7GB) because:
- 187 test files run with bun test parallelism bounded by core count
- 23 of those files spin up a fresh PGLiteEngine + initSchema in beforeEach,
  paying ~22s WASM cold-start per test on the small runner

This commit fixes the runner side:
- runs-on: ubuntu-latest-16-cores (16 vCPU / 64 GB RAM)
- strategy.matrix.shard splits 4 parallel jobs, each running ~40 of 158 unit
  test files. Single-file wall-time floor is ~3 min after the test refactor,
  so 4 shards × 16 cores hits the floor quickly without wasting cores past it.
- pre-test gates (typecheck, check-jsonb, check-progress, check-wasm) only run
  on shard 1 — they're not test files and don't benefit from sharding.

scripts/test-shard.sh partitions test files by stable FNV-1a hash mod N. Same
file always lands in the same shard, so retries are reproducible. Pure shell,
portable to bash 3.2 (macOS) and bash 5.x (CI). Excludes test/e2e/ which runs
via bun run test:e2e separately and needs DATABASE_URL.

Also: ignore .claude/ harness state files (scheduled_tasks.lock etc) instead
of just .claude/skills/.

Cost: ~$0.19/run vs $0 (public repo, default runner is free). At 50 PRs/month
that's ~$10/month for ~5x faster CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: refactor top-3 PGLite-heavy files to share one engine per file

Three test files were spinning up a fresh PGLiteEngine + connect + initSchema
in beforeEach. PGLite WASM cold-start is ~22s on the small CI runner; doing
this per test multiplied wall-time across the suite. The 3 files alone
accounted for ~6.5 min of the 22m CI run (177s + 132s + 87s).

Refactor: move PGLite setup to beforeAll (one engine per file), wipe data
in beforeEach via the new test/helpers/reset-pglite.ts helper.

The reset helper:
- TRUNCATEs every public table CASCADE, including sources (so tests that
  register their own sources don't leak rows into the next test).
- Re-seeds the default source row that pages.source_id's DEFAULT FKs against.
  Without this, the next page insert would fail FK validation.
- Preserves schema_version so migration helpers don't think the brain is on v0.

Files refactored:
- test/extract-incremental.test.ts (8 tests, was 177s on CI)
- test/brain-writer.test.ts (16 tests; only the scanBrainSources block uses
  PGLite, was 132s on CI)
- test/sync.test.ts (37 tests; only the performSync dry-run block uses PGLite,
  was 87s on CI)

All 61 tests still pass locally. The remaining 20 PGLite-heavy files use the
same beforeEach anti-pattern; this commit only refactors the proven worst
offenders. Sweep the rest in a follow-up if CI numbers indicate it's worth it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: fall back to ubuntu-latest for matrix shard

The ubuntu-latest-16-cores label requires a provisioned larger-runner pool in
repo/org settings. Without that setup, jobs queue indefinitely waiting for a
runner that doesn't exist (verified: 4 shards stuck in 'queued' status with
empty runner_name for 5+ min).

Drop back to the default 2-core ubuntu-latest. The 4-way matrix shard still
delivers ~5-6x speedup via parallelism alone — 4 jobs running in parallel,
each handling ~40 of 158 unit test files. Cost stays $0 (default runner is
free for public repos).

If we ever provision a larger-runner pool, flip this label back to
ubuntu-latest-16-cores. The matrix + sharder will use the bigger boxes
unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Wintermute <wintermute@garrytan.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan added a commit that referenced this pull request Apr 30, 2026
…loses #500) (#518)

* feat: structured error code summary for sync --skip-failed (#500)

When sync encounters per-file failures, the blocked/skip-failed messages
now include a breakdown by error code (SLUG_MISMATCH, YAML_PARSE, etc.)
instead of just a raw count. This makes it immediately obvious *why*
files failed without requiring manual investigation.

Changes:
- Add classifyErrorCode() — maps error messages to ParseValidationCode
- Add summarizeFailuresByCode() — groups failures into sorted code summary
- SyncFailure now carries a 'code' field (backfilled on acknowledge)
- acknowledgeSyncFailures() returns AcknowledgeResult {count, summary}
- sync blocked + skip-failed messages show code breakdown
- doctor sync_failures check shows code breakdown for both unacked and historical
- 12 new tests for classifyErrorCode, summarizeFailuresByCode, and structured returns

Before:
  Sync blocked: 2688 file(s) failed to parse.

After:
  Sync blocked: 2688 file(s) failed to parse:
    SLUG_MISMATCH: 2685
    YAML_DUPLICATE_KEY: 3

Closes #500

* test(sync): broaden classifier regexes and pin coverage with 6 new unit tests

Eng review of PR #501 found two ship-blocking gaps in the classifier:

1. Four real production error sites in src/core/import-file.ts emit strings
   that bucketed to UNKNOWN — exactly the silent-systemic-failure pattern
   that motivated #500 in the first place. Add two regex lines:
     FILE_TOO_LARGE       — covers import-file.ts:199, 352, 401
     SYMLINK_NOT_ALLOWED  — covers import-file.ts:347

2. Three existing classifier regexes (MISSING_OPEN, MISSING_CLOSE,
   EMPTY_FRONTMATTER) only matched the literal code-name prefix. The actual
   message strings emitted by markdown.ts:159-244 (e.g. "Frontmatter must
   start with --- on the first non-empty line") wouldn't match. Broaden
   each to match production message text. NESTED_QUOTES already worked.

Add 6 unit tests pinning the contract between markdown.ts/import-file.ts
strings and the classifier regex set. If anyone reworks a validator
message, both sides have to move together — the test fails loudly otherwise.

Test count: 22 → 28 in test/sync-failures.test.ts, all green.

* test(e2e): add failure-loop E2E for sync --skip-failed (issue #500 ship-blocker)

The full code path (record → classify → block → skip → doctor render →
second cycle) had only mocked-JSONL unit coverage. For a hotfix that
changes user-visible CLI output and the doctor surface, that's thin.

One comprehensive E2E test covers the loop:
  1. First sync of clean repo — succeeds, bookmark advances
  2. Add file with bad slug — sync returns 'blocked_by_failures',
     bookmark stays put, JSONL has 1 unacked entry coded SLUG_MISMATCH
  3. --skip-failed — bookmark advances past the bad commit, entry
     transitions to acknowledged, AcknowledgeResult.summary aggregates
  4. Second broken file (different path, same code) — sync blocks again,
     1 acked + 1 unacked, dedup honors path identity
  5. --skip-failed again — both acked, summary correctly counts 2

Hermetic on a developer machine: saves ~/.gbrain/sync-failures.jsonl
before the test, restores it after. Doctor rendering verified by calling
the same primitives doctor.ts uses (loadSyncFailures + summarizeFailuresByCode)
rather than runDoctor() — runDoctor is a CLI entrypoint with stdout/exit
side effects that truncate the test mid-flow.

E2E count: 13 → 14 in test/e2e/sync.test.ts. All 14 pass under real
Postgres + pgvector (gbrain-test-pg/pgvector:pg16).

* v0.22.12: structured error code summary for sync --skip-failed

Closes issue #500. PR #501 by @WinterMute is the foundation (cherry-picked
as c356ea4 — classifier, doctor breakdown, AcknowledgeResult shape, 12 unit
tests). This release adds:

- Classifier coverage for FILE_TOO_LARGE + SYMLINK_NOT_ALLOWED (the four
  size/symlink rejection sites in import-file.ts that bucketed to UNKNOWN).
- Three regex breadths (MISSING_OPEN, MISSING_CLOSE, EMPTY_FRONTMATTER)
  matching actual markdown.ts validator messages, not just the literal
  code-name prefix.
- 6 new unit tests pinning literal production strings.
- 1 comprehensive E2E test exercising the full failure loop.

Total v0.22.12 diff: ~340 lines on top of PR #501. Backward-compatible —
pre-v0.22.12 JSONL entries get classified at acknowledge time.

* chore: regenerate llms-full.txt for v0.22.12 CLAUDE.md changes

CI regen-drift guard caught that llms-full.txt was stale after the v0.22.12
CLAUDE.md annotation updates (sync.ts, doctor.ts, sync-failures.test.ts,
e2e/sync.test.ts entries). Per CLAUDE.md "Auto-derived" rule: run
`bun run build:llms` after any release ship that touches Key Files
annotations. The bundle reflects current docs state.

llms.txt unchanged (curated index doesn't index those entries).
llms-full.txt: 308192 bytes.

test/build-llms.test.ts now passes 7/7 (was 6/7 in CI).

---------

Co-authored-by: Wintermute <wintermute@garrytan.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sync --skip-failed should emit structured summary by error code and persist skip manifest

1 participant