v0.22.9 feat: structured error code summary for sync --skip-failed#501
Merged
Conversation
When sync encounters per-file failures, the blocked/skip-failed messages
now include a breakdown by error code (SLUG_MISMATCH, YAML_PARSE, etc.)
instead of just a raw count. This makes it immediately obvious *why*
files failed without requiring manual investigation.
Changes:
- Add classifyErrorCode() — maps error messages to ParseValidationCode
- Add summarizeFailuresByCode() — groups failures into sorted code summary
- SyncFailure now carries a 'code' field (backfilled on acknowledge)
- acknowledgeSyncFailures() returns AcknowledgeResult {count, summary}
- sync blocked + skip-failed messages show code breakdown
- doctor sync_failures check shows code breakdown for both unacked and historical
- 12 new tests for classifyErrorCode, summarizeFailuresByCode, and structured returns
Before:
Sync blocked: 2688 file(s) failed to parse.
After:
Sync blocked: 2688 file(s) failed to parse:
SLUG_MISMATCH: 2685
YAML_DUPLICATE_KEY: 3
Closes #500
13 tasks
- Reorder classifyErrorCode() so DB-layer errors (DB_DUPLICATE_KEY,
STATEMENT_TIMEOUT) check BEFORE YAML patterns. Postgres "duplicate key
value violates unique constraint" no longer mislabels as YAML_DUPLICATE_KEY.
- Rewrite MISSING_OPEN/MISSING_CLOSE/EMPTY_FRONTMATTER/NULL_BYTES/NESTED_QUOTES
regexes to match the canonical messages emitted by collectValidationErrors()
in src/core/markdown.ts. Previous patterns (e.g. /missing.*open/i) never
fired because the upstream throw site emits prose ("File is empty...",
"No closing --- delimiter found"), not the code name.
- Extract formatCodeBreakdown() helper that accepts either raw failures or
pre-summarized {code, count}[] input. Replaces 3 duplicate inline builders
in src/commands/sync.ts.
- 15 new tests (37/37 pass on test/sync-failures.test.ts):
- DB vs YAML duplicate-key disambiguation (3 cases)
- Canonical-message coverage for the 5 frontmatter codes (7 cases)
- acknowledgeSyncFailures() legacy-entry backfill branch (2 cases)
- formatCodeBreakdown() dual-input shape (3 cases)
- TODOS.md: file 3 follow-ups (P2 plumb structured ParseValidationCode;
P0-at-ship CHANGELOG migration note for AcknowledgeResult; P3 concurrent-
safe ack of sync-failures.jsonl).
Eng-review plan: ~/.claude/plans/then-codex-synchronous-toucan.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The unit test suite ran 22m17s on ubuntu-latest (2-core/7GB) because: - 187 test files run with bun test parallelism bounded by core count - 23 of those files spin up a fresh PGLiteEngine + initSchema in beforeEach, paying ~22s WASM cold-start per test on the small runner This commit fixes the runner side: - runs-on: ubuntu-latest-16-cores (16 vCPU / 64 GB RAM) - strategy.matrix.shard splits 4 parallel jobs, each running ~40 of 158 unit test files. Single-file wall-time floor is ~3 min after the test refactor, so 4 shards × 16 cores hits the floor quickly without wasting cores past it. - pre-test gates (typecheck, check-jsonb, check-progress, check-wasm) only run on shard 1 — they're not test files and don't benefit from sharding. scripts/test-shard.sh partitions test files by stable FNV-1a hash mod N. Same file always lands in the same shard, so retries are reproducible. Pure shell, portable to bash 3.2 (macOS) and bash 5.x (CI). Excludes test/e2e/ which runs via bun run test:e2e separately and needs DATABASE_URL. Also: ignore .claude/ harness state files (scheduled_tasks.lock etc) instead of just .claude/skills/. Cost: ~$0.19/run vs $0 (public repo, default runner is free). At 50 PRs/month that's ~$10/month for ~5x faster CI. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three test files were spinning up a fresh PGLiteEngine + connect + initSchema in beforeEach. PGLite WASM cold-start is ~22s on the small CI runner; doing this per test multiplied wall-time across the suite. The 3 files alone accounted for ~6.5 min of the 22m CI run (177s + 132s + 87s). Refactor: move PGLite setup to beforeAll (one engine per file), wipe data in beforeEach via the new test/helpers/reset-pglite.ts helper. The reset helper: - TRUNCATEs every public table CASCADE, including sources (so tests that register their own sources don't leak rows into the next test). - Re-seeds the default source row that pages.source_id's DEFAULT FKs against. Without this, the next page insert would fail FK validation. - Preserves schema_version so migration helpers don't think the brain is on v0. Files refactored: - test/extract-incremental.test.ts (8 tests, was 177s on CI) - test/brain-writer.test.ts (16 tests; only the scanBrainSources block uses PGLite, was 132s on CI) - test/sync.test.ts (37 tests; only the performSync dry-run block uses PGLite, was 87s on CI) All 61 tests still pass locally. The remaining 20 PGLite-heavy files use the same beforeEach anti-pattern; this commit only refactors the proven worst offenders. Sweep the rest in a follow-up if CI numbers indicate it's worth it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The ubuntu-latest-16-cores label requires a provisioned larger-runner pool in repo/org settings. Without that setup, jobs queue indefinitely waiting for a runner that doesn't exist (verified: 4 shards stuck in 'queued' status with empty runner_name for 5+ min). Drop back to the default 2-core ubuntu-latest. The 4-way matrix shard still delivers ~5-6x speedup via parallelism alone — 4 jobs running in parallel, each handling ~40 of 158 unit test files. Cost stays $0 (default runner is free for public repos). If we ever provision a larger-runner pool, flip this label back to ubuntu-latest-16-cores. The matrix + sharder will use the bigger boxes unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ChenyqThu
pushed a commit
to ChenyqThu/jarvis-knowledge-os-v2
that referenced
this pull request
Apr 30, 2026
…arrytan#501) * feat: structured error code summary for sync --skip-failed (garrytan#500) When sync encounters per-file failures, the blocked/skip-failed messages now include a breakdown by error code (SLUG_MISMATCH, YAML_PARSE, etc.) instead of just a raw count. This makes it immediately obvious *why* files failed without requiring manual investigation. Changes: - Add classifyErrorCode() — maps error messages to ParseValidationCode - Add summarizeFailuresByCode() — groups failures into sorted code summary - SyncFailure now carries a 'code' field (backfilled on acknowledge) - acknowledgeSyncFailures() returns AcknowledgeResult {count, summary} - sync blocked + skip-failed messages show code breakdown - doctor sync_failures check shows code breakdown for both unacked and historical - 12 new tests for classifyErrorCode, summarizeFailuresByCode, and structured returns Before: Sync blocked: 2688 file(s) failed to parse. After: Sync blocked: 2688 file(s) failed to parse: SLUG_MISMATCH: 2685 YAML_DUPLICATE_KEY: 3 Closes garrytan#500 * fix: eng-review fixes for sync error-code classification - Reorder classifyErrorCode() so DB-layer errors (DB_DUPLICATE_KEY, STATEMENT_TIMEOUT) check BEFORE YAML patterns. Postgres "duplicate key value violates unique constraint" no longer mislabels as YAML_DUPLICATE_KEY. - Rewrite MISSING_OPEN/MISSING_CLOSE/EMPTY_FRONTMATTER/NULL_BYTES/NESTED_QUOTES regexes to match the canonical messages emitted by collectValidationErrors() in src/core/markdown.ts. Previous patterns (e.g. /missing.*open/i) never fired because the upstream throw site emits prose ("File is empty...", "No closing --- delimiter found"), not the code name. - Extract formatCodeBreakdown() helper that accepts either raw failures or pre-summarized {code, count}[] input. Replaces 3 duplicate inline builders in src/commands/sync.ts. - 15 new tests (37/37 pass on test/sync-failures.test.ts): - DB vs YAML duplicate-key disambiguation (3 cases) - Canonical-message coverage for the 5 frontmatter codes (7 cases) - acknowledgeSyncFailures() legacy-entry backfill branch (2 cases) - formatCodeBreakdown() dual-input shape (3 cases) - TODOS.md: file 3 follow-ups (P2 plumb structured ParseValidationCode; P0-at-ship CHANGELOG migration note for AcknowledgeResult; P3 concurrent- safe ack of sync-failures.jsonl). Eng-review plan: ~/.claude/plans/then-codex-synchronous-toucan.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.22.9) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci: 16-core runner + 4-way matrix shard for test job The unit test suite ran 22m17s on ubuntu-latest (2-core/7GB) because: - 187 test files run with bun test parallelism bounded by core count - 23 of those files spin up a fresh PGLiteEngine + initSchema in beforeEach, paying ~22s WASM cold-start per test on the small runner This commit fixes the runner side: - runs-on: ubuntu-latest-16-cores (16 vCPU / 64 GB RAM) - strategy.matrix.shard splits 4 parallel jobs, each running ~40 of 158 unit test files. Single-file wall-time floor is ~3 min after the test refactor, so 4 shards × 16 cores hits the floor quickly without wasting cores past it. - pre-test gates (typecheck, check-jsonb, check-progress, check-wasm) only run on shard 1 — they're not test files and don't benefit from sharding. scripts/test-shard.sh partitions test files by stable FNV-1a hash mod N. Same file always lands in the same shard, so retries are reproducible. Pure shell, portable to bash 3.2 (macOS) and bash 5.x (CI). Excludes test/e2e/ which runs via bun run test:e2e separately and needs DATABASE_URL. Also: ignore .claude/ harness state files (scheduled_tasks.lock etc) instead of just .claude/skills/. Cost: ~$0.19/run vs $0 (public repo, default runner is free). At 50 PRs/month that's ~$10/month for ~5x faster CI. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: refactor top-3 PGLite-heavy files to share one engine per file Three test files were spinning up a fresh PGLiteEngine + connect + initSchema in beforeEach. PGLite WASM cold-start is ~22s on the small CI runner; doing this per test multiplied wall-time across the suite. The 3 files alone accounted for ~6.5 min of the 22m CI run (177s + 132s + 87s). Refactor: move PGLite setup to beforeAll (one engine per file), wipe data in beforeEach via the new test/helpers/reset-pglite.ts helper. The reset helper: - TRUNCATEs every public table CASCADE, including sources (so tests that register their own sources don't leak rows into the next test). - Re-seeds the default source row that pages.source_id's DEFAULT FKs against. Without this, the next page insert would fail FK validation. - Preserves schema_version so migration helpers don't think the brain is on v0. Files refactored: - test/extract-incremental.test.ts (8 tests, was 177s on CI) - test/brain-writer.test.ts (16 tests; only the scanBrainSources block uses PGLite, was 132s on CI) - test/sync.test.ts (37 tests; only the performSync dry-run block uses PGLite, was 87s on CI) All 61 tests still pass locally. The remaining 20 PGLite-heavy files use the same beforeEach anti-pattern; this commit only refactors the proven worst offenders. Sweep the rest in a follow-up if CI numbers indicate it's worth it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci: fall back to ubuntu-latest for matrix shard The ubuntu-latest-16-cores label requires a provisioned larger-runner pool in repo/org settings. Without that setup, jobs queue indefinitely waiting for a runner that doesn't exist (verified: 4 shards stuck in 'queued' status with empty runner_name for 5+ min). Drop back to the default 2-core ubuntu-latest. The 4-way matrix shard still delivers ~5-6x speedup via parallelism alone — 4 jobs running in parallel, each handling ~40 of 158 unit test files. Cost stays $0 (default runner is free for public repos). If we ever provision a larger-runner pool, flip this label back to ubuntu-latest-16-cores. The matrix + sharder will use the bigger boxes unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Wintermute <wintermute@garrytan.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan
added a commit
that referenced
this pull request
Apr 30, 2026
…loses #500) (#518) * feat: structured error code summary for sync --skip-failed (#500) When sync encounters per-file failures, the blocked/skip-failed messages now include a breakdown by error code (SLUG_MISMATCH, YAML_PARSE, etc.) instead of just a raw count. This makes it immediately obvious *why* files failed without requiring manual investigation. Changes: - Add classifyErrorCode() — maps error messages to ParseValidationCode - Add summarizeFailuresByCode() — groups failures into sorted code summary - SyncFailure now carries a 'code' field (backfilled on acknowledge) - acknowledgeSyncFailures() returns AcknowledgeResult {count, summary} - sync blocked + skip-failed messages show code breakdown - doctor sync_failures check shows code breakdown for both unacked and historical - 12 new tests for classifyErrorCode, summarizeFailuresByCode, and structured returns Before: Sync blocked: 2688 file(s) failed to parse. After: Sync blocked: 2688 file(s) failed to parse: SLUG_MISMATCH: 2685 YAML_DUPLICATE_KEY: 3 Closes #500 * test(sync): broaden classifier regexes and pin coverage with 6 new unit tests Eng review of PR #501 found two ship-blocking gaps in the classifier: 1. Four real production error sites in src/core/import-file.ts emit strings that bucketed to UNKNOWN — exactly the silent-systemic-failure pattern that motivated #500 in the first place. Add two regex lines: FILE_TOO_LARGE — covers import-file.ts:199, 352, 401 SYMLINK_NOT_ALLOWED — covers import-file.ts:347 2. Three existing classifier regexes (MISSING_OPEN, MISSING_CLOSE, EMPTY_FRONTMATTER) only matched the literal code-name prefix. The actual message strings emitted by markdown.ts:159-244 (e.g. "Frontmatter must start with --- on the first non-empty line") wouldn't match. Broaden each to match production message text. NESTED_QUOTES already worked. Add 6 unit tests pinning the contract between markdown.ts/import-file.ts strings and the classifier regex set. If anyone reworks a validator message, both sides have to move together — the test fails loudly otherwise. Test count: 22 → 28 in test/sync-failures.test.ts, all green. * test(e2e): add failure-loop E2E for sync --skip-failed (issue #500 ship-blocker) The full code path (record → classify → block → skip → doctor render → second cycle) had only mocked-JSONL unit coverage. For a hotfix that changes user-visible CLI output and the doctor surface, that's thin. One comprehensive E2E test covers the loop: 1. First sync of clean repo — succeeds, bookmark advances 2. Add file with bad slug — sync returns 'blocked_by_failures', bookmark stays put, JSONL has 1 unacked entry coded SLUG_MISMATCH 3. --skip-failed — bookmark advances past the bad commit, entry transitions to acknowledged, AcknowledgeResult.summary aggregates 4. Second broken file (different path, same code) — sync blocks again, 1 acked + 1 unacked, dedup honors path identity 5. --skip-failed again — both acked, summary correctly counts 2 Hermetic on a developer machine: saves ~/.gbrain/sync-failures.jsonl before the test, restores it after. Doctor rendering verified by calling the same primitives doctor.ts uses (loadSyncFailures + summarizeFailuresByCode) rather than runDoctor() — runDoctor is a CLI entrypoint with stdout/exit side effects that truncate the test mid-flow. E2E count: 13 → 14 in test/e2e/sync.test.ts. All 14 pass under real Postgres + pgvector (gbrain-test-pg/pgvector:pg16). * v0.22.12: structured error code summary for sync --skip-failed Closes issue #500. PR #501 by @WinterMute is the foundation (cherry-picked as c356ea4 — classifier, doctor breakdown, AcknowledgeResult shape, 12 unit tests). This release adds: - Classifier coverage for FILE_TOO_LARGE + SYMLINK_NOT_ALLOWED (the four size/symlink rejection sites in import-file.ts that bucketed to UNKNOWN). - Three regex breadths (MISSING_OPEN, MISSING_CLOSE, EMPTY_FRONTMATTER) matching actual markdown.ts validator messages, not just the literal code-name prefix. - 6 new unit tests pinning literal production strings. - 1 comprehensive E2E test exercising the full failure loop. Total v0.22.12 diff: ~340 lines on top of PR #501. Backward-compatible — pre-v0.22.12 JSONL entries get classified at acknowledge time. * chore: regenerate llms-full.txt for v0.22.12 CLAUDE.md changes CI regen-drift guard caught that llms-full.txt was stale after the v0.22.12 CLAUDE.md annotation updates (sync.ts, doctor.ts, sync-failures.test.ts, e2e/sync.test.ts entries). Per CLAUDE.md "Auto-derived" rule: run `bun run build:llms` after any release ship that touches Key Files annotations. The bundle reflects current docs state. llms.txt unchanged (curated index doesn't index those entries). llms-full.txt: 308192 bytes. test/build-llms.test.ts now passes 7/7 (was 6/7 in CI). --------- Co-authored-by: Wintermute <wintermute@garrytan.com>
3 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Sync failures now tell you why, not just how many.
gbrain sync --skip-failedandgbrain doctorgroup failures by error code instead of just printing a count.Before:
Sync blocked: 2688 file(s) failed to parse.After:
Sync blocked: 2688 file(s) failed to parse:\n SLUG_MISMATCH: 2685\n YAML_DUPLICATE_KEY: 3On a real 81K-page brain, 2,685 files silently failed with
SLUG_MISMATCHfrom a posterous import. The breakdown turns "panic" into "fix one root cause."This PR ships the original feature (commit 4b5d29d) PLUS eng-review fixes (commit 52a2678) addressing 7 findings from
/plan-eng-review+/codexoutside-voice review.Eng-review fixes (in this PR, post-original-commit)
/duplicate.*key/ino longer mislabels Postgresduplicate key value violates unique constraintasYAML_DUPLICATE_KEY. NewDB_DUPLICATE_KEYcode, checked before YAML patterns.MISSING_OPEN/MISSING_CLOSE/EMPTY_FRONTMATTERregexes rewritten to match the canonical messages emitted bycollectValidationErrors()insrc/core/markdown.ts. The previous patterns (missing.*open,missing.*close) never fired in practice — the upstream throw site emits prose, not code names.acknowledgeSyncFailures()shape change is a deliberate breaking change — CHANGELOG carries a### For contributorsmigration block.formatCodeBreakdown()helper extracted; collapses 3 inline builders insrc/commands/sync.ts.acknowledgeSyncFailures()now has dedicated coverage.Test Coverage
37/37 tests pass on
test/sync-failures.test.ts. Fullbun testshows same baseline as master pre-PR (133 fails / 99 errors are pre-existing DB-required E2E tests — none touch this PR's code).Tests added: 22 → 37 (+15 net).
Pre-Landing Review
/plan-eng-reviewran with/codexoutside-voice. 7 findings, 7 addressed in this PR:/duplicate.*key/iPostgres false-positiveAcknowledgeResultis breaking changePlan file:
~/.claude/plans/then-codex-synchronous-toucan.md. No cross-model tension — both reviewers identified the same issues with the same severity ordering.TODOS
3 follow-ups filed in
TODOS.mdunder new## sync error-code classification (PR #501 follow-ups)section:ParseValidationCodethroughImportResult(the architectural fix that deletes 50% ofclassifyErrorCode).### For contributorsmigration note for theacknowledgeSyncFailures()shape change. Done in this PR's CHANGELOG.~/.gbrain/sync-failures.jsonl(pre-existing).Documentation
CHANGELOG entry added with full release-summary block (lead, "what this means,"
### For contributors, itemized changes). VERSION + package.json bumped to v0.22.9. CLAUDE.md key-files annotations not updated — the new helpers (classifyErrorCode,summarizeFailuresByCode,formatCodeBreakdown) are well-localized insrc/core/sync.ts; recommend running/document-releasepost-merge for a comprehensive sweep.Notes for the maintainer
Wintermute <wintermute@garrytan.com>(original commit) andClaude Opus 4.7 (1M context)(eng-review fixes commit).Test plan
bun test test/sync-failures.test.ts— 37/37 passbun test— same baseline as master (no regressions from this PR)duplicate key value violates unique constraintdoes NOT classify as YAML_DUPLICATE_KEY in production logs🤖 Generated with Claude Code