
fix(sync): --skip-failed acks pre-existing unacked failures up-front #686

Closed
brandonlipman wants to merge 6 commits into garrytan:master from brandonlipman:fix/sync-skip-failed-acks-stale

Conversation

Contributor

@brandonlipman brandonlipman commented May 6, 2026

Bug

The recovery flow that gbrain doctor and printSyncResult both advertise is broken end-to-end:

  1. Files with bad YAML hit the failure log; sync stays blocked at last_commit.
  2. User fixes the YAML.
  3. User re-runs gbrain sync → succeeds, advances last_commit.
  4. gbrain doctor still reports N unacked failures from step 1 because ~/.gbrain/sync-failures.jsonl is append-only history, never auto-cleared.
  5. doctor message: "use gbrain sync --skip-failed to acknowledge."
  6. User runs gbrain sync --skip-failed → "Already up to date." → log unchanged.

The flag is a no-op exactly when the user needs it.
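The step-4 behavior follows directly from the log format. A minimal sketch of how doctor derives its count — the entry shape here is hypothetical; the real fields live in gbrain's failure-log code:

```typescript
// Sketch of why doctor keeps warning: sync-failures.jsonl is append-only,
// so "unacked" is computed by scanning history, not from current sync state.
// FailureEntry is a hypothetical shape, not gbrain's actual schema.
interface FailureEntry {
  file: string;
  error: string;
  acknowledged: boolean;
}

function countUnacked(jsonlText: string): number {
  return jsonlText
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as FailureEntry)
    .filter((e) => !e.acknowledged).length;
}

// Fixing the YAML and re-syncing appends nothing and flips no acknowledged
// flag, so the count from step 1 survives every later successful sync.
const log = [
  JSON.stringify({ file: "a.md", error: "bad YAML", acknowledged: false }),
  JSON.stringify({ file: "b.md", error: "bad YAML", acknowledged: false }),
].join("\n");

console.log(countUnacked(log)); // 2
```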

Root cause

--skip-failed only acknowledges failures from the current run. The ack path lives inside performSync and is gated on failedFiles.length > 0 (src/commands/sync.ts:695). When the diff is empty (because the user already fixed the bad files) or the sync is up to date, that branch never executes, and the flag never reaches acknowledgeSyncFailures(). The doctor and printSyncResult:1251 messages assume the flag handles stale entries — it doesn't.

Repro

$ gbrain doctor
  [WARN] sync_failures: 27 unacknowledged sync failure(s)...
         Fix the file(s) and re-run 'gbrain sync', or use
         'gbrain sync --skip-failed' to acknowledge.
$ # ... fix the YAML ...
$ gbrain sync
Already up to date.
$ gbrain sync --skip-failed
Already up to date.        # ← log untouched
$ gbrain doctor
  [WARN] sync_failures: 27 unacknowledged sync failure(s)...   # still!

Hit while debugging a real brain that had 27 unacked failures from 3 files (2 already-fixed, 1 fresh-broken).

Fix

At the top of runSync, when --skip-failed is set, eagerly ack any pre-existing unacked failures before any sync work runs. The flag now means "acknowledge whatever is currently flagged and move on" regardless of whether the current run produces new failures or finds nothing to do.

The inner per-run ack path stays — it still handles new failures from the current run. The two paths compose: gbrain sync --skip-failed clears stale + advances past anything new, all in one command, matching what the doctor message promises.
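A sketch of the new shape — runSync and acknowledgeSyncFailures are the real names; loadUnackedFailures stands in for however the log is actually read:

```typescript
// Sketch of the fix: runSync acks pre-existing unacked failures up-front,
// before any sync work. loadUnackedFailures is illustrative; the real code
// reads ~/.gbrain/sync-failures.jsonl.
function loadUnackedFailures(): string[] {
  return ["old-broken.md"]; // stale entry left over from an earlier run
}

const ackedFiles: string[] = [];
function acknowledgeSyncFailures(files: string[]): void {
  ackedFiles.push(...files);
}

function runSync(opts: { skipFailed: boolean }): void {
  if (opts.skipFailed) {
    const stale = loadUnackedFailures(); // pre-existing, from earlier runs
    if (stale.length > 0) {
      acknowledgeSyncFailures(stale);
      console.log(`Acknowledged ${stale.length} pre-existing failure(s).`);
    }
  }
  // performSync(...) still acks any NEW failures from this run via the
  // inner failedFiles.length > 0 path — the two paths compose.
}

runSync({ skipFailed: true });
```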

After

$ gbrain sync --skip-failed
Acknowledged 27 pre-existing failure(s).
Already up to date.
$ gbrain doctor
  [OK] sync_failures: N historical sync failure(s), all acknowledged

Tests

Two added in test/sync-failures.test.ts:

  • Source-string pin on the new gate (matches the file's existing CLI-flag test convention).
  • Behavioral test on the underlying acknowledgeSyncFailures path.
$ bun test test/sync-failures.test.ts
41 pass, 0 fail, 75 expect() calls

Behavior preserved

  • Without --skip-failed, the existing block-sync-on-failures behavior is unchanged.
  • Within a single sync run that produces new failures + is invoked with --skip-failed, the inner ack path still handles them (the new code only acks pre-existing entries; the inner path acks the run's own).
  • --retry-failed semantics are unchanged.


…otstrap

The forward-reference bootstrap (PostgresEngine + PGLiteEngine
applyForwardReferenceBootstrap) covered v0.18 + v0.19 + v0.26.5 columns
but missed two later groups. Brains upgrading from v0.14-era to current
master crash before the migration ladder runs:

1. v0.20 Cathedral II — content_chunks.search_vector,
   parent_symbol_path, doc_comment, symbol_name_qualified.
   `CREATE INDEX idx_chunks_search_vector` and
   `CREATE INDEX idx_chunks_symbol_qualified` in schema.sql/PGLITE_SCHEMA_SQL
   crash with "column search_vector does not exist" / "column
   symbol_name_qualified does not exist".

2. v0.26.3 — mcp_request_log.agent_name, params, error_message.
   `CREATE INDEX idx_mcp_log_agent_time ON mcp_request_log(agent_name,...)`
   crashes with "column agent_name does not exist".

Reproduces deterministically on a v0.13/v0.14 brain upgraded straight
to current master. The user hits the wall before any of v15-v36 can run.

Both engines now probe for these columns and pre-add them via
`ALTER TABLE ADD COLUMN IF NOT EXISTS` before SCHEMA_SQL runs. Migrations
v26, v27, v33 still run later via runMigrations and remain idempotent
(they handle backfill on top of the bootstrap-added columns).
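A sketch of what that bootstrap amounts to — the column list comes from the two groups above; `query` is a stand-in for the engine's SQL runner, and the column types are assumptions:

```typescript
// Sketch of the forward-reference bootstrap: before SCHEMA_SQL runs, pre-add
// every column that a later CREATE INDEX statement references. IF NOT EXISTS
// keeps this idempotent; migrations v26/v27/v33 still run afterwards and
// backfill on top of the empty bootstrap-added columns.
type Query = (sql: string) => Promise<void>;

const FORWARD_REFERENCED_COLUMNS: Array<[table: string, column: string, type: string]> = [
  // v0.20 Cathedral II
  ["content_chunks", "search_vector", "tsvector"],
  ["content_chunks", "parent_symbol_path", "text"],
  ["content_chunks", "doc_comment", "text"],
  ["content_chunks", "symbol_name_qualified", "text"],
  // v0.26.3
  ["mcp_request_log", "agent_name", "text"],
  ["mcp_request_log", "params", "jsonb"],
  ["mcp_request_log", "error_message", "text"],
];

async function applyForwardReferenceBootstrap(query: Query): Promise<void> {
  for (const [table, column, type] of FORWARD_REFERENCED_COLUMNS) {
    await query(`ALTER TABLE ${table} ADD COLUMN IF NOT EXISTS ${column} ${type}`);
  }
}
```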

Test coverage extended in test/schema-bootstrap-coverage.test.ts:
REQUIRED_BOOTSTRAP_COVERAGE now lists 6 new forward references; the
strip-and-rebuild block drops the corresponding indexes/triggers so the
test exercises a brain that pre-dates v0.20 + v0.26.3 migrations.

Repro: brain on schema v13/v14 + run `gbrain init --migrate-only` against
current master → fails. With this patch → succeeds; ladder runs to v36.

`package.json` declares `"bin": { "gbrain": "src/cli.ts" }`, and bun's
linker creates `~/.bun/bin/gbrain` as a symlink to the file. The shebang
`#!/usr/bin/env bun` works only when the target file is executable —
otherwise bun runs it as a script (because it sees the script via the
shebang interpreter), but executing the symlinked target itself fails:

  $ ls -la ~/.bun/bin/gbrain
  lrwxrwxrwx ... -> ../install/global/node_modules/gbrain/src/cli.ts
  $ ~/.bun/bin/gbrain --version
  /opt/homebrew/bin/bash: line 1: /Users/brandon/.bun/bin/gbrain: Permission denied

This bites the postinstall hook that calls `gbrain apply-migrations`
(masked by the `||` fallback) and any subprocess that invokes the
binary by absolute path (e.g., subagent_messages migration v0.16's
`execSync('gbrain init --migrate-only', ...)`).

Setting the mode in-tree to 755 fixes both. No content change.
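The underlying rule is the kernel's, not bun's: exec(2) checks the resolved target's mode bits before the shebang is ever read. A small probe, assuming Node-compatible `fs` (which bun provides):

```typescript
// Why the symlinked bin fails: exec resolves ~/.bun/bin/gbrain to
// src/cli.ts; with mode 644 the exec-permission check fails before the
// `#!/usr/bin/env bun` shebang is read, hence "Permission denied".
// Running `bun src/cli.ts` works regardless — bun opens the file as data.
import { statSync } from "node:fs";

function isExecutable(path: string): boolean {
  // statSync follows symlinks, so this inspects the link *target's* mode —
  // the same thing the kernel checks on exec.
  const mode = statSync(path).mode;
  return (mode & 0o111) !== 0; // any of user/group/other exec bits
}
```

With the mode committed as 755, the exec bits survive bun's global install symlink, so both direct invocation and subprocess calls by absolute path work.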
…able

`gbrain doctor` was the only consumer of `findRepoRoot` from
`core/repo-root.ts`. Every other consumer (check-resolvable.ts:145,
skillify.ts, etc.) uses `autoDetectSkillsDir`, which has the full
detection chain:
  1. $OPENCLAW_WORKSPACE
  2. ~/.openclaw/workspace
  3. findRepoRoot() walk from cwd
  4. ./skills

`findRepoRoot` only does step 3. Result: when the user runs `gbrain
doctor` from any directory outside the gbrain repo or the OpenClaw
workspace tree (e.g., a project's checkout), `resolver_health` reports
"Could not find skills directory" even though the dispatcher exists at
~/.openclaw/workspace/skills/RESOLVER.md.
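The gap can be sketched like this — autoDetectSkillsDir and findRepoRoot are the real helper names; the bodies and injected dependencies here are illustrative:

```typescript
// Sketch of autoDetectSkillsDir's four-step chain vs. findRepoRoot's single
// step. Dependencies are injected to keep the sketch self-contained; the
// real helpers read the environment and filesystem directly.
import { join } from "node:path";

interface Env {
  env: Record<string, string | undefined>;
  home: string;
  cwd: string;
  exists: (p: string) => boolean;
  findRepoRoot: (cwd: string) => string | null; // step 3: walk up from cwd
}

function autoDetectSkillsDir(e: Env): string {
  // 1. explicit override
  if (e.env.OPENCLAW_WORKSPACE) return join(e.env.OPENCLAW_WORKSPACE, "skills");
  // 2. conventional workspace location
  const ws = join(e.home, ".openclaw", "workspace", "skills");
  if (e.exists(ws)) return ws;
  // 3. repo-root walk — the ONLY step the findRepoRoot-based doctor had
  const root = e.findRepoRoot(e.cwd);
  if (root) return join(root, "skills");
  // 4. fallback
  return join(e.cwd, "skills");
}
```

From ~/Documents with the workspace present, step 2 fires before the repo-root walk is ever attempted — exactly the case doctor was missing.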

Reproduces in any directory other than ~/gbrain or its descendants on
a system with ~/.openclaw/workspace/skills/RESOLVER.md present:

    $ cd ~/Documents
    $ gbrain doctor
    [WARN] resolver_health: Could not find skills directory   # before
    [WARN] resolver_health: 5 issue(s): 0 error(s), 5 warning(s)  # after

Switching doctor to `autoDetectSkillsDir` brings it inline with the rest
of the codebase. The detected dir is also passed to
`checkSkillConformance` (step 2 of the resolver_health block), which
previously rebuilt the path from `repoRoot` — now uses the same
detected path for consistency.

All 15 existing tests in test/doctor.test.ts continue to pass.
@garrytan
Owner

Closing — your fix landed in master via the v0.30.3 fix-wave PR #776 (merged at ff53a4c9): "--skip-failed acks pre-existing unacked failures up-front".

Thank you for the contribution — credit is preserved in the v0.30.3 CHANGELOG entry. 🙏

@garrytan garrytan closed this May 10, 2026