fix(sync): --skip-failed acks pre-existing unacked failures up-front#686
Closed
brandonlipman wants to merge 6 commits intogarrytan:masterfrom
Closed
fix(sync): --skip-failed acks pre-existing unacked failures up-front#686brandonlipman wants to merge 6 commits intogarrytan:masterfrom
brandonlipman wants to merge 6 commits intogarrytan:masterfrom
Conversation
…otstrap The forward-reference bootstrap (PostgresEngine + PGLiteEngine applyForwardReferenceBootstrap) covered v0.18 + v0.19 + v0.26.5 columns but missed two later groups. Brains upgrading from v0.14-era to current master crash before the migration ladder runs: 1. v0.20 Cathedral II — content_chunks.search_vector, parent_symbol_path, doc_comment, symbol_name_qualified. `CREATE INDEX idx_chunks_search_vector` and `CREATE INDEX idx_chunks_symbol_qualified` in schema.sql/PGLITE_SCHEMA_SQL crash with "column search_vector does not exist" / "column symbol_name_qualified does not exist". 2. v0.26.3 — mcp_request_log.agent_name, params, error_message. `CREATE INDEX idx_mcp_log_agent_time ON mcp_request_log(agent_name,...)` crashes with "column agent_name does not exist". Reproduces deterministically on a v0.13/v0.14 brain upgraded straight to current master. The user hits the wall before any of v15-v36 can run. Both engines now probe for these columns and pre-add them via `ALTER TABLE ADD COLUMN IF NOT EXISTS` before SCHEMA_SQL runs. Migrations v26, v27, v33 still run later via runMigrations and remain idempotent (they handle backfill on top of the bootstrap-added columns). Test coverage extended in test/schema-bootstrap-coverage.test.ts: REQUIRED_BOOTSTRAP_COVERAGE now lists 6 new forward references; the strip-and-rebuild block drops the corresponding indexes/triggers so the test exercises a brain that pre-dates v0.20 + v0.26.3 migrations. Repro: brain on schema v13/v14 + run `gbrain init --migrate-only` against current master → fails. With this patch → succeeds; ladder runs to v36.
`package.json` declares `"bin": { "gbrain": "src/cli.ts" }`, and bun's
linker creates `~/.bun/bin/gbrain` as a symlink to the file. The shebang
`#!/usr/bin/env bun` works only when the target file is executable —
otherwise bun runs it as a script (because it sees the script via the
shebang interpreter), but executing the symlinked target itself fails:
$ ls -la ~/.bun/bin/gbrain
lrwxrwxrwx ... -> ../install/global/node_modules/gbrain/src/cli.ts
$ ~/.bun/bin/gbrain --version
/opt/homebrew/bin/bash: line 1: /Users/brandon/.bun/bin/gbrain: Permission denied
This bites the postinstall hook that calls `gbrain apply-migrations`
(masked by the `||` fallback) and any subprocess that invokes the
binary by absolute path (e.g., subagent_messages migration v0.16's
`execSync('gbrain init --migrate-only', ...)`).
Setting the mode in-tree to 755 fixes both. No content change.
…able
`gbrain doctor` was the only consumer of `findRepoRoot` from
`core/repo-root.ts`. Every other consumer (check-resolvable.ts:145,
skillify.ts, etc.) uses `autoDetectSkillsDir`, which has the full
detection chain:
1. \$OPENCLAW_WORKSPACE
2. ~/.openclaw/workspace
3. findRepoRoot() walk from cwd
4. ./skills
`findRepoRoot` only does step 3. Result: when the user runs `gbrain
doctor` from any directory outside the gbrain repo or the OpenClaw
workspace tree (e.g., a project's checkout), `resolver_health` reports
"Could not find skills directory" even though the dispatcher exists at
~/.openclaw/workspace/skills/RESOLVER.md.
Reproduces in any directory other than ~/gbrain or its descendants on
a system with ~/.openclaw/workspace/skills/RESOLVER.md present:
\$ cd ~/Documents
\$ gbrain doctor
[WARN] resolver_health: Could not find skills directory # before
[WARN] resolver_health: 5 issue(s): 0 error(s), 5 warning(s) # after
Switching doctor to `autoDetectSkillsDir` brings it inline with the rest
of the codebase. The detected dir is also passed to
`checkSkillConformance` (step 2 of the resolver_health block), which
previously rebuilt the path from `repoRoot` — now uses the same
detected path for consistency.
All 15 existing tests in test/doctor.test.ts continue to pass.
The recovery flow that doctor + printSyncResult both advertise was broken:
1. User has files with bad YAML → they hit the failure log + sync stays
blocked at last_commit.
2. User fixes the YAML.
3. User re-runs `gbrain sync` — sync succeeds, advances last_commit.
4. `gbrain doctor` still reports N unacked failures from step 1 because
sync-failures.jsonl is append-only history, never auto-cleared.
5. doctor message says: "use 'gbrain sync --skip-failed' to acknowledge".
6. User runs `gbrain sync --skip-failed` → "Already up to date." → log
unchanged.
The bug: --skip-failed only acknowledges failures from the CURRENT run.
performSync's ack path is gated on `failedFiles.length > 0` after sync —
it never fires when the diff is empty (because the user already fixed
the bad files) or when the sync is up to date. So the documented recovery
sequence is a no-op exactly when the user needs it.
The fix: at the top of runSync, when --skip-failed is set, eagerly ack
any pre-existing unacked failures before any sync work runs. Now the flag
means "acknowledge whatever is currently flagged and move on" regardless
of whether the current run produces new failures or finds nothing to do.
The inner per-run ack path stays — it still handles new failures from
the CURRENT run, which is the (a) syncing now produces failures + (b)
caller wants to ack them path. The two paths compose: `gbrain sync
--skip-failed` clears stale + advances past anything new, all in one
command, matching what the doctor message promises.
Tests: 2 added in test/sync-failures.test.ts. One source-string pin on
the new gate (the file's existing pattern for CLI-flag tests). One
behavioral test on the underlying acknowledgeSyncFailures path.
Repro:
$ gbrain doctor
[WARN] sync_failures: 27 unacknowledged sync failure(s)...
Fix the file(s) and re-run 'gbrain sync', or use
'gbrain sync --skip-failed' to acknowledge.
$ # ... fix the YAML ...
$ gbrain sync
Already up to date.
$ gbrain sync --skip-failed
Already up to date. # before this PR
$ gbrain doctor
[WARN] sync_failures: 27 unacknowledged sync failure(s)... # still!
After:
$ gbrain sync --skip-failed
Acknowledged 27 pre-existing failure(s).
Already up to date.
$ gbrain doctor
[OK] sync_failures: N historical sync failure(s), all acknowledged
5 tasks
Owner
|
Closing — your fix landed in master via the v0.30.3 fix-wave PR #776 (merged at Thank you for the contribution — credit is preserved in the v0.30.3 CHANGELOG entry. 🙏 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Bug
The recovery flow that
gbrain doctorandprintSyncResultboth advertise is broken end-to-end:last_commit.gbrain sync→ succeeds, advanceslast_commit.gbrain doctorstill reportsN unacked failuresfrom step 1 because~/.gbrain/sync-failures.jsonlis append-only history, never auto-cleared.gbrain sync --skip-failedto acknowledge."gbrain sync --skip-failed→Already up to date.→ log unchanged.The flag is a no-op exactly when the user needs it.
Root cause
--skip-failedonly acknowledges failures from the current run. The ack path lives insideperformSyncand is gated onfailedFiles.length > 0(src/commands/sync.ts:695). When the diff is empty (because the user already fixed the bad files) or the sync is up to date, that branch never executes, and the flag never reachesacknowledgeSyncFailures(). The doctor andprintSyncResult:1251messages assume the flag handles stale entries — it doesn't.Repro
Hit while debugging a real brain that had 27 unacked failures from 3 files (2 already-fixed, 1 fresh-broken).
Fix
At the top of
runSync, when--skip-failedis set, eagerly ack any pre-existing unacked failures before any sync work runs. The flag now means "acknowledge whatever is currently flagged and move on" regardless of whether the current run produces new failures or finds nothing to do.The inner per-run ack path stays — it still handles new failures from the current run. The two paths compose:
gbrain sync --skip-failedclears stale + advances past anything new, all in one command, matching what the doctor message promises.After
Tests
Two added in
test/sync-failures.test.ts:acknowledgeSyncFailurespath.Behavior preserved
--skip-failed, the existing block-sync-on-failures behavior is unchanged.--skip-failed, the inner ack path still handles them (the new code only acks pre-existing entries; the inner path acks the run's own).--retry-failedsemantics are unchanged.Need help on this PR? Tag
@codesmithwith what you need.