v1.56.1.0 fix(sync): staging-dir ownership guard + resume-correctness fixes (#1802)#1856
Merged
Conversation
…f repo (#1802) Adopts community fix #1827 by @diazMelgarejo (cyre). New lib/staging-guard.ts exports checkOwnedStagingDir(), the single fail-closed predicate for 'safe to recurse-delete or resume into', wired at cleanupStagingDir() (the deletion chokepoint), decideResume(), the ingest entry point, and makeStagingDir() (mints the .gstack-staging marker). Fixes #1802. Co-Authored-By: cyre <diazMelgarejo@users.noreply.github.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ugh cleanup (#1802) The ingest finally ran cleanupStagingDir() unconditionally, but in remote-http mode stagingDir is the PERSISTENT transcript dir (~/.gstack/transcripts/) that gstack-brain-sync push must consume. The remote-http branch documents the intent to skip cleanup, but a finally runs on its return. Gate the call on !remoteHttpMode so the ownership guard only ever sees .staging-ingest-* dirs. Pre-gate this dir was deleted outright (broken artifacts handoff); post-#1827 it produced a false 'prevent data loss' warning every sync. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The import-timeout branch printed 'checkpoint preserved' but the finally then deleted the staging dir: the SIGTERM forwarder's preserve branch only runs when the PARENT is signalled, and an internal runGbrainImport timeout kills just the child and returns normally. So #1611 resume-after-timeout never actually worked. Mirror the forwarder in the timeout branch: set preserveStaging only when gbrain checkpointed against this dir (finally then skips cleanup); otherwise clean up and tell the user it restages instead of falsely promising a resume. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
On resume, stagedPathToSource was rebuilt as an empty Map, so readNewFailures() could not map gbrain's per-file failures back to source paths. Every failure fell through to state recording — failed files were silently marked ingested and never retried. Reconstruct the map from the prepared pages via a shared stagedRelPath() helper (single source of truth with writeStaged, so the keys can never drift). Exports stagedRelPath + readNewFailures for a behavioral test proving the reconstructed map recovers the failure the empty map dropped. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…1802 C5) checkOwnedStagingDir() now returns the realpath-resolved canonicalPath on a pass, and cleanupStagingDir() rmSync's that instead of the raw input — closing the gap where the input is a symlink swapped between the ownership check and the delete. makeStagingDir() tears down the partial dir and rethrows if the marker write fails, so a marker-less dir (which the guard would refuse forever) can never leak. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Merging to
After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here |
E2E Evals: ✅ PASS0/0 tests passed | $0 total cost | 12 parallel runners
12x ubicloud-standard-8 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite |
…ixes (#1802) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… 401ing Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
abc9eee to
4c4fea1
Compare
…rmrf-fix # Conflicts: # CHANGELOG.md # VERSION
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #1802. Supersedes #1827 (moved to a base-repo branch so eval/E2E CI receives secrets — fork PRs don't).
What this lands
A
/sync-gbrainmemory sync could recursively delete a user's repo working tree. A crashed import left~/.gbrain/import-checkpoint.jsonpointing at the repo root, the next sync's resume path adopted it as the staging dir, and the cleanup steprm -rf'd it. Root cause is a trust failure: the code deleted a path it never proved it owned.This PR adopts the community fix and bundles the resume-machinery bugs an eng review + Codex pass surfaced in the same code. Six bisected commits:
lib/staging-guard.tscheckOwnedStagingDir(), the single fail-closed predicate (resolvable + direct child of~/.gstack+.staging-ingest-*name + no.gittripwire + minted.gstack-stagingmarker), wired intocleanupStagingDir(),decideResume(), the ingest entry point, andmakeStagingDir().finallycalled cleanup unconditionally, including on the remote-http persistent transcript dir (which must persist forgstack-brain-sync push). Gated on!remoteHttpMode. Pre-gate this dir was deleted every run (broken artifacts handoff); post-fix(sync): fail-closed staging-dir ownership guard — prevent rm -rf of repo (#1802) #1827 it produced a false "prevent data loss" warning every sync.gbrain importtimeout printed "checkpoint preserved" but thefinallythen deleted the staging dir (the SIGTERM forwarder only preserves when the parent is signalled). Resume-after-timeout never worked. Now preserves the dir when checkpointed, and tells the truth when it can't.stagedPathToSourcewas rebuilt empty, soreadNewFailures()couldn't map gbrain's per-file failures back to sources and failed files were silently marked ingested. Reconstructed via a sharedstagedRelPath()helper.checkOwnedStagingDir()returns the realpath canonical path andcleanupStagingDir()deletes that (closes a symlink TOCTOU);makeStagingDir()tears down + rethrows if the marker write fails.Verification
The fork PR's CI never ran (
statusCheckRollup: []— fork, no secrets). Verified on our side:bun test test/regression-1611-gbrain-sync-resume.test.ts test/gstack-memory-ingest.test.ts— 64 pass, 0 fail (resume regression suite grew 9 → 64 assertions: poison matrix, remote-http gate, timeout-preserve, resume failure-mapping, ownership matrix, canonical-path).bun test— green, 0 failures.Out of scope
Durable upstream fix in gbrain (checkpoint.dir should always be gbrain-minted, never CWD) — companion gbrain issue #1728 / PR #1731.
🤖 Generated with Claude Code