Skip to content

v1.56.1.0 fix(sync): staging-dir ownership guard + resume-correctness fixes (#1802)#1856

Merged
garrytan merged 8 commits into
mainfrom
garrytan/sync-gbrain-rmrf-fix
Jun 7, 2026
Merged

v1.56.1.0 fix(sync): staging-dir ownership guard + resume-correctness fixes (#1802)#1856
garrytan merged 8 commits into
mainfrom
garrytan/sync-gbrain-rmrf-fix

Conversation

@garrytan

@garrytan garrytan commented Jun 3, 2026

Copy link
Copy Markdown
Owner

Fixes #1802. Supersedes #1827 (moved to a base-repo branch so eval/E2E CI receives secrets — fork PRs don't).

What this lands

A /sync-gbrain memory sync could recursively delete a user's repo working tree. A crashed import left ~/.gbrain/import-checkpoint.json pointing at the repo root, the next sync's resume path adopted it as the staging dir, and the cleanup step rm -rf'd it. Root cause is a trust failure: the code deleted a path it never proved it owned.

This PR adopts the community fix and bundles the resume-machinery bugs an eng review + Codex pass surfaced in the same code. Six bisected commits:

  1. Adopt fix(sync): fail-closed staging-dir ownership guard — prevent rm -rf of repo (#1802) #1827 (@diazMelgarejo / cyre) — new lib/staging-guard.ts checkOwnedStagingDir(), the single fail-closed predicate (resolvable + direct child of ~/.gstack + .staging-ingest-* name + no .git tripwire + minted .gstack-staging marker), wired into cleanupStagingDir(), decideResume(), the ingest entry point, and makeStagingDir().
  2. D1 (eng review) — the ingest finally called cleanup unconditionally, including on the remote-http persistent transcript dir (which must persist for gstack-brain-sync push). Gated on !remoteHttpMode. Pre-gate this dir was deleted every run (broken artifacts handoff); post-fix(sync): fail-closed staging-dir ownership guard — prevent rm -rf of repo (#1802) #1827 it produced a false "prevent data loss" warning every sync.
  3. C3 (Codex) — an internal gbrain import timeout printed "checkpoint preserved" but the finally then deleted the staging dir (the SIGTERM forwarder only preserves when the parent is signalled). Resume-after-timeout never worked. Now preserves the dir when checkpointed, and tells the truth when it can't.
  4. C4 (Codex) — on resume stagedPathToSource was rebuilt empty, so readNewFailures() couldn't map gbrain's per-file failures back to sources and failed files were silently marked ingested. Reconstructed via a shared stagedRelPath() helper.
  5. C5 hardeningcheckOwnedStagingDir() returns the realpath canonical path and cleanupStagingDir() deletes that (closes a symlink TOCTOU); makeStagingDir() tears down + rethrows if the marker write fails.
  6. v1.60.0.0 — VERSION + CHANGELOG.

Verification

The fork PR's CI never ran (statusCheckRollup: [] — fork, no secrets). Verified on our side:

  • bun test test/regression-1611-gbrain-sync-resume.test.ts test/gstack-memory-ingest.test.ts — 64 pass, 0 fail (resume regression suite grew 9 → 64 assertions: poison matrix, remote-http gate, timeout-preserve, resume failure-mapping, ownership matrix, canonical-path).
  • Full bun test — green, 0 failures.

Out of scope

Durable upstream fix in gbrain (checkpoint.dir should always be gbrain-minted, never CWD) — companion gbrain issue #1728 / PR #1731.

🤖 Generated with Claude Code

garrytan and others added 5 commits June 3, 2026 07:18
…f repo (#1802)

Adopts community fix #1827 by @diazMelgarejo (cyre). New lib/staging-guard.ts
exports checkOwnedStagingDir(), the single fail-closed predicate for 'safe to
recurse-delete or resume into', wired at cleanupStagingDir() (the deletion
chokepoint), decideResume(), the ingest entry point, and makeStagingDir()
(mints the .gstack-staging marker).

Fixes #1802.

Co-Authored-By: cyre <diazMelgarejo@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ugh cleanup (#1802)

The ingest finally ran cleanupStagingDir() unconditionally, but in remote-http
mode stagingDir is the PERSISTENT transcript dir (~/.gstack/transcripts/) that
gstack-brain-sync push must consume. The remote-http branch documents the intent
to skip cleanup, but a finally runs on its return. Gate the call on
!remoteHttpMode so the ownership guard only ever sees .staging-ingest-* dirs.
Pre-gate this dir was deleted outright (broken artifacts handoff); post-#1827 it
produced a false 'prevent data loss' warning every sync.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The import-timeout branch printed 'checkpoint preserved' but the finally then
deleted the staging dir: the SIGTERM forwarder's preserve branch only runs when
the PARENT is signalled, and an internal runGbrainImport timeout kills just the
child and returns normally. So #1611 resume-after-timeout never actually worked.
Mirror the forwarder in the timeout branch: set preserveStaging only when gbrain
checkpointed against this dir (finally then skips cleanup); otherwise clean up
and tell the user it restages instead of falsely promising a resume.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
On resume, stagedPathToSource was rebuilt as an empty Map, so readNewFailures()
could not map gbrain's per-file failures back to source paths. Every failure
fell through to state recording — failed files were silently marked ingested and
never retried. Reconstruct the map from the prepared pages via a shared
stagedRelPath() helper (single source of truth with writeStaged, so the keys
can never drift). Exports stagedRelPath + readNewFailures for a behavioral test
proving the reconstructed map recovers the failure the empty map dropped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…1802 C5)

checkOwnedStagingDir() now returns the realpath-resolved canonicalPath on a
pass, and cleanupStagingDir() rmSync's that instead of the raw input — closing
the gap where the input is a symlink swapped between the ownership check and the
delete. makeStagingDir() tears down the partial dir and rethrows if the marker
write fails, so a marker-less dir (which the guard would refuse forever) can
never leak.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@trunk-io

trunk-io Bot commented Jun 3, 2026

Copy link
Copy Markdown

Merging to main in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here

@github-actions github-actions Bot changed the title fix(sync): staging-dir ownership guard + resume-correctness fixes (#1802) v1.60.0.0 fix(sync): staging-dir ownership guard + resume-correctness fixes (#1802) Jun 3, 2026
@github-actions

github-actions Bot commented Jun 3, 2026

Copy link
Copy Markdown

E2E Evals: ✅ PASS

0/0 tests passed | $0 total cost | 12 parallel runners

Suite Result Status Cost

12x ubicloud-standard-8 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite

garrytan and others added 2 commits June 3, 2026 08:46
…ixes (#1802)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… 401ing

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@garrytan garrytan force-pushed the garrytan/sync-gbrain-rmrf-fix branch from abc9eee to 4c4fea1 Compare June 3, 2026 15:46
@github-actions github-actions Bot changed the title v1.60.0.0 fix(sync): staging-dir ownership guard + resume-correctness fixes (#1802) v1.56.1.0 fix(sync): staging-dir ownership guard + resume-correctness fixes (#1802) Jun 3, 2026
…rmrf-fix

# Conflicts:
#	CHANGELOG.md
#	VERSION
@garrytan garrytan merged commit 476b0ec into main Jun 7, 2026
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sync-gbrain memory stage rm -rf'd my repo root: resume path trusts checkpoint.dir without verifying it's a gstack staging dir

1 participant