feat(file-sync): sync remote changes back to host on teardown#8018
feat(file-sync): sync remote changes back to host on teardown#8018alt-glitch wants to merge 3 commits into
Conversation
|
|
@BugBot review |
Add sync_back() to FileSyncManager — on sandbox cleanup, downloads the remote .hermes/ directory as a tar archive, diffs against SHA-256 hashes of what was originally pushed, and applies only changed files. - SHA-256 content hashing on push for accurate change detection - Retry with exponential backoff (3 attempts, 2s/4s/8s) - SIGINT deferred during sync-back to prevent partial writes - fcntl.flock serialization for concurrent gateway sandboxes - Last-write-wins conflict resolution with logged warnings - New files created on remote are pulled back via path inference - Backend implementations: SSH (tar cf over pipe), Modal (exec tar cf, read stdout), Daytona (exec tar cf, SDK download_file) - Wired into cleanup() for all three backends (runs before ControlMaster close / sandbox terminate / sandbox stop) 28 new tests (10 FSM core + 18 backend-specific), 72 total passing.
7bb3a26 to
37c478c
Compare
|
1. Tar paths now match _pushed_hashes keys — backends tar from / so entries have full absolute paths (e.g. root/.hermes/skills/f.py) instead of relative ./skills/f.py that never matched hash lookups 2. _infer_host_path simplified — removed broken grandparent match that computed garbled suffixes for new remote files 3. Lock path uses get_hermes_home() instead of Path.home() — fixes wrong lock path when HERMES_HOME is overridden or using profiles 4. SIGINT trap guarded by threading.current_thread() check — skips signal.signal() on non-main threads (gateway workers) instead of crashing with ValueError on every retry attempt
|
|
@BugBot review |
Snapshot _pushed_hashes alongside _synced_files before the try block so both are restored atomically on failure. Previously a mid-sync exception (e.g. host file deleted between upload and hash) would leave _pushed_hashes partially updated while _synced_files rolled back, causing sync_back() to make wrong change-detection decisions.
|
|
@BugBot review |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 6aba50f. Configure here.
| self._sandbox.process.exec( | ||
| f"tar cf /tmp/.hermes_sync.tar -C / {shlex.quote(rel_base)}" | ||
| ) | ||
| self._sandbox.fs.download_file("/tmp/.hermes_sync.tar", str(dest)) |
There was a problem hiding this comment.
Daytona bulk download ignores tar command failure
Medium Severity
_daytona_bulk_download doesn't check the exit code of the process.exec tar command, unlike the SSH and Modal backends which both verify the return code and raise RuntimeError on failure. If the tar command fails (e.g., .hermes/ doesn't exist or permission denied), the code proceeds to download_file, which could download a stale /tmp/.hermes_sync.tar left over from a previous run on a persistent sandbox — silently applying outdated file contents back to the host.
Reviewed by Cursor Bugbot for commit 6aba50f. Configure here.
| def cleanup(self): | ||
| if self._sync_manager: | ||
| logger.info("SSH: syncing files from sandbox...") | ||
| self._sync_manager.sync_back() |
There was a problem hiding this comment.
Unguarded sync_back() in cleanup skips resource teardown
Medium Severity
The sync_back() call in all three backends' cleanup() methods is not wrapped in a try-except. While sync_back() catches exceptions internally in its retry loop, code paths before the loop (e.g., lock_path.parent.mkdir) and BaseException subclasses like KeyboardInterrupt (re-raised from the deferred SIGINT mechanism via os.kill) can propagate uncaught, causing the rest of cleanup to be skipped entirely — leaking SSH control sockets, Modal sandboxes, or Daytona sandboxes.
Additional Locations (2)
Reviewed by Cursor Bugbot for commit 6aba50f. Configure here.
kshitijk4poor
left a comment
There was a problem hiding this comment.
PR Review — #8018: feat(file-sync): sync remote changes back to host on teardown
Verdict: Request Changes
Tests: 44 passed, 0 failed (3 skipped — pre-existing SSH-marked perf tests)
Critical
- file_sync.py:9 —
import fcntlis unconditional. This module does not exist on Windows and will crash at import time. The codebase already has the correct pattern incron/scheduler.py:21andhermes_cli/auth.py:46:try: import fcntl / except ImportError: fcntl = Nonewithmsvcrtfallback._sync_back_lockedshould skip file locking (or usemsvcrt.locking) whenfcntlis unavailable.
Warnings
-
file_sync.py:275 —
assert self._bulk_download_fn is not Noneis stripped bypython -O. This is a runtime code path, not a dev-only assertion. The caller already guards (if self._bulk_download_fn is None: return), so this assert is defense-in-depth — but should be a properifcheck orraise RuntimeError(...). -
file_sync.py:273-357 —
_resolve_host_pathand_infer_host_patheach callself._get_files_fn()independently, which iterates credentials+skills+cache every call. In_sync_back_impl, this is called once per remote file (potentially twice if resolve fails and infer is tried). O(n*m) where n=synced files, m=changed files. Easy fix: cache the mapping list once at the start of_sync_back_impland pass it to both methods. -
ssh.py, modal.py, daytona.py —
BulkDownloadFnis imported but never used as a type annotation in any backend file. Dead import. (Consistent with the pre-existingBulkUploadFnpattern, so low priority — but would be nice to either use the types or remove the imports.) -
modal.py:356 —
"tar cf - -C / root/.hermes"hardcodes the path. SSH and Daytona dynamically construct the path fromself._remote_home. Modal mirrors the existing hardcoded/root/.hermeson line 269, so it is internally consistent — but worth a comment explaining the assumption (Modal sandboxes always run as root).
Suggestions
-
file_sync.py:291 —
_sha256_file(staged_file)is called for every extracted file including ones wherepushed_hashisNone(new remote files). For new files the hash comparison is always False. Minor optimization: skip hashing whenpushed_hash is None. -
daytona.py:168-170 — The remote temp file
/tmp/.hermes_sync.taris created on the sandbox but never cleaned up. Ifsync_back()retries or the sandbox persists, stale tar files accumulate. Consider addingrm -f /tmp/.hermes_sync.tarafter download. -
Test coverage gaps (MEDIUM): SIGINT deferral logic (main thread vs worker thread paths) is not tested.
_infer_host_pathedge cases (no matching prefix, partial prefix that shouldn't match) — only one positive case tested. Daytonadownload_filefailure propagation not tested.
Looks Good
- Clean architecture: sync_back -> retry -> SIGINT protection -> file lock -> download -> diff -> apply. Good separation of concerns across the 4 method layers.
_pushed_hasheslifecycle is correct: populated on upload, cleared on delete, rolled back on failure.tarfile.openwithfilter="data"— proper security against path traversal.threading.current_thread() is threading.main_thread()check beforesignal.signal()— correct handling for gateway worker threads.- Cleanup ordering is well-tested: all 3 backends verify sync_back runs before sandbox teardown (control_exit, terminate, stop).
- Tests are well-structured with clear helpers, good coverage of the happy path and error paths (retry, exhausted retries, conflict).
- Consistent with the existing bulk upload pattern — download is a clean mirror.
Salvage of PR NousResearch#8018 by @alt-glitch onto current main. On sandbox teardown, FileSyncManager now downloads the remote .hermes/ directory, diffs against SHA-256 hashes of what was originally pushed, and applies only changed files back to the host. Core (tools/environments/file_sync.py): - sync_back(): orchestrates download -> unpack -> diff -> apply with: - Retry with exponential backoff (3 attempts, 2s/4s/8s) - SIGINT trap + defer (prevents partial writes on Ctrl-C) - fcntl.flock serialization (concurrent gateway sandboxes) - Last-write-wins conflict resolution with warning - New remote files pulled back via _infer_host_path prefix matching Backends: - SSH: _ssh_bulk_download — tar cf - piped over SSH - Modal: _modal_bulk_download — exec tar cf - -> proc.stdout.read - Daytona: _daytona_bulk_download — exec tar cf -> SDK download_file - All three call sync_back() at the top of cleanup() Fixes applied during salvage (vs original PR NousResearch#8018): | # | Issue | Fix | |---|-------|-----| | C1 | import fcntl unconditional — crashes Windows | try/except with fallback; _sync_back_locked skips locking when fcntl=None | | W1 | assert for runtime guard (stripped by -O) | Replaced with proper if/raise RuntimeError | | W2 | O(n*m) from _get_files_fn() called per file | Cache mapping once at start of _sync_back_impl, pass to resolve/infer | | W3 | Dead BulkDownloadFn imports in 3 backends | Removed unused imports | | W4 | Modal hardcodes root/.hermes, no explanation | Added docstring comment explaining Modal always runs as root | | S1 | SHA-256 computed for new files where pushed_hash=None | Skip hashing when pushed_hash is None (comparison always False) | | S2 | Daytona /tmp/.hermes_sync.tar never cleaned up | Added rm -f after download (best-effort) | Tests: 49 passing (17 new: _infer_host_path edge cases, SIGINT main/worker thread, Windows fcntl=None fallback, Daytona tar cleanup). Based on NousResearch#8018 by @alt-glitch.
Salvage of PR #8018 by @alt-glitch onto current main. On sandbox teardown, FileSyncManager now downloads the remote .hermes/ directory, diffs against SHA-256 hashes of what was originally pushed, and applies only changed files back to the host. Core (tools/environments/file_sync.py): - sync_back(): orchestrates download -> unpack -> diff -> apply with: - Retry with exponential backoff (3 attempts, 2s/4s/8s) - SIGINT trap + defer (prevents partial writes on Ctrl-C) - fcntl.flock serialization (concurrent gateway sandboxes) - Last-write-wins conflict resolution with warning - New remote files pulled back via _infer_host_path prefix matching Backends: - SSH: _ssh_bulk_download — tar cf - piped over SSH - Modal: _modal_bulk_download — exec tar cf - -> proc.stdout.read - Daytona: _daytona_bulk_download — exec tar cf -> SDK download_file - All three call sync_back() at the top of cleanup() Fixes applied during salvage (vs original PR #8018): | # | Issue | Fix | |---|-------|-----| | C1 | import fcntl unconditional — crashes Windows | try/except with fallback; _sync_back_locked skips locking when fcntl=None | | W1 | assert for runtime guard (stripped by -O) | Replaced with proper if/raise RuntimeError | | W2 | O(n*m) from _get_files_fn() called per file | Cache mapping once at start of _sync_back_impl, pass to resolve/infer | | W3 | Dead BulkDownloadFn imports in 3 backends | Removed unused imports | | W4 | Modal hardcodes root/.hermes, no explanation | Added docstring comment explaining Modal always runs as root | | S1 | SHA-256 computed for new files where pushed_hash=None | Skip hashing when pushed_hash is None (comparison always False) | | S2 | Daytona /tmp/.hermes_sync.tar never cleaned up | Added rm -f after download (best-effort) | Tests: 49 passing (17 new: _infer_host_path edge cases, SIGINT main/worker thread, Windows fcntl=None fallback, Daytona tar cleanup). Based on #8018 by @alt-glitch.
Follow-ups on top of kshitijk4poor's cherry-picked salvage of PR #8018: tools/environments/daytona.py - PID-suffix /tmp/.hermes_sync.<pid>.tar so concurrent sync_back calls against the same sandbox don't collide on the remote temp path - Move sync_back() inside the cleanup lock and after the _sandbox-None guard, with its own try/except. Previously a no-op cleanup (sandbox already cleared) still fired sync_back → 3-attempt retry storm against a nil sandbox (~6s of sleep). Now short-circuits cleanly. tools/environments/file_sync.py - Add _SYNC_BACK_MAX_BYTES (2 GiB) defensive cap: refuse to extract a tar larger than the limit. Protects against runaway sandboxes producing arbitrary-size archives. - Add 'nothing previously pushed' guard at the top of sync_back(). If _pushed_hashes and _synced_files are both empty, the FileSyncManager was never initialized from the host side — there is nothing coherent to sync back. Skips the retry/backoff machinery on uninitialized managers and eliminates test-suite slowdown from pre-existing cleanup tests that don't mock the sync layer. tests/tools/test_file_sync_back.py - Update _make_manager helper to seed a _pushed_hashes entry by default so sync_back() exercises its real path. A seed_pushed_state=False opt-out is available for noop-path tests. - Add TestSyncBackSizeCap with positive and negative coverage of the new cap. tests/tools/test_sync_back_backends.py - Update Daytona bulk download test to assert the PID-suffixed path pattern instead of the fixed /tmp/.hermes_sync.tar.
Salvage of PR #8018 by @alt-glitch onto current main. On sandbox teardown, FileSyncManager now downloads the remote .hermes/ directory, diffs against SHA-256 hashes of what was originally pushed, and applies only changed files back to the host. Core (tools/environments/file_sync.py): - sync_back(): orchestrates download -> unpack -> diff -> apply with: - Retry with exponential backoff (3 attempts, 2s/4s/8s) - SIGINT trap + defer (prevents partial writes on Ctrl-C) - fcntl.flock serialization (concurrent gateway sandboxes) - Last-write-wins conflict resolution with warning - New remote files pulled back via _infer_host_path prefix matching Backends: - SSH: _ssh_bulk_download — tar cf - piped over SSH - Modal: _modal_bulk_download — exec tar cf - -> proc.stdout.read - Daytona: _daytona_bulk_download — exec tar cf -> SDK download_file - All three call sync_back() at the top of cleanup() Fixes applied during salvage (vs original PR #8018): | # | Issue | Fix | |---|-------|-----| | C1 | import fcntl unconditional — crashes Windows | try/except with fallback; _sync_back_locked skips locking when fcntl=None | | W1 | assert for runtime guard (stripped by -O) | Replaced with proper if/raise RuntimeError | | W2 | O(n*m) from _get_files_fn() called per file | Cache mapping once at start of _sync_back_impl, pass to resolve/infer | | W3 | Dead BulkDownloadFn imports in 3 backends | Removed unused imports | | W4 | Modal hardcodes root/.hermes, no explanation | Added docstring comment explaining Modal always runs as root | | S1 | SHA-256 computed for new files where pushed_hash=None | Skip hashing when pushed_hash is None (comparison always False) | | S2 | Daytona /tmp/.hermes_sync.tar never cleaned up | Added rm -f after download (best-effort) | Tests: 49 passing (17 new: _infer_host_path edge cases, SIGINT main/worker thread, Windows fcntl=None fallback, Daytona tar cleanup). Based on #8018 by @alt-glitch.
Follow-ups on top of kshitijk4poor's cherry-picked salvage of PR #8018: tools/environments/daytona.py - PID-suffix /tmp/.hermes_sync.<pid>.tar so concurrent sync_back calls against the same sandbox don't collide on the remote temp path - Move sync_back() inside the cleanup lock and after the _sandbox-None guard, with its own try/except. Previously a no-op cleanup (sandbox already cleared) still fired sync_back → 3-attempt retry storm against a nil sandbox (~6s of sleep). Now short-circuits cleanly. tools/environments/file_sync.py - Add _SYNC_BACK_MAX_BYTES (2 GiB) defensive cap: refuse to extract a tar larger than the limit. Protects against runaway sandboxes producing arbitrary-size archives. - Add 'nothing previously pushed' guard at the top of sync_back(). If _pushed_hashes and _synced_files are both empty, the FileSyncManager was never initialized from the host side — there is nothing coherent to sync back. Skips the retry/backoff machinery on uninitialized managers and eliminates test-suite slowdown from pre-existing cleanup tests that don't mock the sync layer. tests/tools/test_file_sync_back.py - Update _make_manager helper to seed a _pushed_hashes entry by default so sync_back() exercises its real path. A seed_pushed_state=False opt-out is available for noop-path tests. - Add TestSyncBackSizeCap with positive and negative coverage of the new cap. tests/tools/test_sync_back_backends.py - Update Daytona bulk download test to assert the PID-suffixed path pattern instead of the fixed /tmp/.hermes_sync.tar.
|
Merged via #11291. Your original implementation made it onto main through @kshitijk4poor's salvage PR (#8189) — they cherry-picked your work, addressed the first review round, and we then cherry-picked that onto current main with further hardening. Thanks for the clean architecture (retry + SIGINT defer + flock + hash diff) — it held up through two rebase rounds untouched. |
Salvage of PR NousResearch#8018 by @alt-glitch onto current main. On sandbox teardown, FileSyncManager now downloads the remote .hermes/ directory, diffs against SHA-256 hashes of what was originally pushed, and applies only changed files back to the host. Core (tools/environments/file_sync.py): - sync_back(): orchestrates download -> unpack -> diff -> apply with: - Retry with exponential backoff (3 attempts, 2s/4s/8s) - SIGINT trap + defer (prevents partial writes on Ctrl-C) - fcntl.flock serialization (concurrent gateway sandboxes) - Last-write-wins conflict resolution with warning - New remote files pulled back via _infer_host_path prefix matching Backends: - SSH: _ssh_bulk_download — tar cf - piped over SSH - Modal: _modal_bulk_download — exec tar cf - -> proc.stdout.read - Daytona: _daytona_bulk_download — exec tar cf -> SDK download_file - All three call sync_back() at the top of cleanup() Fixes applied during salvage (vs original PR NousResearch#8018): | # | Issue | Fix | |---|-------|-----| | C1 | import fcntl unconditional — crashes Windows | try/except with fallback; _sync_back_locked skips locking when fcntl=None | | W1 | assert for runtime guard (stripped by -O) | Replaced with proper if/raise RuntimeError | | W2 | O(n*m) from _get_files_fn() called per file | Cache mapping once at start of _sync_back_impl, pass to resolve/infer | | W3 | Dead BulkDownloadFn imports in 3 backends | Removed unused imports | | W4 | Modal hardcodes root/.hermes, no explanation | Added docstring comment explaining Modal always runs as root | | S1 | SHA-256 computed for new files where pushed_hash=None | Skip hashing when pushed_hash is None (comparison always False) | | S2 | Daytona /tmp/.hermes_sync.tar never cleaned up | Added rm -f after download (best-effort) | Tests: 49 passing (17 new: _infer_host_path edge cases, SIGINT main/worker thread, Windows fcntl=None fallback, Daytona tar cleanup). Based on NousResearch#8018 by @alt-glitch.
Follow-ups on top of kshitijk4poor's cherry-picked salvage of PR NousResearch#8018: tools/environments/daytona.py - PID-suffix /tmp/.hermes_sync.<pid>.tar so concurrent sync_back calls against the same sandbox don't collide on the remote temp path - Move sync_back() inside the cleanup lock and after the _sandbox-None guard, with its own try/except. Previously a no-op cleanup (sandbox already cleared) still fired sync_back → 3-attempt retry storm against a nil sandbox (~6s of sleep). Now short-circuits cleanly. tools/environments/file_sync.py - Add _SYNC_BACK_MAX_BYTES (2 GiB) defensive cap: refuse to extract a tar larger than the limit. Protects against runaway sandboxes producing arbitrary-size archives. - Add 'nothing previously pushed' guard at the top of sync_back(). If _pushed_hashes and _synced_files are both empty, the FileSyncManager was never initialized from the host side — there is nothing coherent to sync back. Skips the retry/backoff machinery on uninitialized managers and eliminates test-suite slowdown from pre-existing cleanup tests that don't mock the sync layer. tests/tools/test_file_sync_back.py - Update _make_manager helper to seed a _pushed_hashes entry by default so sync_back() exercises its real path. A seed_pushed_state=False opt-out is available for noop-path tests. - Add TestSyncBackSizeCap with positive and negative coverage of the new cap. tests/tools/test_sync_back_backends.py - Update Daytona bulk download test to assert the PID-suffixed path pattern instead of the fixed /tmp/.hermes_sync.tar.
Salvage of PR NousResearch#8018 by @alt-glitch onto current main. On sandbox teardown, FileSyncManager now downloads the remote .hermes/ directory, diffs against SHA-256 hashes of what was originally pushed, and applies only changed files back to the host. Core (tools/environments/file_sync.py): - sync_back(): orchestrates download -> unpack -> diff -> apply with: - Retry with exponential backoff (3 attempts, 2s/4s/8s) - SIGINT trap + defer (prevents partial writes on Ctrl-C) - fcntl.flock serialization (concurrent gateway sandboxes) - Last-write-wins conflict resolution with warning - New remote files pulled back via _infer_host_path prefix matching Backends: - SSH: _ssh_bulk_download — tar cf - piped over SSH - Modal: _modal_bulk_download — exec tar cf - -> proc.stdout.read - Daytona: _daytona_bulk_download — exec tar cf -> SDK download_file - All three call sync_back() at the top of cleanup() Fixes applied during salvage (vs original PR NousResearch#8018): | # | Issue | Fix | |---|-------|-----| | C1 | import fcntl unconditional — crashes Windows | try/except with fallback; _sync_back_locked skips locking when fcntl=None | | W1 | assert for runtime guard (stripped by -O) | Replaced with proper if/raise RuntimeError | | W2 | O(n*m) from _get_files_fn() called per file | Cache mapping once at start of _sync_back_impl, pass to resolve/infer | | W3 | Dead BulkDownloadFn imports in 3 backends | Removed unused imports | | W4 | Modal hardcodes root/.hermes, no explanation | Added docstring comment explaining Modal always runs as root | | S1 | SHA-256 computed for new files where pushed_hash=None | Skip hashing when pushed_hash is None (comparison always False) | | S2 | Daytona /tmp/.hermes_sync.tar never cleaned up | Added rm -f after download (best-effort) | Tests: 49 passing (17 new: _infer_host_path edge cases, SIGINT main/worker thread, Windows fcntl=None fallback, Daytona tar cleanup). Based on NousResearch#8018 by @alt-glitch.
Follow-ups on top of kshitijk4poor's cherry-picked salvage of PR NousResearch#8018: tools/environments/daytona.py - PID-suffix /tmp/.hermes_sync.<pid>.tar so concurrent sync_back calls against the same sandbox don't collide on the remote temp path - Move sync_back() inside the cleanup lock and after the _sandbox-None guard, with its own try/except. Previously a no-op cleanup (sandbox already cleared) still fired sync_back → 3-attempt retry storm against a nil sandbox (~6s of sleep). Now short-circuits cleanly. tools/environments/file_sync.py - Add _SYNC_BACK_MAX_BYTES (2 GiB) defensive cap: refuse to extract a tar larger than the limit. Protects against runaway sandboxes producing arbitrary-size archives. - Add 'nothing previously pushed' guard at the top of sync_back(). If _pushed_hashes and _synced_files are both empty, the FileSyncManager was never initialized from the host side — there is nothing coherent to sync back. Skips the retry/backoff machinery on uninitialized managers and eliminates test-suite slowdown from pre-existing cleanup tests that don't mock the sync layer. tests/tools/test_file_sync_back.py - Update _make_manager helper to seed a _pushed_hashes entry by default so sync_back() exercises its real path. A seed_pushed_state=False opt-out is available for noop-path tests. - Add TestSyncBackSizeCap with positive and negative coverage of the new cap. tests/tools/test_sync_back_backends.py - Update Daytona bulk download test to assert the PID-suffixed path pattern instead of the fixed /tmp/.hermes_sync.tar.
Salvage of PR NousResearch#8018 by @alt-glitch onto current main. On sandbox teardown, FileSyncManager now downloads the remote .hermes/ directory, diffs against SHA-256 hashes of what was originally pushed, and applies only changed files back to the host. Core (tools/environments/file_sync.py): - sync_back(): orchestrates download -> unpack -> diff -> apply with: - Retry with exponential backoff (3 attempts, 2s/4s/8s) - SIGINT trap + defer (prevents partial writes on Ctrl-C) - fcntl.flock serialization (concurrent gateway sandboxes) - Last-write-wins conflict resolution with warning - New remote files pulled back via _infer_host_path prefix matching Backends: - SSH: _ssh_bulk_download — tar cf - piped over SSH - Modal: _modal_bulk_download — exec tar cf - -> proc.stdout.read - Daytona: _daytona_bulk_download — exec tar cf -> SDK download_file - All three call sync_back() at the top of cleanup() Fixes applied during salvage (vs original PR NousResearch#8018): | # | Issue | Fix | |---|-------|-----| | C1 | import fcntl unconditional — crashes Windows | try/except with fallback; _sync_back_locked skips locking when fcntl=None | | W1 | assert for runtime guard (stripped by -O) | Replaced with proper if/raise RuntimeError | | W2 | O(n*m) from _get_files_fn() called per file | Cache mapping once at start of _sync_back_impl, pass to resolve/infer | | W3 | Dead BulkDownloadFn imports in 3 backends | Removed unused imports | | W4 | Modal hardcodes root/.hermes, no explanation | Added docstring comment explaining Modal always runs as root | | S1 | SHA-256 computed for new files where pushed_hash=None | Skip hashing when pushed_hash is None (comparison always False) | | S2 | Daytona /tmp/.hermes_sync.tar never cleaned up | Added rm -f after download (best-effort) | Tests: 49 passing (17 new: _infer_host_path edge cases, SIGINT main/worker thread, Windows fcntl=None fallback, Daytona tar cleanup). Based on NousResearch#8018 by @alt-glitch.
Follow-ups on top of kshitijk4poor's cherry-picked salvage of PR NousResearch#8018: tools/environments/daytona.py - PID-suffix /tmp/.hermes_sync.<pid>.tar so concurrent sync_back calls against the same sandbox don't collide on the remote temp path - Move sync_back() inside the cleanup lock and after the _sandbox-None guard, with its own try/except. Previously a no-op cleanup (sandbox already cleared) still fired sync_back → 3-attempt retry storm against a nil sandbox (~6s of sleep). Now short-circuits cleanly. tools/environments/file_sync.py - Add _SYNC_BACK_MAX_BYTES (2 GiB) defensive cap: refuse to extract a tar larger than the limit. Protects against runaway sandboxes producing arbitrary-size archives. - Add 'nothing previously pushed' guard at the top of sync_back(). If _pushed_hashes and _synced_files are both empty, the FileSyncManager was never initialized from the host side — there is nothing coherent to sync back. Skips the retry/backoff machinery on uninitialized managers and eliminates test-suite slowdown from pre-existing cleanup tests that don't mock the sync layer. tests/tools/test_file_sync_back.py - Update _make_manager helper to seed a _pushed_hashes entry by default so sync_back() exercises its real path. A seed_pushed_state=False opt-out is available for noop-path tests. - Add TestSyncBackSizeCap with positive and negative coverage of the new cap. tests/tools/test_sync_back_backends.py - Update Daytona bulk download test to assert the PID-suffixed path pattern instead of the fixed /tmp/.hermes_sync.tar.
Salvage of PR NousResearch#8018 by @alt-glitch onto current main. On sandbox teardown, FileSyncManager now downloads the remote .hermes/ directory, diffs against SHA-256 hashes of what was originally pushed, and applies only changed files back to the host. Core (tools/environments/file_sync.py): - sync_back(): orchestrates download -> unpack -> diff -> apply with: - Retry with exponential backoff (3 attempts, 2s/4s/8s) - SIGINT trap + defer (prevents partial writes on Ctrl-C) - fcntl.flock serialization (concurrent gateway sandboxes) - Last-write-wins conflict resolution with warning - New remote files pulled back via _infer_host_path prefix matching Backends: - SSH: _ssh_bulk_download — tar cf - piped over SSH - Modal: _modal_bulk_download — exec tar cf - -> proc.stdout.read - Daytona: _daytona_bulk_download — exec tar cf -> SDK download_file - All three call sync_back() at the top of cleanup() Fixes applied during salvage (vs original PR NousResearch#8018): | # | Issue | Fix | |---|-------|-----| | C1 | import fcntl unconditional — crashes Windows | try/except with fallback; _sync_back_locked skips locking when fcntl=None | | W1 | assert for runtime guard (stripped by -O) | Replaced with proper if/raise RuntimeError | | W2 | O(n*m) from _get_files_fn() called per file | Cache mapping once at start of _sync_back_impl, pass to resolve/infer | | W3 | Dead BulkDownloadFn imports in 3 backends | Removed unused imports | | W4 | Modal hardcodes root/.hermes, no explanation | Added docstring comment explaining Modal always runs as root | | S1 | SHA-256 computed for new files where pushed_hash=None | Skip hashing when pushed_hash is None (comparison always False) | | S2 | Daytona /tmp/.hermes_sync.tar never cleaned up | Added rm -f after download (best-effort) | Tests: 49 passing (17 new: _infer_host_path edge cases, SIGINT main/worker thread, Windows fcntl=None fallback, Daytona tar cleanup). Based on NousResearch#8018 by @alt-glitch.
Follow-ups on top of kshitijk4poor's cherry-picked salvage of PR NousResearch#8018: tools/environments/daytona.py - PID-suffix /tmp/.hermes_sync.<pid>.tar so concurrent sync_back calls against the same sandbox don't collide on the remote temp path - Move sync_back() inside the cleanup lock and after the _sandbox-None guard, with its own try/except. Previously a no-op cleanup (sandbox already cleared) still fired sync_back → 3-attempt retry storm against a nil sandbox (~6s of sleep). Now short-circuits cleanly. tools/environments/file_sync.py - Add _SYNC_BACK_MAX_BYTES (2 GiB) defensive cap: refuse to extract a tar larger than the limit. Protects against runaway sandboxes producing arbitrary-size archives. - Add 'nothing previously pushed' guard at the top of sync_back(). If _pushed_hashes and _synced_files are both empty, the FileSyncManager was never initialized from the host side — there is nothing coherent to sync back. Skips the retry/backoff machinery on uninitialized managers and eliminates test-suite slowdown from pre-existing cleanup tests that don't mock the sync layer. tests/tools/test_file_sync_back.py - Update _make_manager helper to seed a _pushed_hashes entry by default so sync_back() exercises its real path. A seed_pushed_state=False opt-out is available for noop-path tests. - Add TestSyncBackSizeCap with positive and negative coverage of the new cap. tests/tools/test_sync_back_backends.py - Update Daytona bulk download test to assert the PID-suffixed path pattern instead of the fixed /tmp/.hermes_sync.tar.
Salvage of PR NousResearch#8018 by @alt-glitch onto current main. On sandbox teardown, FileSyncManager now downloads the remote .hermes/ directory, diffs against SHA-256 hashes of what was originally pushed, and applies only changed files back to the host. Core (tools/environments/file_sync.py): - sync_back(): orchestrates download -> unpack -> diff -> apply with: - Retry with exponential backoff (3 attempts, 2s/4s/8s) - SIGINT trap + defer (prevents partial writes on Ctrl-C) - fcntl.flock serialization (concurrent gateway sandboxes) - Last-write-wins conflict resolution with warning - New remote files pulled back via _infer_host_path prefix matching Backends: - SSH: _ssh_bulk_download — tar cf - piped over SSH - Modal: _modal_bulk_download — exec tar cf - -> proc.stdout.read - Daytona: _daytona_bulk_download — exec tar cf -> SDK download_file - All three call sync_back() at the top of cleanup() Fixes applied during salvage (vs original PR NousResearch#8018): | # | Issue | Fix | |---|-------|-----| | C1 | import fcntl unconditional — crashes Windows | try/except with fallback; _sync_back_locked skips locking when fcntl=None | | W1 | assert for runtime guard (stripped by -O) | Replaced with proper if/raise RuntimeError | | W2 | O(n*m) from _get_files_fn() called per file | Cache mapping once at start of _sync_back_impl, pass to resolve/infer | | W3 | Dead BulkDownloadFn imports in 3 backends | Removed unused imports | | W4 | Modal hardcodes root/.hermes, no explanation | Added docstring comment explaining Modal always runs as root | | S1 | SHA-256 computed for new files where pushed_hash=None | Skip hashing when pushed_hash is None (comparison always False) | | S2 | Daytona /tmp/.hermes_sync.tar never cleaned up | Added rm -f after download (best-effort) | Tests: 49 passing (17 new: _infer_host_path edge cases, SIGINT main/worker thread, Windows fcntl=None fallback, Daytona tar cleanup). Based on NousResearch#8018 by @alt-glitch.
Follow-ups on top of kshitijk4poor's cherry-picked salvage of PR NousResearch#8018: tools/environments/daytona.py - PID-suffix /tmp/.hermes_sync.<pid>.tar so concurrent sync_back calls against the same sandbox don't collide on the remote temp path - Move sync_back() inside the cleanup lock and after the _sandbox-None guard, with its own try/except. Previously a no-op cleanup (sandbox already cleared) still fired sync_back → 3-attempt retry storm against a nil sandbox (~6s of sleep). Now short-circuits cleanly. tools/environments/file_sync.py - Add _SYNC_BACK_MAX_BYTES (2 GiB) defensive cap: refuse to extract a tar larger than the limit. Protects against runaway sandboxes producing arbitrary-size archives. - Add 'nothing previously pushed' guard at the top of sync_back(). If _pushed_hashes and _synced_files are both empty, the FileSyncManager was never initialized from the host side — there is nothing coherent to sync back. Skips the retry/backoff machinery on uninitialized managers and eliminates test-suite slowdown from pre-existing cleanup tests that don't mock the sync layer. tests/tools/test_file_sync_back.py - Update _make_manager helper to seed a _pushed_hashes entry by default so sync_back() exercises its real path. A seed_pushed_state=False opt-out is available for noop-path tests. - Add TestSyncBackSizeCap with positive and negative coverage of the new cap. tests/tools/test_sync_back_backends.py - Update Daytona bulk download test to assert the PID-suffixed path pattern instead of the fixed /tmp/.hermes_sync.tar.


Summary
Implements Phase 2 of the bulk file sync spec (
bulk-sync-spec.md). On sandbox teardown,FileSyncManagernow downloads the remote.hermes/directory, diffs against SHA-256 hashes of what was originally pushed, and applies only changed files back to the host.Depends on #8014 (bulk upload salvage — cherry-picked as base).
Changes
Core (
tools/environments/file_sync.py)BulkDownloadFncallback type:Callable[[Path], None]— backend writes a tar archive to the given path_pushed_hashes: SHA-256 content hashes computed on upload, stored alongside mtime+sizesync_back(): orchestrates download → unpack → diff → apply with:fcntl.flockon.hermes/.sync.lock(serializes concurrent gateway sandboxes)_infer_host_pathprefix matchingBackends
_ssh_bulk_download—tar cf - -C ~/.hermes .piped over SSH to local file_modal_bulk_download—exec tar cf -→proc.stdout.read→ write to file_daytona_bulk_download—exec tar cfon remote → SDKdownload_file()bulk_download_fnintoFileSyncManagerand callsync_back()at the top ofcleanup()(before ControlMaster close / sandbox terminate / sandbox stop)Tests
72 total tests passing.
Test plan
filter="data"(Python 3.14 compat)🤖 Generated with Claude Code