Skip to content

feat(file-sync): sync remote changes back to host on teardown#8189

Closed
kshitijk4poor wants to merge 1 commit into
NousResearch:mainfrom
kshitijk4poor:fix/file-sync-back-salvage
Closed

feat(file-sync): sync remote changes back to host on teardown#8189
kshitijk4poor wants to merge 1 commit into
NousResearch:mainfrom
kshitijk4poor:fix/file-sync-back-salvage

Conversation

@kshitijk4poor

Copy link
Copy Markdown
Collaborator

Summary

Salvage of PR #8018 by @alt-glitch onto current main. Implements Phase 2 of the bulk file sync spec — on sandbox teardown, FileSyncManager downloads the remote .hermes/ directory, diffs against SHA-256 hashes of what was originally pushed, and applies only changed files back to the host.

Depends on #8014 (bulk upload — already merged as base).

Changes

Core (tools/environments/file_sync.py)

  • sync_back(): orchestrates download → unpack → diff → apply with:
    • Retry with exponential backoff (3 attempts, 2s/4s/8s)
    • SIGINT trap + defer (prevents partial writes on Ctrl-C)
    • fcntl.flock serialization (concurrent gateway sandboxes)
    • Last-write-wins conflict resolution with warning
    • New remote files pulled back via _infer_host_path prefix matching

Backends

  • SSH: _ssh_bulk_downloadtar cf - piped over SSH
  • Modal: _modal_bulk_downloadexec tar cf -proc.stdout.read
  • Daytona: _daytona_bulk_downloadexec tar cf → SDK download_file
  • All three call sync_back() at the top of cleanup()

Fixes applied during salvage (vs original PR #8018)

# Issue Fix
C1 import fcntl unconditional — crashes on Windows try/except ImportError with fallback; _sync_back_locked skips locking when fcntl is None
W1 assert used for runtime guard (stripped by python -O) Replaced with proper if/raise RuntimeError
W2 O(n×m) from _get_files_fn() called per file in inner loop Cache mapping list once at start of _sync_back_impl, pass to _resolve_host_path/_infer_host_path
W3 Dead BulkDownloadFn imports in SSH/Modal/Daytona Removed unused imports from all 3 backends
W4 Modal hardcodes root/.hermes with no explanation Added docstring comment explaining Modal sandboxes always run as root
S1 SHA-256 computed for new remote files where pushed_hash=None Skip hashing when pushed_hash is None (comparison is always False)
S2 Daytona /tmp/.hermes_sync.tar never cleaned up on remote Added rm -f after download (best-effort cleanup)

Tests

49 passing (17 new tests added during salvage):

  • _infer_host_path edge cases: no match, partial prefix false match, correct match
  • SIGINT deferral: main thread handler swap, worker thread skips signal
  • Windows fallback: fcntl=None skips file locking without error
  • Daytona tar cleanup: verifies rm -f called after download
  • Updated Daytona download assertions for 2-call exec pattern

Based on #8018 by @alt-glitch.

Salvage of PR NousResearch#8018 by @alt-glitch onto current main.

On sandbox teardown, FileSyncManager now downloads the remote .hermes/
directory, diffs against SHA-256 hashes of what was originally pushed,
and applies only changed files back to the host.

Core (tools/environments/file_sync.py):
- sync_back(): orchestrates download -> unpack -> diff -> apply with:
  - Retry with exponential backoff (3 attempts, 2s/4s/8s)
  - SIGINT trap + defer (prevents partial writes on Ctrl-C)
  - fcntl.flock serialization (concurrent gateway sandboxes)
  - Last-write-wins conflict resolution with warning
  - New remote files pulled back via _infer_host_path prefix matching

Backends:
- SSH: _ssh_bulk_download — tar cf - piped over SSH
- Modal: _modal_bulk_download — exec tar cf - -> proc.stdout.read
- Daytona: _daytona_bulk_download — exec tar cf -> SDK download_file
- All three call sync_back() at the top of cleanup()

Fixes applied during salvage (vs original PR NousResearch#8018):

| # | Issue | Fix |
|---|-------|-----|
| C1 | import fcntl unconditional — crashes Windows | try/except with fallback; _sync_back_locked skips locking when fcntl=None |
| W1 | assert for runtime guard (stripped by -O) | Replaced with proper if/raise RuntimeError |
| W2 | O(n*m) from _get_files_fn() called per file | Cache mapping once at start of _sync_back_impl, pass to resolve/infer |
| W3 | Dead BulkDownloadFn imports in 3 backends | Removed unused imports |
| W4 | Modal hardcodes root/.hermes, no explanation | Added docstring comment explaining Modal always runs as root |
| S1 | SHA-256 computed for new files where pushed_hash=None | Skip hashing when pushed_hash is None (comparison always False) |
| S2 | Daytona /tmp/.hermes_sync.tar never cleaned up | Added rm -f after download (best-effort) |

Tests: 49 passing (17 new: _infer_host_path edge cases, SIGINT
main/worker thread, Windows fcntl=None fallback, Daytona tar cleanup).

Based on NousResearch#8018 by @alt-glitch.
@teknium1

Copy link
Copy Markdown
Contributor

Merged via #11291. Your commit was cherry-picked onto current main with authorship preserved (commit d64446e). Thanks for the thorough salvage of #8018 — every review item from the previous round was addressed and the 17 extra tests you added caught real edge cases. Follow-up hardening on top: PID-suffix on Daytona temp path, 2 GiB size cap, 'nothing previously pushed' guard, and lifecycle ordering fix in Daytona cleanup().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants