perf(ssh,modal): bulk file sync via tar pipe and tar/base64 archive #7560

Closed
kshitijk4poor wants to merge 1 commit into NousResearch:main from kshitijk4poor:perf/bulk-file-sync-ssh-modal

@kshitijk4poor kshitijk4poor commented Apr 11, 2026

Collaborator

Summary

Wire bulk_upload_fn into SSH and Modal backends, matching the pattern established for Daytona in #7447. Eliminates per-file transfer overhead during FileSyncManager.sync().

Changes

SSH (tools/environments/ssh.py)

  • Add _ssh_bulk_upload(): stages files in a temp directory mirroring the remote path layout, then pipes tar cf - | ssh tar xf - in a single TCP stream
  • Pre-creates all unique parent directories on remote in one SSH call before transfer
  • Uses ControlMaster socket for the SSH connection (same as existing _scp_upload)
  • Checks mkdir return code and raises on failure
  • Logs tar_create stderr on local tar failure for debugging
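The SSH steps above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names (`stage_files`, `remote_parent_dirs`, `bulk_upload`), the `(local_path, remote_path)` pair format, and the `ssh_base_cmd` parameter are all assumptions made for the example. `ssh_base_cmd` stands for whatever argument list the backend already uses to reach the host, including its ControlMaster options.

```python
import os
import shlex
import shutil
import subprocess
import tempfile
from pathlib import Path


def stage_files(files, staging_root):
    """Copy each (local_path, remote_path) pair under staging_root,
    mirroring the remote path layout."""
    for local_path, remote_path in files:
        dest = Path(staging_root) / remote_path.lstrip("/")
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(local_path, dest)


def remote_parent_dirs(files):
    """Return the sorted set of unique remote parent directories."""
    parents = {os.path.dirname(remote) for _, remote in files}
    return sorted(p for p in parents if p)


def bulk_upload(files, ssh_base_cmd):
    """Hypothetical bulk upload: one mkdir call, then one tar pipe.

    ssh_base_cmd is an assumed argument list such as
    ["ssh", "-o", "ControlPath=<socket>", "user@host"].
    """
    if not files:
        return  # nothing to sync, no SSH calls at all

    # 1. Create every unique parent directory in a single SSH call.
    parents = remote_parent_dirs(files)
    if parents:
        mkdir_cmd = "mkdir -p " + " ".join(shlex.quote(d) for d in parents)
        result = subprocess.run(
            ssh_base_cmd + [mkdir_cmd], capture_output=True, text=True
        )
        if result.returncode != 0:
            raise RuntimeError(f"remote mkdir failed: {result.stderr.strip()}")

    # 2. Stage the files locally, then stream them through one tar pipe.
    with tempfile.TemporaryDirectory() as staging, tempfile.TemporaryFile() as tar_err:
        stage_files(files, staging)
        members = [remote.lstrip("/") for _, remote in files]
        tar_create = subprocess.Popen(
            ["tar", "cf", "-", "-C", staging] + members,
            stdout=subprocess.PIPE,
            stderr=tar_err,  # a file, so a chatty tar cannot block the pipe
        )
        extract = subprocess.Popen(
            ssh_base_cmd + ["tar xf - -C /"],
            stdin=tar_create.stdout,
            stderr=subprocess.PIPE,
        )
        tar_create.stdout.close()  # let tar see SIGPIPE if ssh exits early
        _, extract_err = extract.communicate()
        tar_create.wait()

        if tar_create.returncode != 0:
            tar_err.seek(0)
            detail = tar_err.read().decode(errors="replace")
            raise RuntimeError(f"local tar failed: {detail}")
        if extract.returncode != 0:
            detail = extract_err.decode(errors="replace")
            raise RuntimeError(f"remote extract failed: {detail}")
```

Only regular files are passed to `tar` as archive members, so extraction does not rewrite the metadata of existing remote directories. Whether the actual implementation archives the staging directory this way is not stated in the description.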

Modal (tools/environments/modal.py)

  • Add _modal_bulk_upload(): builds an in-memory gzipped tar archive, base64-encodes it, and decodes+extracts in one exec call
  • Pre-creates parent dirs in the same command (single exec call total)
  • Uses 120s timeout for bulk (vs 15s for per-file)
  • Checks exec exit code and raises on failure (unlike pre-existing _modal_upload)
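The archive-building half of the Modal approach can be illustrated with the standard library alone. This is a sketch under assumptions rather than the PR's code: `build_archive_b64` and `build_extract_command` are hypothetical names, files are assumed to arrive as `(local_path, remote_path)` pairs, and the resulting string is assumed to be passed to the sandbox's exec call by the caller (that call is not shown here).

```python
import base64
import io
import os
import shlex
import tarfile


def build_archive_b64(files):
    """Pack (local_path, remote_path) pairs into an in-memory gzipped tar
    and return it as base64 text."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for local_path, remote_path in files:
            # Store members relative to "/", so "-C /" restores them in place.
            tar.add(local_path, arcname=remote_path.lstrip("/"))
    return base64.b64encode(buf.getvalue()).decode("ascii")


def build_extract_command(files):
    """Build one shell command that creates the parent directories, decodes
    the archive, and extracts it (hypothetical command layout)."""
    parents = sorted(
        {os.path.dirname(remote) for _, remote in files if os.path.dirname(remote)}
    )
    payload = build_archive_b64(files)
    steps = []
    if parents:
        steps.append("mkdir -p " + " ".join(shlex.quote(d) for d in parents))
    # The base64 alphabet contains no shell metacharacters.
    steps.append(f"echo {payload} | base64 -d | tar xzf - -C /")
    return " && ".join(steps)
```

Embedding the payload in the command line keeps everything in a single exec call, but it is bounded by the platform's argument-length limit, so a real implementation may need to feed large archives differently.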

Tests

  • test_ssh_bulk_upload.py — 8 tests: empty noop, staging dir layout, mkdir ordering, tar pipe structure, extract error, mkdir error, FileSyncManager wiring, ControlMaster usage
  • test_modal_bulk_upload.py — 7 tests: empty noop, tar archive contents, mkdir coverage, single-exec verification, exit code error, FileSyncManager wiring, timeout

Backend status

| Backend | Upload method | Bulk support | Status |
|---|---|---|---|
| Daytona | SDK upload_files() | #7447 | Already merged |
| SSH | tar cf \| ssh tar xf | ✔ this PR | New |
| Modal | tar+base64 \| exec | ✔ this PR | New |
| Docker | Bind mount | N/A | No sync needed |
| Singularity | Bind mount | N/A | No sync needed |

15 new tests, all passing. Existing file_sync, SSH, and Modal tests unaffected.

Closes #7465
Closes #7467

Wire bulk_upload_fn into SSH and Modal backends, matching the
pattern established for Daytona in NousResearch#7447.

SSH: stages files in a temp directory mirroring the remote path
layout, then pipes tar cf | ssh tar xf in a single TCP stream.
Eliminates per-file scp round-trips.

Modal: builds an in-memory gzipped tar archive, base64-encodes it,
and decodes+extracts in one exec call. Eliminates per-file
base64|exec overhead.

Both backends pre-create all unique parent directories in a single
call before the bulk transfer.

Closes NousResearch#7465
Closes NousResearch#7467
@kshitijk4poor

Collaborator Author

Superseded by #7558, which has a more robust SSH implementation (symlink staging, timeout handling, Popen failure guards) and also includes the Modal bulk upload from this PR.

