fix(rest): copy_out single-file mode writes at host_dst, not inside it#692
Conversation
`box.copy_out("/guest/blob", "/host/blob_copy")` over REST left a
DIRECTORY at `/host/blob_copy/` containing the file as
`/host/blob_copy/blob`, instead of writing the file to
`/host/blob_copy` directly. The SDK called `archive.unpack(host_dst)`
unconditionally — `tar::Archive::unpack` always treats its argument
as a destination directory and writes entries relative to it.
The local-FFI copy_out preserves single-file semantics ("dest is the
file path"). Adapt the REST extraction to the same shape:
- Probe the archive once. If exactly one regular file and no other
entries, open `host_dst` as a file and stream the entry into it
(creating any missing parent directories first).
- Otherwise (multi-file, or any directory/symlink entry), fall back
to the old `archive.unpack(host_dst)` path, which is correct for
`box.copy_out("/guest/tree/", "/host/tree/")` shapes.
Mirror of the runner-side single-file detection in
`apps/runner/pkg/api/controllers/boxlite_files.go::extractTarToDir`
shipped in PR boxlite-ai#688 — same situation, opposite direction.
E2E regression test:
sdks/python/tests/e2e/test_files_io.py::test_copy_out_binary_roundtrips_sha256
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThe PR updates the ChangesTar extraction single-file vs multi-entry handling
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/boxlite/src/rest/litebox.rs (1)
924-927: ⚡ Quick winRemove unused
single_entry_pathvariable.
single_entry_pathis captured during the probe pass but only referenced in a no-oplet _ =statement. This is dead code that should be removed.♻️ Proposed fix
- let mut single_entry_path: Option<std::path::PathBuf> = None; for entry in probe.entries().map_err(|e| { BoxliteError::Internal(format!("failed to read tar archive: {}", e)) })? { let entry = entry.map_err(|e| { BoxliteError::Internal(format!("failed to read tar entry: {}", e)) })?; let header = entry.header(); match header.entry_type() { tar::EntryType::Regular => { file_count += 1; - if single_entry_path.is_none() { - single_entry_path = - Some(entry.path().map(|c| c.into_owned()).unwrap_or_default()); - } } _ => other_count += 1, } }And remove line 973:
- // Probe said one regular file but the second pass found - // none — defensive fallthrough; shouldn't happen. - let _ = single_entry_path;Also applies to: 973-973
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/boxlite/src/rest/litebox.rs` around lines 924 - 927, Remove the dead `single_entry_path` capture and its related no-op usage: delete the `single_entry_path` variable declaration/initialization and the block that assigns `Some(entry.path()...)` (the code using `single_entry_path` during the probe pass), and also remove the later `let _ = single_entry_path` no-op; update any surrounding logic in the probe pass so behavior remains unchanged without that unused variable (references: single_entry_path, entry.path()).Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/boxlite/src/rest/litebox.rs`:
- Around line 914-919: The for-loop formatting around
probe.entries().map_err(...) and the subsequent entry.map_err(...) closures does
not match rustfmt style; reformat the loop to obey rustfmt (or run cargo fmt
--package boxlite) so the two map_err closures and their
BoxliteError::Internal(...) messages are split across lines consistently (ensure
the closure bodies are indented and wrapped, e.g., place the closure opening on
the next line and wrap the format! call), targeting the probe.entries() map_err
and the entry.map_err closures in litebox.rs so CI passes.
---
Nitpick comments:
In `@src/boxlite/src/rest/litebox.rs`:
- Around line 924-927: Remove the dead `single_entry_path` capture and its
related no-op usage: delete the `single_entry_path` variable
declaration/initialization and the block that assigns `Some(entry.path()...)`
(the code using `single_entry_path` during the probe pass), and also remove the
later `let _ = single_entry_path` no-op; update any surrounding logic in the
probe pass so behavior remains unchanged without that unused variable
(references: single_entry_path, entry.path()).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: c74af55a-036a-4239-a849-4680ec55378d
📒 Files selected for processing (1)
src/boxlite/src/rest/litebox.rs
…#688) copy_in saves the uploaded tar as a binary blob instead of extracting ## Bug `PUT /v1/{prefix}/boxes/{id}/files` (the runner handler for SDK `copy_in`) silently broke both single-file and directory uploads: ``` copy_in("host.txt", "/root/hello.txt") → /root/hello.txt = raw tar bytes (ustar header + payload), not the file copy_in("host_dir/", "/root/flatdest/") → /root/flatdest/boxlite-upload-XXXXXX.tar = the directory as one big tar ``` Root cause: the SDK Rust REST client serialises the source as a tar archive in the body. The runner buffered it to `<tmp>/boxlite-upload-*.tar` and called `Boxlite.CopyInto(tarPath, dest)`, which treats `src` as a literal filesystem path — so the tar archive itself was the "file" that landed in the guest. ## Fix Extract the uploaded tar at the runner before calling `CopyInto`: | Archive shape | Behaviour | |---|---| | 1 regular file, no other entries | write at `dest` (single-file copy_in) | | Multiple entries / dirs / symlinks / specials | `unpack` into `dest` (directory copy_in) | | Any entry with `..` in path | reject (zip-slip guard) | ## Test plan — two-sided verified Pin: `scripts/test/e2e/cases/test_files_io.py` Stack: local e2e. Runner binary swapped between main and PR builds. | Case | Pre-fix (main) | Post-fix (this PR) | |---|---|---| | `test_copy_in_text_roundtrips_byte_exact` (host writes `"üñîçødé\n"`, copy_in, guest cat) | `cat` returned raw tar archive text + ustar header (`'hello.txt…ustar…<payload>'`) — file at expected path didn't exist | **byte-exact payload** at `/root/hello.txt` | | `test_copy_in_directory_include_parent_false` (copy_in dir with nested files; `find -type f`) | only `/root/flatdest/boxlite-upload-XXXXXX.tar` (the raw tar landed as one file) | **both `top.txt` and `sub/deep.txt`** under `/root/flatdest/` | | `test_copy_in_overwrite_false_rejects_conflict` | empty | empty — runner-side extract always uses `O_TRUNC`. `overwrite=False` policy not honoured on REST yet | | `test_copy_out_binary_roundtrips_sha256` | `IsADirectoryError` on host_dest | `IsADirectoryError` on host_dest — SDK Rust copy_out has the mirror bug (raw tar saved as dir) | This PR closes the two known-broken copy_in paths (single file + directory). The two residual failures are different bugs: - `overwrite=False` needs the tar-extract path to respect the option; minor follow-up (5 lines in extract_tar). - `copy_out` needs the SAME fix on the SDK side (the SDK currently saves the runner's tar response without extracting). That's a Rust-SDK PR (#692), scope of a separate change. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Improvements** * Enhanced file upload handling to support efficient transfers of both individual files and directory structures. * **Tests** * Added comprehensive end-to-end tests for file operations, verifying data integrity, text preservation, binary round-tripping, recursive directory handling, and overwrite behavior. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
#710) add 5 e2e cases pinning REST contracts not covered today ## Coverage gaps Three classes of behaviour run through the SDK → API → Runner → VM chain but had no e2e pin on the chain itself; bugs in any one of them would silently regress through `make test:integration:*` (which only exercises local-FFI): ``` 1. CLI detach lifecycle — `boxlite run -d` returns, CLI process exits, does a fresh CLI process still see / exec the box? FFI side src/boxlite/tests/detach.rs, recovery.rs ✓ E2E side scripts/test/e2e/cases/ ✗ 2. Execution.attach / reattach contract — bogus exec_id should be a typed client error; reattach to a completed exec should return a usable handle. Runner side apps/runner/.../boxlite_exec_attach_test.go ✓ SDK <-> API end-to-end ✗ 3. host bind mount via REST — the cloud runtime intentionally dropped host bind mounts (#639); REST callers passing host paths must silently no-op, no /mnt/<x> in the guest. FFI surface src/boxlite/tests/mount_security.rs ✓ REST contract ✗ ``` ## Cases shipped ``` scripts/test/e2e/cases/test_cli_detach_recovery.py (2 cases) scripts/test/e2e/cases/test_exec_attach.py (2 cases) scripts/test/e2e/cases/test_volume_readonly.py (1 case) ``` 5 cases total. Author also dropped 6 cases from this branch's earlier incarnation that were already committed alongside their respective fix PRs (#686 / #688 / #689 / #691 / #692 / #696), so the diff is strictly net-new coverage. ## Test plan — run against current main Stack: local e2e, runner unchanged, no source edits. | Case | Result | Notes | |---|---|---| | `test_detached_box_exec_propagates_exit_code_on_fresh_cli` | ✅ PASS | exit-code passthrough across CLI processes | | `test_detached_box_survives_cli_exit_and_is_reusable` |⚠️ XFAIL (strict) | reaches step 3 (`boxlite exec <id> echo still-alive`) then hits the stdout-drop race that #563 fixes — marker drops when #563 lands | | `test_attach_with_bogus_id_is_typed_error` | ✅ PASS | bogus exec_id → typed `Exception` (not 5xx, not silent) | | `test_reattach_after_original_completes` |⚠️ XFAIL (strict) | same stdout-drop race (#563) on the original exec's `out=='first-output'` assertion | | `test_host_bind_mount_via_rest_is_silently_ignored` | ✅ PASS | the box created with `volumes=[(host_dir, "/mnt/ro", True)]` reports `MOUNT_LINE=<none>` from `/proc/mounts` and the host marker file is untouched — REST silently dropped the host path | The 2 XFAILs are tied to #563 (`fix(go-sdk): fold stream drain into Execution.Wait`). Once #563 merges, both xfails flip xpass-strict, the markers come off, and the suite is 5/5 green. No additional fix work is needed in this PR. 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Tests** * Added end-to-end tests verifying CLI detach survival and box reusability across fresh CLI processes, including exit code propagation tests. * Added end-to-end tests for SDK reattach functionality, validating session state after execution completion. * Added end-to-end test confirming proper host bind mount handling during REST execution. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
copy_out single-file mode writes a directory wrapping the file instead of the file at host_dst
Bug
box.copy_out("/guest/blob", "/host/blob_copy")over REST landed at thewrong layout on the host:
Local-FFI
copy_outkeeps single-file semantics (dest is the file path);REST broke that contract.
Root cause: the SDK Rust REST client used
tar::Archive::unpack(host_dst)unconditionally.
unpackalways treats its arg as a destination directoryand extracts every entry under it. For a single-file archive that means
writing the entry as a child of
host_dstrather than AThost_dst.Fix
Detect single-file archives and write the entry directly at
host_dst:host_dstunpack(host_dst)as beforeMirror of #688's runner-side single-file detection — same situation,
opposite direction.
Test plan — two-sided verified
Pin:
scripts/test/e2e/cases/test_files_io.py::test_copy_out_binary_roundtrips_sha256Stack: local e2e. C lib + Python wheel rebuilt; no runner change needed (this is SDK-side only).
copy_out('/root/blob', '/host/blob_copy')thenread_bytes('/host/blob_copy')IsADirectoryError: '/host/blob_copy'(host wrote a directory wrapping the file)copy_out('/root/dir/', '/host/dir/')(multi-entry archive)archive.unpack)Defensive: the single-file path is gated on
file_count == 1 && other_count == 0, so any directory/symlink/link entry demotes to the multi-entry path. Zero-file archives go through the multi-entry path too (does nothing, same as before).Summary by CodeRabbit