Skip to content

fix(rest): copy_out single-file mode writes at host_dst, not inside it#692

Merged
DorianZheng merged 2 commits into
boxlite-ai:mainfrom
G4614:fix/rest-copy-out-extract
Jun 9, 2026
Merged

fix(rest): copy_out single-file mode writes at host_dst, not inside it#692
DorianZheng merged 2 commits into
boxlite-ai:mainfrom
G4614:fix/rest-copy-out-extract

Conversation

@G4614

@G4614 G4614 commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

copy_out single-file mode writes a directory wrapping the file instead of the file at host_dst

Bug

box.copy_out("/guest/blob", "/host/blob_copy") over REST landed at the
wrong layout on the host:

expected:  /host/blob_copy           = the file
actual:    /host/blob_copy/blob      = the file inside a new directory

Local-FFI copy_out keeps single-file semantics (dest is the file path);
REST broke that contract.

Root cause: the SDK Rust REST client used tar::Archive::unpack(host_dst)
unconditionally. unpack always treats its arg as a destination directory
and extracts every entry under it. For a single-file archive that means
writing the entry as a child of host_dst rather than AT host_dst.

Fix

Detect single-file archives and write the entry directly at host_dst:

Archive shape Behaviour
1 regular file, no other entries write at host_dst
Multiple / dirs / symlinks / specials unpack(host_dst) as before

Mirror of #688's runner-side single-file detection — same situation,
opposite direction.

Test plan — two-sided verified

Pin: scripts/test/e2e/cases/test_files_io.py::test_copy_out_binary_roundtrips_sha256

Stack: local e2e. C lib + Python wheel rebuilt; no runner change needed (this is SDK-side only).

Pre-fix Post-fix
copy_out('/root/blob', '/host/blob_copy') then read_bytes('/host/blob_copy') IsADirectoryError: '/host/blob_copy' (host wrote a directory wrapping the file) bytes match guest sha256
copy_out('/root/dir/', '/host/dir/') (multi-entry archive) unchanged unchanged (falls through to archive.unpack)

Defensive: the single-file path is gated on file_count == 1 && other_count == 0, so any directory/symlink/link entry demotes to the multi-entry path. Zero-file archives go through the multi-entry path too (does nothing, same as before).

Summary by CodeRabbit

  • Bug Fixes
    • Tar extraction now preserves single-file archives by writing the single file directly to the intended destination (creating its parent directory if needed), while multi-entry or non-regular archives continue to unpack into a destination directory.

`box.copy_out("/guest/blob", "/host/blob_copy")` over REST left a
DIRECTORY at `/host/blob_copy/` containing the file as
`/host/blob_copy/blob`, instead of writing the file to
`/host/blob_copy` directly. The SDK called `archive.unpack(host_dst)`
unconditionally — `tar::Archive::unpack` always treats its argument
as a destination directory and writes entries relative to it.

The local-FFI copy_out preserves single-file semantics ("dest is the
file path"). Adapt the REST extraction to the same shape:

  - Probe the archive once. If exactly one regular file and no other
    entries, open `host_dst` as a file and stream the entry into it
    (creating any missing parent directories first).
  - Otherwise (multi-file, or any directory/symlink entry), fall back
    to the old `archive.unpack(host_dst)` path, which is correct for
    `box.copy_out("/guest/tree/", "/host/tree/")` shapes.

Mirror of the runner-side single-file detection in
`apps/runner/pkg/api/controllers/boxlite_files.go::extractTarToDir`
shipped in PR boxlite-ai#688 — same situation, opposite direction.

E2E regression test:
  sdks/python/tests/e2e/test_files_io.py::test_copy_out_binary_roundtrips_sha256
@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: a4561304-3fb0-4976-817e-3ede022fe528

📥 Commits

Reviewing files that changed from the base of the PR and between 9e5048c and 0568252.

📒 Files selected for processing (1)
  • src/boxlite/src/rest/litebox.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/boxlite/src/rest/litebox.rs

📝 Walkthrough

Walkthrough

The PR updates the extract_tar_to_path REST helper to preserve caller intent for single-file extraction. It probes the tar archive: if exactly one regular file exists with no other entries, that file's contents are written directly to host_dst; otherwise, host_dst is treated as a destination directory and the archive is unpacked normally.

Changes

Tar extraction single-file vs multi-entry handling

Layer / File(s) Summary
Updated extract_tar_to_path implementation
src/boxlite/src/rest/litebox.rs
The helper now probes tar archives to detect single-file layout, extracts matching archives directly to host_dst (after creating parent directories), and unpacks multi-entry or non-regular-entry archives into host_dst as a directory.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I probed the tar with a curious sniff,
Found one lone file tucked like a gift,
I made its parent, wrote bytes straight through,
For many entries, a directory I drew,
Hopping home happy — unpacked true! 📦✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main fix: ensuring copy_out's single-file mode writes directly to host_dst rather than creating a directory wrapper.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@G4614 G4614 marked this pull request as ready for review June 9, 2026 12:10
@G4614 G4614 requested a review from a team as a code owner June 9, 2026 12:10
@G4614 G4614 enabled auto-merge June 9, 2026 12:10

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/boxlite/src/rest/litebox.rs (1)

924-927: ⚡ Quick win

Remove unused single_entry_path variable.

single_entry_path is captured during the probe pass but only referenced in a no-op let _ = statement. This is dead code that should be removed.

♻️ Proposed fix
-        let mut single_entry_path: Option<std::path::PathBuf> = None;
         for entry in probe.entries().map_err(|e| {
             BoxliteError::Internal(format!("failed to read tar archive: {}", e))
         })? {
             let entry = entry.map_err(|e| {
                 BoxliteError::Internal(format!("failed to read tar entry: {}", e))
             })?;
             let header = entry.header();
             match header.entry_type() {
                 tar::EntryType::Regular => {
                     file_count += 1;
-                    if single_entry_path.is_none() {
-                        single_entry_path =
-                            Some(entry.path().map(|c| c.into_owned()).unwrap_or_default());
-                    }
                 }
                 _ => other_count += 1,
             }
         }

And remove line 973:

-            // Probe said one regular file but the second pass found
-            // none — defensive fallthrough; shouldn't happen.
-            let _ = single_entry_path;

Also applies to: 973-973

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/boxlite/src/rest/litebox.rs` around lines 924 - 927, Remove the dead
`single_entry_path` capture and its related no-op usage: delete the
`single_entry_path` variable declaration/initialization and the block that
assigns `Some(entry.path()...)` (the code using `single_entry_path` during the
probe pass), and also remove the later `let _ = single_entry_path` no-op; update
any surrounding logic in the probe pass so behavior remains unchanged without
that unused variable (references: single_entry_path, entry.path()).

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/boxlite/src/rest/litebox.rs`:
- Around line 914-919: The for-loop formatting around
probe.entries().map_err(...) and the subsequent entry.map_err(...) closures does
not match rustfmt style; reformat the loop to obey rustfmt (or run cargo fmt
--package boxlite) so the two map_err closures and their
BoxliteError::Internal(...) messages are split across lines consistently (ensure
the closure bodies are indented and wrapped, e.g., place the closure opening on
the next line and wrap the format! call), targeting the probe.entries() map_err
and the entry.map_err closures in litebox.rs so CI passes.

---

Nitpick comments:
In `@src/boxlite/src/rest/litebox.rs`:
- Around line 924-927: Remove the dead `single_entry_path` capture and its
related no-op usage: delete the `single_entry_path` variable
declaration/initialization and the block that assigns `Some(entry.path()...)`
(the code using `single_entry_path` during the probe pass), and also remove the
later `let _ = single_entry_path` no-op; update any surrounding logic in the
probe pass so behavior remains unchanged without that unused variable
(references: single_entry_path, entry.path()).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: c74af55a-036a-4239-a849-4680ec55378d

📥 Commits

Reviewing files that changed from the base of the PR and between e4bbd0e and 9e5048c.

📒 Files selected for processing (1)
  • src/boxlite/src/rest/litebox.rs

Comment thread src/boxlite/src/rest/litebox.rs Outdated
DorianZheng pushed a commit that referenced this pull request Jun 9, 2026
…#688)

copy_in saves the uploaded tar as a binary blob instead of extracting

## Bug

`PUT /v1/{prefix}/boxes/{id}/files` (the runner handler for SDK
`copy_in`)
silently broke both single-file and directory uploads:

```
copy_in("host.txt", "/root/hello.txt")
  → /root/hello.txt = raw tar bytes (ustar header + payload), not the file
copy_in("host_dir/", "/root/flatdest/")
  → /root/flatdest/boxlite-upload-XXXXXX.tar = the directory as one big tar
```

Root cause: the SDK Rust REST client serialises the source as a tar
archive
in the body. The runner buffered it to `<tmp>/boxlite-upload-*.tar` and
called `Boxlite.CopyInto(tarPath, dest)`, which treats `src` as a
literal
filesystem path — so the tar archive itself was the "file" that landed
in
the guest.

## Fix

Extract the uploaded tar at the runner before calling `CopyInto`:

| Archive shape | Behaviour |
|---|---|
| 1 regular file, no other entries | write at `dest` (single-file
copy_in) |
| Multiple entries / dirs / symlinks / specials | `unpack` into `dest`
(directory copy_in) |
| Any entry with `..` in path | reject (zip-slip guard) |

## Test plan — two-sided verified

Pin: `scripts/test/e2e/cases/test_files_io.py`

Stack: local e2e. Runner binary swapped between main and PR builds.

| Case | Pre-fix (main) | Post-fix (this PR) |
|---|---|---|
| `test_copy_in_text_roundtrips_byte_exact` (host writes `"üñîçødé\n"`,
copy_in, guest cat) | `cat` returned raw tar archive text + ustar header
(`'hello.txt…ustar…<payload>'`) — file at expected path didn't exist |
**byte-exact payload** at `/root/hello.txt` |
| `test_copy_in_directory_include_parent_false` (copy_in dir with nested
files; `find -type f`) | only `/root/flatdest/boxlite-upload-XXXXXX.tar`
(the raw tar landed as one file) | **both `top.txt` and `sub/deep.txt`**
under `/root/flatdest/` |
| `test_copy_in_overwrite_false_rejects_conflict` | empty | empty —
runner-side extract always uses `O_TRUNC`. `overwrite=False` policy not
honoured on REST yet |
| `test_copy_out_binary_roundtrips_sha256` | `IsADirectoryError` on
host_dest | `IsADirectoryError` on host_dest — SDK Rust copy_out has the
mirror bug (raw tar saved as dir) |

This PR closes the two known-broken copy_in paths (single file +
directory). The two residual failures are different bugs:
- `overwrite=False` needs the tar-extract path to respect the option;
minor follow-up (5 lines in extract_tar).
- `copy_out` needs the SAME fix on the SDK side (the SDK currently saves
the runner's tar response without extracting). That's a Rust-SDK PR
(#692), scope of a separate change.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Improvements**
* Enhanced file upload handling to support efficient transfers of both
individual files and directory structures.

* **Tests**
* Added comprehensive end-to-end tests for file operations, verifying
data integrity, text preservation, binary round-tripping, recursive
directory handling, and overwrite behavior.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
@DorianZheng DorianZheng disabled auto-merge June 9, 2026 13:47
@DorianZheng DorianZheng merged commit e3946ff into boxlite-ai:main Jun 9, 2026
29 of 30 checks passed
DorianZheng pushed a commit that referenced this pull request Jun 10, 2026
#710)

add 5 e2e cases pinning REST contracts not covered today

## Coverage gaps

Three classes of behaviour run through the SDK → API → Runner → VM
chain but had no e2e pin on the chain itself; bugs in any one of them
would silently regress through `make test:integration:*` (which only
exercises local-FFI):

```
1. CLI detach lifecycle — `boxlite run -d` returns, CLI process exits,
   does a fresh CLI process still see / exec the box?
   FFI side  src/boxlite/tests/detach.rs, recovery.rs  ✓
   E2E side  scripts/test/e2e/cases/                    ✗
2. Execution.attach / reattach contract — bogus exec_id should be a
   typed client error; reattach to a completed exec should return a
   usable handle.
   Runner side  apps/runner/.../boxlite_exec_attach_test.go  ✓
   SDK <-> API  end-to-end                                    ✗
3. host bind mount via REST — the cloud runtime intentionally dropped
   host bind mounts (#639); REST callers passing host paths must
   silently no-op, no /mnt/<x> in the guest.
   FFI surface  src/boxlite/tests/mount_security.rs  ✓
   REST contract                                      ✗
```

## Cases shipped

```
scripts/test/e2e/cases/test_cli_detach_recovery.py   (2 cases)
scripts/test/e2e/cases/test_exec_attach.py           (2 cases)
scripts/test/e2e/cases/test_volume_readonly.py       (1 case)
```

5 cases total. Author also dropped 6 cases from this branch's earlier
incarnation that were already committed alongside their respective
fix PRs (#686 / #688 / #689 / #691 / #692 / #696), so the diff is
strictly net-new coverage.

## Test plan — run against current main

Stack: local e2e, runner unchanged, no source edits.

| Case | Result | Notes |
|---|---|---|
| `test_detached_box_exec_propagates_exit_code_on_fresh_cli` | ✅ PASS |
exit-code passthrough across CLI processes |
| `test_detached_box_survives_cli_exit_and_is_reusable` | ⚠️ XFAIL
(strict) | reaches step 3 (`boxlite exec <id> echo still-alive`) then
hits the stdout-drop race that #563 fixes — marker drops when #563 lands
|
| `test_attach_with_bogus_id_is_typed_error` | ✅ PASS | bogus exec_id →
typed `Exception` (not 5xx, not silent) |
| `test_reattach_after_original_completes` | ⚠️ XFAIL (strict) | same
stdout-drop race (#563) on the original exec's `out=='first-output'`
assertion |
| `test_host_bind_mount_via_rest_is_silently_ignored` | ✅ PASS | the box
created with `volumes=[(host_dir, "/mnt/ro", True)]` reports
`MOUNT_LINE=<none>` from `/proc/mounts` and the host marker file is
untouched — REST silently dropped the host path |

The 2 XFAILs are tied to #563 (`fix(go-sdk): fold stream drain into
Execution.Wait`). Once #563 merges, both xfails flip xpass-strict, the
markers come off, and the suite is 5/5 green. No additional fix work is
needed in this PR.

🤖 Generated with [Claude Code](https://claude.com/claude-code)


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Tests**
* Added end-to-end tests verifying CLI detach survival and box
reusability across fresh CLI processes, including exit code propagation
tests.
* Added end-to-end tests for SDK reattach functionality, validating
session state after execution completion.
* Added end-to-end test confirming proper host bind mount handling
during REST execution.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants