Skip to content

Improve snapshot restore performance and semantics#6406

Merged
gpanders merged 2 commits intomainfrom
ganders/CC-7284
Mar 25, 2026
Merged

Improve snapshot restore performance and semantics#6406
gpanders merged 2 commits intomainfrom
ganders/CC-7284

Conversation

@gpanders
Copy link
Copy Markdown
Member

@gpanders gpanders commented Mar 24, 2026

This is a follow up to #6376

The changes are separated into 2 commits:

  • Disallow restoring directory snapshots to the root directory

    This will not be allowed in the production implementation and is also disallowed by Docker when bind mounting volumes, so we add the same constraint for consistency.

  • Clone snapshot volumes when restoring a snapshot

    This gives the exact same semantics that snapshot restoring will have in production when it is implemented: restoring a snapshot to a mount point masks the current contents of the mount point (Docker volumes do this) and snapshots should be immutable (this is why we clone the volume instead of just mounting it).

    This is an improvement over the current archive API implementation, which does not mask the mount point contents on restore and also has very poor performance on macOS (caused by the snapshot contents having to traverse the Docker API socket multiple times).


Performance wise, this brings the time to restore a 600 MB snapshot on macOS down from ~30s to ~4s

@gpanders gpanders requested review from a team as code owners March 24, 2026 21:59
Copy link
Copy Markdown
Contributor

@ask-bonk ask-bonk bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm Bonk, and I've done a quick review of your PR.

This PR replaces the old tar-based snapshot restore (which round-tripped data through the Docker archive API) with a volume-clone + NoCopy mount approach. It also rejects restoring to / and refactors container removal into a shared helper.

  1. [LOW] removeContainer uses KJ_REQUIRE where the old code used JSG_REQUIRE — the refactored removeContainer() free function uses KJ_REQUIRE for status code checks. The old destroyContainer() and destroySidecarContainer() used JSG_REQUIRE, which produces a JS-visible Error. Since removeContainer is now called from destroyContainer() (user-initiated via destroy() RPC), errors will surface as raw kj::Exception instead of JSG_REQUIRE Error types. Per codebase convention, error type changes are generally not breaking, and the .catch_() wrappers swallow these in cleanup paths anyway, so this is low severity — just flagging for awareness.

  2. [LOW] Test name suggests sorting logic that doesn't existtestSnapshotOverlappingMountsSortedByDepth implies workerd sorts mounts by path depth, but no sorting is implemented. Docker itself handles overlapping volume mounts correctly regardless of array order. The test passes because Docker's mount semantics are order-independent, not because of any depth-sorting logic. Consider renaming to avoid implying behavior that isn't there.


These comments were generated by an AI code review assistant and may not be perfect.

@ask-bonk
Copy link
Copy Markdown
Contributor

ask-bonk bot commented Mar 24, 2026

Review posted on PR #6406. Two low-severity findings:

  1. removeContainer uses KJ_REQUIRE where the old code used JSG_REQUIRE — changes the exception type surfaced through user-facing RPC paths like destroy(). Not breaking per codebase convention, but noted for awareness.

  2. Test name testSnapshotOverlappingMountsSortedByDepth implies sorting logic that doesn't exist — no depth-based sorting is implemented; Docker handles overlapping mounts correctly regardless of order. The test name/comments are misleading.

The overall change is solid: the cap'n proto schema addition is wire-compatible (mounts @51), RAII cleanup in cloneSnapshot is correct (deferred volume deletion + helper container cleanup), the destructor's &network reference capture is safe (refers to the external kj::Network object which outlives the cleanup promise), and the new validation in validateRestorePath correctly rejects root restore at the API boundary.

github run

This will not be allowed in the production implementation and is also
disallowed by Docker when bind mounting volumes, so we add the same
constraint for consistency.
This gives the exact same semantics that snapshot restoring will have in
production when it is implemented: restoring a snapshot to a mount point
masks the current contents of the mount point (Docker volumes do this)
and snapshots should be immutable (this is why we clone the volume
instead of just mounting it).

This is an improvement over the current archive API implementation,
which does not mask the mount point contents on restore and also has
very poor performance on macOS (caused by the snapshot contents having
to traverse the Docker API socket multiple times).
@codecov-commenter
Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 8.25688% with 200 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.66%. Comparing base (5d002a6) to head (6523ec4).
⚠️ Report is 6 commits behind head on main.

Files with missing lines Patch % Lines
src/workerd/server/container-client.c++ 0.00% 172 Missing ⚠️
src/workerd/api/container.c++ 0.00% 20 Missing ⚠️
src/workerd/server/container-client-test.c++ 69.23% 0 Missing and 8 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6406      +/-   ##
==========================================
- Coverage   70.70%   70.66%   -0.05%     
==========================================
  Files         427      427              
  Lines      117092   117188      +96     
  Branches    18889    18901      +12     
==========================================
+ Hits        82794    82812      +18     
- Misses      23049    23119      +70     
- Partials    11249    11257       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@gpanders gpanders merged commit 384cb7d into main Mar 25, 2026
31 of 32 checks passed
@gpanders gpanders deleted the ganders/CC-7284 branch March 25, 2026 18:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants