Improve snapshot restore performance and semantics#6406
Conversation
There was a problem hiding this comment.
I'm Bonk, and I've done a quick review of your PR.
This PR replaces the old tar-based snapshot restore (which round-tripped data through the Docker archive API) with a volume-clone + NoCopy mount approach. It also rejects restoring to / and refactors container removal into a shared helper.
-
[LOW]
removeContainerusesKJ_REQUIREwhere the old code usedJSG_REQUIRE— the refactoredremoveContainer()free function usesKJ_REQUIREfor status code checks. The olddestroyContainer()anddestroySidecarContainer()usedJSG_REQUIRE, which produces a JS-visibleError. SinceremoveContaineris now called fromdestroyContainer()(user-initiated viadestroy()RPC), errors will surface as rawkj::Exceptioninstead ofJSG_REQUIREErrortypes. Per codebase convention, error type changes are generally not breaking, and the.catch_()wrappers swallow these in cleanup paths anyway, so this is low severity — just flagging for awareness. -
[LOW] Test name suggests sorting logic that doesn't exist —
testSnapshotOverlappingMountsSortedByDepthimplies workerd sorts mounts by path depth, but no sorting is implemented. Docker itself handles overlapping volume mounts correctly regardless of array order. The test passes because Docker's mount semantics are order-independent, not because of any depth-sorting logic. Consider renaming to avoid implying behavior that isn't there.
These comments were generated by an AI code review assistant and may not be perfect.
|
Review posted on PR #6406. Two low-severity findings:
The overall change is solid: the cap'n proto schema addition is wire-compatible ( |
This will not be allowed in the production implementation and is also disallowed by Docker when bind mounting volumes, so we add the same constraint for consistency.
This gives the exact same semantics that snapshot restoring will have in production when it is implemented: restoring a snapshot to a mount point masks the current contents of the mount point (Docker volumes do this) and snapshots should be immutable (this is why we clone the volume instead of just mounting it). This is an improvement over the current archive API implementation, which does not mask the mount point contents on restore and also has very poor performance on macOS (caused by the snapshot contents having to traverse the Docker API socket multiple times).
33a1139 to
6523ec4
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #6406 +/- ##
==========================================
- Coverage 70.70% 70.66% -0.05%
==========================================
Files 427 427
Lines 117092 117188 +96
Branches 18889 18901 +12
==========================================
+ Hits 82794 82812 +18
- Misses 23049 23119 +70
- Partials 11249 11257 +8 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This is a follow up to #6376
The changes are separated into 2 commits:
Disallow restoring directory snapshots to the root directory
This will not be allowed in the production implementation and is also disallowed by Docker when bind mounting volumes, so we add the same constraint for consistency.
Clone snapshot volumes when restoring a snapshot
This gives the exact same semantics that snapshot restoring will have in production when it is implemented: restoring a snapshot to a mount point masks the current contents of the mount point (Docker volumes do this) and snapshots should be immutable (this is why we clone the volume instead of just mounting it).
This is an improvement over the current archive API implementation, which does not mask the mount point contents on restore and also has very poor performance on macOS (caused by the snapshot contents having to traverse the Docker API socket multiple times).
Performance wise, this brings the time to restore a 600 MB snapshot on macOS down from ~30s to ~4s