Conversation
There was a problem hiding this comment.
I'm Bonk, and I've done a quick review of your PR.
Adds an experimental snapshotDirectory API for containers backed by Docker volumes, including snapshot create, restore (with custom mount points), and stale-volume cleanup.
Findings (highest severity first):
- [MEDIUM] Dangling
kj::StringPtrin deferreddeleteDockerVolumecleanup —container-client.c++:1421 - [MEDIUM]
latestTypeScript types expose experimental-only snapshot fields and types —types/generated-snapshot/latest/index.d.ts:3785 - [LOW] Global mutable bool
staleSnapshotVolumeCheckScheduledwithout thread-safety annotation —container-client.c++:48
These comments were generated by an AI code review assistant and may contain inaccuracies.
|
Review posted successfully on PR #6376. Here's a summary of the findings: 3 issues found, ranked by severity:
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #6376 +/- ##
==========================================
- Coverage 70.81% 70.64% -0.18%
==========================================
Files 425 425
Lines 116296 116586 +290
Branches 18773 18829 +56
==========================================
+ Hits 82357 82364 +7
- Misses 22707 22997 +290
+ Partials 11232 11225 -7 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Merging this PR will not alter performance
Comparing Footnotes
|
|
CC @anonrig for review |
|
/bonk review |
There was a problem hiding this comment.
3 findings ranked by severity:
- [MEDIUM]
normalizePathperforms O(n) allocations for n trailing slashes —container-client.c++:51-56 - [LOW] Snapshot ID validation allows all-hyphens, which produces a Docker volume named
workerd-snap----—container-client.c++:1258-1263 - [LOW]
normalizePathordering withvalidateAbsolutePathinsnapshotDirectory—container-client.c++:1388-1389
Previous review findings from PR #6337 and the prior review on this PR have been addressed:
- The dangling
kj::StringPtrinKJ_DEFER(finding #1 from ask-bonk) is already fixed — line 1421 passeskj::str(volumeName)(owned copy). - The
staleSnapshotVolumeCheckScheduledglobal is alreadystd::atomic_bool(line 48). - The
latesttypes issue is a pre-existing pipeline problem (author confirmed), not a regression.
Overall: the design is sound. The capnp schema additions, compat-flag gating, path validation, snapshot ID validation during restore, and test coverage (round-trip, named, multi-directory, custom mount point, root restore, cross-DO, non-existent directory) are thorough. The KJ_DEFER + volumeCommitted cleanup pattern correctly handles partial failures.
These comments were generated by an AI code review assistant and may contain inaccuracies.
|
Review posted on PR #6376. Here's a summary: Review body (top-level comment, review ID 3983426108):
3 inline comments:
Key verification results:
|
|
The generated output of |
971c76e to
7c84356
Compare
Expose a new experimental `snapshotDirectory` API for containers and include an implementation using Docker volumes (used by Wrangler for local dev). This is not yet implemented in production for Cloudflare Workers, but lays the groundwork for that feature.
Co-authored-by: ask-bonk[bot] <249159057+ask-bonk[bot]@users.noreply.github.com>
7c84356 to
6fad9a1
Compare
|
Rebased on main and resolved conflicts. |
* Use .exchange() to update atomic boolean * Use consistent ordering for normalize -> validate code paths * Add tests for snapshotting a stopped container and restoring a container with a non-existent snapshot ID
|
/bigbonk roast this pr |
Co-authored-by: anonrig <anonrig@users.noreply.github.com>
|
Here's a summary of what was done: Addressed anonrig's review feedback on
|
When a volume is bind mounted Docker will create it if it doesn't exist. However we don't want that behavior for snapshots: a snapshot ID for a snapshot that doesn't exist should result in an error, not silently create a new volume. This bug also exposed a second bug which is that if the start() code path fails for any reason, the container remains in a zombie state at which point monitor() hangs forever. If any of the dependencies on the start() path fail then we destroy the container, and we also require the container to be in a started state when monitor() is called.
7ab62c9 to
4c2ce51
Compare
Expose a new experimental
snapshotDirectoryAPI for containers and include an implementation using Docker volumes (used by Wrangler for local dev).This is not yet implemented in production for Cloudflare Workers, but lays the groundwork for that feature.
This PR is (as of now) identical in content to #6337 to make reviewing easier for those who reviewed the last PR. I'll address any review feedback in separate commits to keep things separated.
Implementation notes
The local dev implementation of snapshots uses a combination of Docker volumes and the Docker archive API. We use Docker volumes to manage and store the snapshots on disk (this is mostly a UX convenience for the user, so that workerd created snapshots are visible in
docker volume lsand can be easily removed withdocker volume prune), but we do not bind-mount the volumes when restoring snapshots, since Docker volumes do not have the same properties as snapshots (namely, snapshots should be immutable).Instead when creating or restoring a snapshot we create a temporary container (using the same image as the "main" container) with the volume attached and then copy between the containers using the archive API.
NOTE: This is known to perform pretty badly on macOS since each of the archive API calls sends the entire snapshot tar contents over the Docker UNIX socket, which traverses the VM<->host boundary on macOS, so has a lot of overhead. There may be some ways to improve this but for now the plan is to defer those performance improvements to a follow up PR so that we can at least land the functional implementation sooner.