Skip to content

Implement containers snapshot API#6337

Merged
gpanders merged 16 commits intomainfrom
ganders/CC-6863
Mar 19, 2026
Merged

Implement containers snapshot API#6337
gpanders merged 16 commits intomainfrom
ganders/CC-6863

Conversation

@gpanders
Copy link
Copy Markdown
Member

Implement the first draft of the containers snapshot API. It is experimental for now.

Local dev snapshots are implemented using Docker volumes and the Docker archive API.

@gpanders gpanders requested review from a team as code owners March 16, 2026 19:11
@gpanders gpanders marked this pull request as draft March 16, 2026 19:12
@gpanders gpanders force-pushed the ganders/CC-6863 branch 2 times, most recently from a788f4a to be86e28 Compare March 16, 2026 19:16
Copy link
Copy Markdown
Contributor

@ask-bonk ask-bonk bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm Bonk, and I've done a quick review of your PR.

This PR adds a containers snapshot API: snapshotDirectory() to create snapshots and snapshots option in start() to restore them, backed by Docker volumes and the Docker archive API for the local dev implementation.

Findings (ranked by severity):

  1. [MEDIUM] Snapshot restore path nesting — archive read from /mnt introduces extra directory level
  2. [LOW] dockerApiBinaryRequest reads entire response into memory unbounded before size check
  3. [LOW] Deserialized snapshot id is not validated before use in Docker volume names

These review comments were generated by an AI assistant and may not be perfect.

@ask-bonk
Copy link
Copy Markdown
Contributor

ask-bonk bot commented Mar 16, 2026

I'm Bonk, and I've done a quick review of your PR.

This PR adds a containers snapshot API: snapshotDirectory() to create snapshots and snapshots option in start() to restore them, backed by Docker volumes and the Docker archive API for the local dev implementation.

I posted 3 inline review comments on PR #6337, ranked by severity:

  1. [MEDIUM] Snapshot restore path nesting (container-client.c++:1054) — The archive read from /mnt in the temp container includes a mnt/ prefix in the tar, which when extracted to the parent directory could introduce an extra directory level. Asked for confirmation that the round-trip works correctly.

  2. [LOW] Unbounded readAllBytes() (container-client.c++:577) — The binary response is fully buffered in memory before the 4 GiB size check runs.

  3. [LOW] Unvalidated snapshot id during restore (container-client.c++:1039) — Deserialized snapshot IDs are used in Docker volume names without format validation.

The overall design is sound — the serialization, compat flag gating, capnp schema additions, and test coverage all look good. The DirectorySnapshot JSG binding pattern with JSG_SERIALIZABLE, visitForMemoryInfo, and prototype properties follows existing conventions correctly.

github run

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 16, 2026

The generated output of @cloudflare/workers-types matches the snapshot in types/generated-snapshot 🎉

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Mar 16, 2026

Codecov Report

❌ Patch coverage is 0% with 310 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.55%. Comparing base (580f4b0) to head (705de4d).
⚠️ Report is 155 commits behind head on main.

Files with missing lines Patch % Lines
src/workerd/server/container-client.c++ 0.00% 257 Missing ⚠️
src/workerd/api/container.c++ 0.00% 53 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6337      +/-   ##
==========================================
- Coverage   70.71%   70.55%   -0.16%     
==========================================
  Files         420      420              
  Lines      112752   113229     +477     
  Branches    18483    18570      +87     
==========================================
+ Hits        79729    79894     +165     
- Misses      22021    22297     +276     
- Partials    11002    11038      +36     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@gpanders
Copy link
Copy Markdown
Member Author

/bonk review

Copy link
Copy Markdown
Contributor

@ask-bonk ask-bonk bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings from detailed review (ranked by severity):

  1. [MEDIUM] dockerApiBinaryRequest default maxResponseSize = kj::maxValue — PUT response bodies read without explicit limit
  2. [MEDIUM] cleanupSnapshotVolumes constructs JSON filter without escaping prefix
  3. [LOW] Deserialized size not validated in DirectorySnapshot::deserialize — could yield a double that doesn't round-trip exactly for very large values
  4. [LOW] Unnecessary static on free function validateSnapshotDir (minor style)

Re: kentonv's JSG_STRUCT question — gpanders's reply is correct: JSG_SERIALIZABLE requires JSG_RESOURCE_TYPE (it references jsgSuper which only JSG_RESOURCE_TYPE defines). There are zero examples of JSG_STRUCT + JSG_SERIALIZABLE in the codebase, and the JSG README explicitly documents this constraint.

The overall design is sound: the snapshot create/restore round-trip via Docker volumes and the archive API is correct, the compat-flag gating is proper, the serialization format is versioned, and the cleanup logic (KJ_DEFER + volumeCommitted flag) handles partial failures well. The path nesting fix in the latest commit correctly addresses the earlier review finding.


These review comments were generated by an AI assistant and may not be perfect.

@ask-bonk
Copy link
Copy Markdown
Contributor

ask-bonk bot commented Mar 17, 2026

Review posted on PR #6337. Here's a summary of what was posted:

Review body (top-level comment, review ID 3963586130):

  • 4 ranked findings with overall assessment that the design is sound

3 inline comments:

  1. [MEDIUM] container-client.c++:265dockerApiBinaryRequest has maxResponseSize = kj::maxValue default, making it easy to read unbounded Docker responses. Suggested changing the default to MAX_JSON_RESPONSE_SIZE (16 MiB) since callers needing the 1 GiB tar limit already pass it explicitly.

  2. [MEDIUM] container-client.c++:277cleanupSnapshotVolumes constructs the JSON filter by string interpolation without escaping. Safe in practice due to URI-encoding of containerName, but noted as a defensive coding concern.

  3. [LOW] container.c++:44 — Deserialized size not validated against the max limit or double precision boundary. Minor since legitimate values are bounded.

1 reply to kentonv's comment:

  • Confirmed gpanders's reply is correct: JSG_SERIALIZABLE requires JSG_RESOURCE_TYPE (uses jsgSuper typedef), cannot be used with JSG_STRUCT. Cited the JSG README, ser.h, and the 17 existing examples in the codebase.

github run

@gpanders gpanders marked this pull request as ready for review March 17, 2026 20:31
@gpanders gpanders requested a review from a team as a code owner March 17, 2026 20:48
@gpanders gpanders force-pushed the ganders/CC-6863 branch 3 times, most recently from 85da82b to a3cf897 Compare March 18, 2026 15:54
@gpanders
Copy link
Copy Markdown
Member Author

/bonk Review this PR again

Copy link
Copy Markdown
Contributor

@ask-bonk ask-bonk bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings from detailed review (ranked by severity):

  1. [MEDIUM] latest/ TypeScript types expose snapshots?: DirectorySnapshot[] on ContainerStartupOptions — the non-experimental TS override (lines 67-71 of container.h) explicitly excludes it, but the generated snapshot includes it
  2. [LOW] The experimental guard for snapshots in start() (lines 61-63 of container.c++) was flagged as redundant by gabivlj — gpanders acknowledged but hasn't removed it yet. However, the guard provides defense-in-depth since users can manually construct DirectorySnapshot objects (it's a JSG_STRUCT) without calling the experimental-gated snapshotDirectory().

The snapshot create/restore round-trip logic is correct: I traced paths like /app/data, /data, and /a/b/c/d through validateSnapshotDir, the archive root extraction, mount path computation, and the Docker archive API PUT/GET cycle — all produce the correct file layout. The KJ_DEFER cleanup pattern with volumeCommitted handles partial failures well, and the ID validation (hex + hyphens, max 64 chars) is solid.

Previous findings from my earlier reviews have been addressed: the maxResponseSize default was removed (callers now pass explicit limits), and JSON filter construction uses the capnp JSON codec instead of string interpolation.


These review comments were generated by an AI assistant and may not be perfect.

@ask-bonk
Copy link
Copy Markdown
Contributor

ask-bonk bot commented Mar 18, 2026

Review posted on PR #6337. Here's a summary:

Review body (top-level comment):

  • 2 ranked findings with confirmation that the snapshot create/restore round-trip logic is correct and previous findings have been addressed

1 inline comment:

  1. [MEDIUM] types/generated-snapshot/latest/index.d.ts:3780 — The latest/ generated TypeScript types expose snapshots?: DirectorySnapshot[] on ContainerStartupOptions, but the non-experimental TS override explicitly excludes it. Since latest uses compat date 2999-12-31 (not experimental), the non-experimental branch should apply. hardTimeout has the same pre-existing issue. Users with latest types will see snapshots in autocomplete but get a runtime error if they use it without the experimental flag.

Key verification results:

  • The snapshot create/restore round-trip is correct for paths like /app/data, /data, and /a/b/c/d
  • The KJ_DEFER cleanup pattern with volumeCommitted correctly handles partial failures
  • No use-after-move or lifetime issues in the coroutine-based Docker API helpers
  • Previous review findings (default maxResponseSize, JSON string interpolation) have been addressed
  • Snapshot ID validation and validateSnapshotDir path traversal rejection are solid

github run

@gpanders
Copy link
Copy Markdown
Member Author

gpanders commented Mar 18, 2026

[LOW] The experimental guard for snapshots in start() (lines 61-63 of container.c++) was flagged as redundant by gabivlj — gpanders acknowledged but hasn't removed it yet. However, the guard provides defense-in-depth since users can manually construct DirectorySnapshot objects (it's a JSG_STRUCT) without calling the experimental-gated snapshotDirectory().

Incorrect, @gabivlj's comment was about the snapshotDirectory implementation, not start(). It is not redundant in start() which is why I left it. I removed it from snapshotDirectory() as Gabi recommended.

@gpanders
Copy link
Copy Markdown
Member Author

  1. [MEDIUM] types/generated-snapshot/latest/index.d.ts:3780 — The latest/ generated TypeScript types expose snapshots?: DirectorySnapshot[] on ContainerStartupOptions, but the non-experimental TS override explicitly excludes it. Since latest uses compat date 2999-12-31 (not experimental), the non-experimental branch should apply. hardTimeout has the same pre-existing issue. Users with latest types will see snapshots in autocomplete but get a runtime error if they use it without the experimental flag.

This is not correct, the type generator is launched with the experimental flag:

["serve", "--verbose", "--experimental", "--control-fd=3", configPath],

This is a pre-existing issue with the type generation pipeline, not a regression in this PR. hardTimeout has the same behavior for the same reason. If the type generation should distinguish latest (all compat flags, no experimental) from experimental (all compat flags + experimental), that would be a change to the build-types.ts pipeline, not to the container API code, which should happen in a separate PR.

Copy link
Copy Markdown
Member

@kentonv kentonv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving API, didn't carefully read implementation.

@gpanders
Copy link
Copy Markdown
Member Author

After talking with containers team internally (@IRCody, @spahl) we also need the ability to restore a directory snapshot to an arbitrary mount point (the implementation as-is always mount a snapshot to the same directory that was snapshotted).

So the StartParams will actually change to accepting an array of {mountPoint?: string; snapshot: ContainerDirectorySnapshot} objects, rather than an array of just ContainerDirectorySnapshots.

Copy link
Copy Markdown
Contributor

@petebacondarwin petebacondarwin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the updates to the typings on behalf of the wrangler team.

@gpanders gpanders merged commit d05ea20 into main Mar 19, 2026
24 of 25 checks passed
@gpanders gpanders deleted the ganders/CC-6863 branch March 19, 2026 20:15
@danlapid
Copy link
Copy Markdown
Collaborator

@gpanders while confusing based on approvals, this PR wasn't properly reviewed.
As per comments:
Kenton only approved the API
Pete only approved the types.
Nobody actually reviewed the implementation.
Let's please revert and make sure the implementation itself gets approvals as well.

gpanders added a commit that referenced this pull request Mar 20, 2026
gpanders added a commit that referenced this pull request Mar 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants