Skip to content

Fix Sandbox.create envs on snapshot/fork and stop sealing user envs#113

Merged
motatoes merged 2 commits into
mainfrom
fix/snapshot-envs-and-selective-sealing
Apr 8, 2026
Merged

Fix Sandbox.create envs on snapshot/fork and stop sealing user envs#113
motatoes merged 2 commits into
mainfrom
fix/snapshot-envs-and-selective-sealing

Conversation

@motatoes

@motatoes motatoes commented Apr 8, 2026

Copy link
Copy Markdown
Contributor

Summary

Three interacting bugs in Sandbox.create({ envs }) made the API misbehave in increasingly bad ways the deeper you went into the snapshot/fork path. Fixed at the source by giving secret-store-derived envs their own field instead of mixing them into the user envs map.

Bug #1 — snapshot fork dropped user envs

createFromCheckpointCore re-bound only { Timeout int } from the request body and forwarded originalCfg.Envs from cp.SandboxConfig, so:

await Sandbox.create({ snapshot, envs: { TEST_VAR: "hello" } });
// → echo \$TEST_VAR returned ""

Fix: thread user envs into the core via a narrow userEnvs map[string]string parameter and merge them over originalCfg.Envs (user keys win) after the secret store is re-resolved. Scope is intentionally limited to envs — every other field still inherits from the checkpoint.

Bug #2 — every env was sealed unconditionally

secretsproxy.CreateSealedEnvs tokenized every entry of cfg.Envs, so user-supplied plaintext envs reached the guest as osb_sealed_… tokens. echo \$TEST_VAR returned the token, breaking every non-HTTP use of the variable.

Bug #3secretStore + snapshot silently leaked secrets

The first fix for #1 widened the merge surface. Combined with #2's existing flat-map design, the parent handler eagerly resolved a user-supplied secret store into cfg.Envs, those values flowed through the new userEnvs merge into originalCfg.Envs, and the seal-set was computed against the snapshot's original store — so the smuggled values reached the guest unsealed. (Pre-PR, the same combo silently dropped the user's store rather than leaking it.)

Fix

Rather than threading provenance through a side-channel SealedEnvKeys []string, give store-derived envs their own field on SandboxConfig so the distinction is preserved end-to-end:

type SandboxConfig struct {
    Envs       map[string]string `json:"envs,omitempty"`
    SecretEnvs map[string]string `json:"-"` // never persisted, never set by SDK
    // ...
}
  • resolveSecretStoreInto writes decrypted values into cfg.SecretEnvs. cfg.Envs is never touched. Returns just error — no side-channel.
  • cfgForPersistence simplified — SecretEnvs is json:"-" so plaintext can never reach PG by construction. Only SecretAllowedHosts for store-derived names and the store-level EgressAllowlist need scrubbing.
  • proto/worker/CreateSandboxRequest gets a real map<string,string> secret_envs = 15 (replacing the abandoned sealed_env_keys from the first attempt).
  • secretsproxy.CreateSealedEnvs(... plaintextEnvs, secretEnvs map[string]string ...) takes the two maps directly. Everything in secretEnvs is tokenized, everything in plaintextEnvs is forwarded as-is, user-supplied keys win on collision. The proxy session is registered only when there's something to substitute.
  • createFromCheckpointCore no longer needs the "compute seal-set before merging user envs" ordering trick — the two maps are independent so merge order can't leak.
  • Snapshot/image branch rejects secretStore explicitly with a 400 (matches the existing inherit-only contract; leak path is closed structurally regardless).

After this refactor, the bug class is unrepresentable: nothing in the code path can put store-derived plaintext into cfg.Envs, and nothing tells the worker which keys to seal via a side channel — the seal set is keys(cfg.SecretEnvs).

Worker deploy

Tangentially: the Azure dev box silently shipped without an OPENSANDBOX_S3_* checkpoint store, so every snapshot/fork RPC failed with checkpoint store not configured on this worker and the cause was invisible. Wired the worker to Azure Blob via the existing OPENSANDBOX_S3_* env vars (internal/storage/blob.go:39 switches to azureBlobClient when the endpoint contains .blob.core.windows.net). Real values come from a gitignored deploy/azure/.dev-env-secrets-<location> file; the deploy now fails fast with a clear error if the config is missing. A .example template is committed.

Test plan

New sdks/typescript/examples/test-snapshot-envs.ts (registered in run-all-tests.ts after Checkpoints):

  • plain Sandbox.create({ envs }) → guest sees plaintext
  • Sandbox.create({ snapshot, envs }) → user envs survive the fork and remain plaintext
  • Sandbox.create({ secretStore, envs }) → user envs are plaintext while store-derived envs are still sealed (osb_sealed_…)
  • Sandbox.create({ secretStore, snapshot }) → API rejects with 400, no leak

10/10 assertions pass on the redeployed dev box. The existing test-secretstore.ts lifecycle suite (21 assertions) also passes unchanged against the refactored worker, confirming user-facing SecretStore behavior is preserved.

🤖 Generated with Claude Code

Two interacting bugs in Sandbox.create({ envs }) that produced very
confusing behavior for users.

Bug #1 — snapshot fork dropped user envs.
createFromCheckpointCore re-bound only { Timeout int } from the request
body and forwarded originalCfg.Envs from cp.SandboxConfig, so calling
Sandbox.create({ snapshot, envs: { FOO: "x" } }) produced an empty $FOO
inside the guest. Fix: thread user envs through to the core via a
narrow userEnvs map[string]string parameter and merge them over
originalCfg.Envs (user keys win) after re-resolving the inherited
secret store. Scope is intentionally limited to envs — every other
field still inherits from the checkpoint.

Bug #2 — every env was sealed unconditionally.
secretsproxy.CreateSealedEnvs tokenized every entry of cfg.Envs, so
even user-supplied plaintext envs reached the guest as osb_sealed_…
tokens. echo $TEST_VAR returned the token, breaking every non-HTTP
use of the variable. Sealing is only meaningful for values sourced
from a SecretStore (so the MITM proxy can swap them on outbound
HTTPS). Fix: track which env names came from the store via a new
SealedEnvKeys []string on types.SandboxConfig (json:"-", never
persisted), populate it from resolveSecretStoreInto on both the
fresh-create and fork paths, plumb it through CreateSandboxRequest
(new field sealed_env_keys = 15) and the worker gRPC server, and
have CreateSealedEnvs only tokenize keys in that set. Non-sealed
envs pass through as plaintext; the proxy session is only registered
when there is something to substitute. On the fork path the seal-set
is computed before merging user envs so user keys are never sealed.

Worker deploy

The Azure dev box silently shipped without an OPENSANDBOX_S3_*
checkpoint store, which made every snapshot/fork RPC fail with
"checkpoint store not configured on this worker" — there was no
clear pointer at the missing config. Wire the worker to Azure Blob
via the existing OPENSANDBOX_S3_* env vars (the worker switches to
azureBlobClient when the endpoint contains .blob.core.windows.net,
see internal/storage/blob.go:39). Secrets are sourced from a
gitignored deploy/azure/.dev-env-secrets-<location> file and the
deploy now fails fast with a clear error if the checkpoint store
config isn't present.

Test coverage

New sdks/typescript/examples/test-snapshot-envs.ts asserts:
  - plain Sandbox.create({ envs }) → guest sees plaintext
  - Sandbox.create({ snapshot, envs }) → user envs survive the fork
    AND remain plaintext
  - Sandbox.create({ secretStore, envs }) → user envs are plaintext
    while store-derived envs are still sealed (osb_sealed_…)

Registered in run-all-tests.ts so it runs in the production suite.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@vercel

vercel Bot commented Apr 8, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
opensandbox Ready Ready Preview, Comment Apr 8, 2026 7:14pm

Request Review

Replace the SealedEnvKeys side-channel introduced in the previous
commit with a real provenance-preserving field. Three bug-classes
collapse into "structurally impossible" instead of "fixed by careful
threading".

Why
---

The previous fix added a parallel []string of env-var names that the
API layer computed, the proto carried, and the worker re-hydrated into
a set, all to tell the secrets proxy "tokenize these keys, not those".
That worked but kept the underlying mistake intact: secret-store-derived
plaintext was still inlined into cfg.Envs alongside user envs, and the
provenance had to be reconstructed from a side-channel everywhere
downstream wanted it. As soon as that channel desynced from the
values themselves (as it did on the snapshot/fork path) you got
either silent drops or silent plaintext leaks.

What
----

New types.SandboxConfig.SecretEnvs map[string]string (json:"-").
resolveSecretStoreInto now writes decrypted values into SecretEnvs
and never touches Envs. The two maps remain disjoint end-to-end:
through cfgForPersistence (only SecretAllowedHosts needs scrubbing
now — SecretEnvs can never reach PG since it's json-tagged out),
through the gRPC proto (CreateSandboxRequest.sealed_env_keys = 15
becomes secret_envs = 15, a real map), through the worker, and into
secretsproxy.CreateSealedEnvs which now takes (plaintextEnvs, secretEnvs)
directly. Everything in secretEnvs is tokenized; everything in
plaintextEnvs is forwarded as-is; user-supplied keys win on collision.

createFromCheckpointCore no longer needs the "compute seal-set BEFORE
merging user envs" ordering trick, because the maps are independent —
the merge order is irrelevant.

The "secretStore + snapshot/image" combination is now rejected at
the API edge with a clear 400. The pre-existing inherit-only contract
("a fork inherits the snapshot's secret store and cannot override it")
was previously enforced implicitly by "the fork pipeline doesn't bind
SecretStore from the body", which silently dropped a user-provided
store on fork. The first fix in this PR turned that silent drop into
a silent leak (parent-resolved store-B plaintext smuggled through
cfg.Envs into the fork-time merge under names that weren't in the
seal-set). With the rejection in place users get an explicit error
instead, and even if they could bypass it the leak is structurally
impossible because secret values no longer travel via cfg.Envs.

Test coverage
-------------

test-snapshot-envs.ts grew a 4th case asserting the rejection. All
10 assertions pass on the redeployed dev box. test-secretstore.ts
(the existing 21-assertion lifecycle suite) also passes unchanged
against the refactored worker, confirming the user-facing SecretStore
behavior is preserved.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@breardon2011 breardon2011 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved

@motatoes motatoes merged commit 16fd3fe into main Apr 8, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants