Fix Sandbox.create envs on snapshot/fork and stop sealing user envs#113
Merged
Conversation
Two interacting bugs in Sandbox.create({ envs }) that produced very
confusing behavior for users.
Bug #1 — snapshot fork dropped user envs.
createFromCheckpointCore re-bound only { Timeout int } from the request
body and forwarded originalCfg.Envs from cp.SandboxConfig, so calling
Sandbox.create({ snapshot, envs: { FOO: "x" } }) produced an empty $FOO
inside the guest. Fix: thread user envs through to the core via a
narrow userEnvs map[string]string parameter and merge them over
originalCfg.Envs (user keys win) after re-resolving the inherited
secret store. Scope is intentionally limited to envs — every other
field still inherits from the checkpoint.
Bug #2 — every env was sealed unconditionally.
secretsproxy.CreateSealedEnvs tokenized every entry of cfg.Envs, so
even user-supplied plaintext envs reached the guest as osb_sealed_…
tokens. echo $TEST_VAR returned the token, breaking every non-HTTP
use of the variable. Sealing is only meaningful for values sourced
from a SecretStore (so the MITM proxy can swap them on outbound
HTTPS). Fix: track which env names came from the store via a new
SealedEnvKeys []string on types.SandboxConfig (json:"-", never
persisted), populate it from resolveSecretStoreInto on both the
fresh-create and fork paths, plumb it through CreateSandboxRequest
(new field sealed_env_keys = 15) and the worker gRPC server, and
have CreateSealedEnvs only tokenize keys in that set. Non-sealed
envs pass through as plaintext; the proxy session is only registered
when there is something to substitute. On the fork path the seal-set
is computed before merging user envs so user keys are never sealed.
Worker deploy
The Azure dev box silently shipped without an OPENSANDBOX_S3_*
checkpoint store, which made every snapshot/fork RPC fail with
"checkpoint store not configured on this worker" — there was no
clear pointer at the missing config. Wire the worker to Azure Blob
via the existing OPENSANDBOX_S3_* env vars (the worker switches to
azureBlobClient when the endpoint contains .blob.core.windows.net,
see internal/storage/blob.go:39). Secrets are sourced from a
gitignored deploy/azure/.dev-env-secrets-<location> file and the
deploy now fails fast with a clear error if the checkpoint store
config isn't present.
Test coverage
New sdks/typescript/examples/test-snapshot-envs.ts asserts:
- plain Sandbox.create({ envs }) → guest sees plaintext
- Sandbox.create({ snapshot, envs }) → user envs survive the fork
AND remain plaintext
- Sandbox.create({ secretStore, envs }) → user envs are plaintext
while store-derived envs are still sealed (osb_sealed_…)
Registered in run-all-tests.ts so it runs in the production suite.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Replace the SealedEnvKeys side-channel introduced in the previous
commit with a real provenance-preserving field. Three bug-classes
collapse into "structurally impossible" instead of "fixed by careful
threading".
Why
---
The previous fix added a parallel []string of env-var names that the
API layer computed, the proto carried, and the worker re-hydrated into
a set, all to tell the secrets proxy "tokenize these keys, not those".
That worked but kept the underlying mistake intact: secret-store-derived
plaintext was still inlined into cfg.Envs alongside user envs, and the
provenance had to be reconstructed from a side-channel everywhere
downstream wanted it. As soon as that channel desynced from the
values themselves (as it did on the snapshot/fork path) you got
either silent drops or silent plaintext leaks.
What
----
New types.SandboxConfig.SecretEnvs map[string]string (json:"-").
resolveSecretStoreInto now writes decrypted values into SecretEnvs
and never touches Envs. The two maps remain disjoint end-to-end:
through cfgForPersistence (only SecretAllowedHosts needs scrubbing
now — SecretEnvs can never reach PG since it's json-tagged out),
through the gRPC proto (CreateSandboxRequest.sealed_env_keys = 15
becomes secret_envs = 15, a real map), through the worker, and into
secretsproxy.CreateSealedEnvs which now takes (plaintextEnvs, secretEnvs)
directly. Everything in secretEnvs is tokenized; everything in
plaintextEnvs is forwarded as-is; user-supplied keys win on collision.
createFromCheckpointCore no longer needs the "compute seal-set BEFORE
merging user envs" ordering trick, because the maps are independent —
the merge order is irrelevant.
The "secretStore + snapshot/image" combination is now rejected at
the API edge with a clear 400. The pre-existing inherit-only contract
("a fork inherits the snapshot's secret store and cannot override it")
was previously enforced implicitly by "the fork pipeline doesn't bind
SecretStore from the body", which silently dropped a user-provided
store on fork. The first fix in this PR turned that silent drop into
a silent leak (parent-resolved store-B plaintext smuggled through
cfg.Envs into the fork-time merge under names that weren't in the
seal-set). With the rejection in place users get an explicit error
instead, and even if they could bypass it the leak is structurally
impossible because secret values no longer travel via cfg.Envs.
Test coverage
-------------
test-snapshot-envs.ts grew a 4th case asserting the rejection. All
10 assertions pass on the redeployed dev box. test-secretstore.ts
(the existing 21-assertion lifecycle suite) also passes unchanged
against the refactored worker, confirming the user-facing SecretStore
behavior is preserved.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Three interacting bugs in
Sandbox.create({ envs })made the API misbehave in increasingly bad ways the deeper you went into the snapshot/fork path. Fixed at the source by giving secret-store-derived envs their own field instead of mixing them into the user envs map.Bug #1 — snapshot fork dropped user envs
createFromCheckpointCorere-bound only{ Timeout int }from the request body and forwardedoriginalCfg.Envsfromcp.SandboxConfig, so:Fix: thread user envs into the core via a narrow
userEnvs map[string]stringparameter and merge them overoriginalCfg.Envs(user keys win) after the secret store is re-resolved. Scope is intentionally limited to envs — every other field still inherits from the checkpoint.Bug #2 — every env was sealed unconditionally
secretsproxy.CreateSealedEnvstokenized every entry ofcfg.Envs, so user-supplied plaintext envs reached the guest asosb_sealed_…tokens.echo \$TEST_VARreturned the token, breaking every non-HTTP use of the variable.Bug #3 —
secretStore+snapshotsilently leaked secretsThe first fix for #1 widened the merge surface. Combined with #2's existing flat-map design, the parent handler eagerly resolved a user-supplied secret store into
cfg.Envs, those values flowed through the newuserEnvsmerge intooriginalCfg.Envs, and the seal-set was computed against the snapshot's original store — so the smuggled values reached the guest unsealed. (Pre-PR, the same combo silently dropped the user's store rather than leaking it.)Fix
Rather than threading provenance through a side-channel
SealedEnvKeys []string, give store-derived envs their own field onSandboxConfigso the distinction is preserved end-to-end:resolveSecretStoreIntowrites decrypted values intocfg.SecretEnvs.cfg.Envsis never touched. Returns justerror— no side-channel.cfgForPersistencesimplified —SecretEnvsisjson:"-"so plaintext can never reach PG by construction. OnlySecretAllowedHostsfor store-derived names and the store-levelEgressAllowlistneed scrubbing.proto/worker/CreateSandboxRequestgets a realmap<string,string> secret_envs = 15(replacing the abandonedsealed_env_keysfrom the first attempt).secretsproxy.CreateSealedEnvs(... plaintextEnvs, secretEnvs map[string]string ...)takes the two maps directly. Everything insecretEnvsis tokenized, everything inplaintextEnvsis forwarded as-is, user-supplied keys win on collision. The proxy session is registered only when there's something to substitute.createFromCheckpointCoreno longer needs the "compute seal-set before merging user envs" ordering trick — the two maps are independent so merge order can't leak.secretStoreexplicitly with a 400 (matches the existing inherit-only contract; leak path is closed structurally regardless).After this refactor, the bug class is unrepresentable: nothing in the code path can put store-derived plaintext into
cfg.Envs, and nothing tells the worker which keys to seal via a side channel — the seal set iskeys(cfg.SecretEnvs).Worker deploy
Tangentially: the Azure dev box silently shipped without an
OPENSANDBOX_S3_*checkpoint store, so every snapshot/fork RPC failed withcheckpoint store not configured on this workerand the cause was invisible. Wired the worker to Azure Blob via the existingOPENSANDBOX_S3_*env vars (internal/storage/blob.go:39switches toazureBlobClientwhen the endpoint contains.blob.core.windows.net). Real values come from a gitignoreddeploy/azure/.dev-env-secrets-<location>file; the deploy now fails fast with a clear error if the config is missing. A.exampletemplate is committed.Test plan
New
sdks/typescript/examples/test-snapshot-envs.ts(registered inrun-all-tests.tsafterCheckpoints):Sandbox.create({ envs })→ guest sees plaintextSandbox.create({ snapshot, envs })→ user envs survive the fork and remain plaintextSandbox.create({ secretStore, envs })→ user envs are plaintext while store-derived envs are still sealed (osb_sealed_…)Sandbox.create({ secretStore, snapshot })→ API rejects with 400, no leak10/10 assertions pass on the redeployed dev box. The existing
test-secretstore.tslifecycle suite (21 assertions) also passes unchanged against the refactored worker, confirming user-facingSecretStorebehavior is preserved.🤖 Generated with Claude Code