Skip to content

Fix race condition where concurrent sandbox exec overwrites storage#409

Merged
robinaugh merged 1 commit intomainfrom
jason/fix-concurrent-sandbox-storage-race
Mar 11, 2026
Merged

Fix race condition where concurrent sandbox exec overwrites storage#409
robinaugh merged 1 commit intomainfrom
jason/fix-concurrent-sandbox-storage-race

Conversation

@robinaugh
Copy link
Contributor

@robinaugh robinaugh commented Mar 10, 2026

Slack thread

I attempted to get an integration test wrapped around this race condition, but I could not reliably get it to fail without the code change. The unit tests have been red/green verified though, which is the best we can do with this particular race.

Summary

  • When two rwx sandbox exec commands run concurrently (e.g. Claude Code running rubocop and rspec in parallel), both find no existing session and each creates a new sandbox. The second SetSession call overwrites the first because they share the same storage key (cwd:branch:configFile), orphaning the first sandbox.
  • Fix by passing the already-held storage lock from ExecSandbox/ResetSandbox into StartSandbox, making the "check → create → persist" sequence atomic. The second process blocks until the first has persisted, then finds the existing session and reuses it.
  • Also fixes the same pattern in ResetSandbox where the lock was released before StartSandbox.

Test plan

  • New TestService_ExecSandbox_ConcurrentAutoCreate — two goroutines call ExecSandbox concurrently; asserts only one InitiateRun call occurs and both return the same RunID
  • New TestService_StartSandbox_StorageLock — verifies StartSandbox accepts a pre-held lock, persists the session, and releases the lock (including on error)
  • Red/green verified: concurrent test fails with expected: 1, actual: 2 without the fix
  • All existing sandbox tests pass

When two `rwx sandbox exec` commands run concurrently and both trigger
auto-create, the storage lock was released before calling StartSandbox,
allowing the second process to also find no session and create a
duplicate. The second write would overwrite the first session key,
orphaning the first sandbox (running remotely but invisible locally).

Fix by passing the already-held lock from ExecSandbox/ResetSandbox into
StartSandbox, so the "check if session exists → create → persist"
sequence is atomic. StartSandbox releases the lock after persisting the
initial session.
@robinaugh robinaugh self-assigned this Mar 10, 2026
@robinaugh robinaugh marked this pull request as ready for review March 10, 2026 21:51
@robinaugh robinaugh merged commit 4483514 into main Mar 11, 2026
1 check passed
@robinaugh robinaugh deleted the jason/fix-concurrent-sandbox-storage-race branch March 11, 2026 13:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants