
Sandbox Provider Leaderboard

Sandbox Benchmarks

A leaderboard of common benchmarks for each of our sandbox providers.

Last run: April 1, 2026

Performance Over Time

(Chart: composite score per provider over time.)

Detailed Metrics

| Provider    | Score | Median | P95    | P99    | Success |
|-------------|-------|--------|--------|--------|---------|
| Daytona     | 98.2  | 0.11s  | 0.28s  | 0.29s  | 100%    |
| E2B         | 93.8  | 0.44s  | 0.85s  | 0.99s  | 100%    |
| Blaxel      | 89.2  | 1.05s  | 1.12s  | 1.15s  | 100%    |
| Hopx        | 87.7  | 1.05s  | 1.42s  | 1.64s  | 100%    |
| Vercel      | 81.7  | 1.75s  | 1.93s  | 1.98s  | 100%    |
| Runloop     | 80.8  | 1.87s  | 1.99s  | 1.99s  | 100%    |
| Cloudflare  | 79.3  | 1.72s  | 2.48s  | 2.78s  | 100%    |
| Namespace   | 77.8  | 1.86s  | 2.43s  | 3.35s  | 100%    |
| CodeSandbox | 75.5  | 2.32s  | 2.60s  | 2.72s  | 100%    |
| Modal       | 43.2  | 2.66s  | 11.78s | 22.39s | 98%     |

Want to see a provider added?

Let us know on X

Methodology

What We Measure

Every benchmark measures Time to Interactive (TTI) — the elapsed time from calling compute.sandbox.create() to the first successful runCommand() inside the sandbox.

Each provider is tested with 100 iterations per run. Benchmarks run automatically via GitHub Actions on a recurring schedule. All results are committed to the public benchmarks repo.
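Taking one TTI sample amounts to timing the create-then-run round trip. In this sketch, the `compute.sandbox.create()` and `runCommand()` names come from the definition above, but the `measureTTI` helper and its generic `create` parameter are hypothetical, added for illustration:

```typescript
// One TTI sample: time from sandbox creation to first successful command.
interface Sandbox {
  runCommand(cmd: string): Promise<unknown>;
}

async function measureTTI(create: () => Promise<Sandbox>): Promise<number> {
  const start = performance.now();
  const sandbox = await create();         // e.g. compute.sandbox.create()
  await sandbox.runCommand("echo ready"); // first successful command inside the sandbox
  return performance.now() - start;       // elapsed milliseconds = one TTI sample
}
```

A full benchmark run would call this 100 times per provider and aggregate the samples into the median, P95, and P99 figures shown in the table.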

Sequential Test: Sandboxes are launched one at a time, waiting for each to become interactive before starting the next.

Staggered Test: Sandboxes are launched with 200ms delays between each.

Burst Test: All sandboxes are launched concurrently in a single burst.
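The three launch patterns above can be sketched with one dispatcher. The 200ms stagger comes from the description; `launchOne`, `runPattern`, and the rest of the harness are our own names, not the benchmark's actual code:

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function runPattern(
  launchOne: () => Promise<number>, // creates one sandbox, resolves with its TTI in ms
  n: number,
  mode: "sequential" | "staggered" | "burst"
): Promise<number[]> {
  if (mode === "sequential") {
    // Wait for each sandbox to become interactive before starting the next.
    const samples: number[] = [];
    for (let i = 0; i < n; i++) samples.push(await launchOne());
    return samples;
  }
  if (mode === "staggered") {
    // Fire each launch, pausing 200ms between starts, then await them all.
    const started: Promise<number>[] = [];
    for (let i = 0; i < n; i++) {
      started.push(launchOne());
      if (i < n - 1) await sleep(200);
    }
    return Promise.all(started);
  }
  // Burst: launch everything concurrently in a single wave.
  return Promise.all(Array.from({ length: n }, () => launchOne()));
}
```

The sequential mode isolates per-sandbox latency, while the staggered and burst modes surface how a provider behaves under ramping and spiky load.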

How We Score

The Composite Score is a weighted blend of timing metrics multiplied by the success rate. Each metric is scored against a fixed 10-second ceiling: 100 × (1 − value / 10,000ms), so a 200ms median scores 98 and anything ≥10s scores 0.

The weighted timing score is then multiplied by the success rate (0–1), so providers that fail frequently are penalized proportionally.

  • Median: 60% — primary signal for typical experience
  • P95: 25% — tail latency / consistency
  • P99: 15% — extreme tail latency
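Putting the ceiling formula and the weights together, the scoring can be sketched as follows. The constants are taken from the text; the function names are ours. Plugging in Daytona's row from the table (0.11s / 0.28s / 0.29s at 100% success) reproduces its published score:

```typescript
const CEILING_MS = 10_000;
const WEIGHTS = { median: 0.6, p95: 0.25, p99: 0.15 };

// 100 × (1 − value / 10,000ms), clamped at 0 for anything ≥ 10s.
function metricScore(ms: number): number {
  return Math.max(0, 100 * (1 - ms / CEILING_MS));
}

function compositeScore(
  timings: { median: number; p95: number; p99: number }, // milliseconds
  successRate: number // 0–1
): number {
  const weighted =
    WEIGHTS.median * metricScore(timings.median) +
    WEIGHTS.p95 * metricScore(timings.p95) +
    WEIGHTS.p99 * metricScore(timings.p99);
  return weighted * successRate; // frequent failures scale the score down
}

const daytona = compositeScore({ median: 110, p95: 280, p99: 290 }, 1.0);
console.log(daytona.toFixed(1)); // → 98.2, matching the leaderboard
```

Note how the 10-second ceiling explains Modal's score: its P95 and P99 both exceed 10s, so those terms score 0 and only the weighted median (further scaled by the 98% success rate) survives.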

Sandbox Benchmarks FAQs

Have another question? Email us.

What is a sandbox?

A sandbox is anywhere you can run code in isolation: a VM, bare metal, a container, or any other environment with compute resources.