Sandbox Provider Leaderboard
Sandbox Benchmarks
A leaderboard of common benchmarks for each of our sandbox providers.
[Chart: Performance Over Time (Composite Score)]
Detailed Metrics
| Provider | Score | Median | P95 | P99 | Success |
|---|---|---|---|---|---|
| Daytona | 98.2 | 0.11s | 0.28s | 0.29s | 100% |
| E2B | 93.8 | 0.44s | 0.85s | 0.99s | 100% |
| Hopx | 87.7 | 1.05s | 1.42s | 1.64s | 100% |
| Blaxel | 89.2 | 1.05s | 1.12s | 1.15s | 100% |
| Cloudflare | 79.3 | 1.72s | 2.48s | 2.78s | 100% |
| Vercel | 81.7 | 1.75s | 1.93s | 1.98s | 100% |
| Namespace | 77.8 | 1.86s | 2.43s | 3.35s | 100% |
| Runloop | 80.8 | 1.87s | 1.99s | 1.99s | 100% |
| CodeSandbox | 75.5 | 2.32s | 2.60s | 2.72s | 100% |
| Modal | 43.2 | 2.66s | 11.78s | 22.39s | 98% |
Want to see a provider added?
Methodology
What We Measure
Every benchmark measures Time to Interactive (TTI) — the elapsed time from calling compute.sandbox.create() to the first successful runCommand() inside the sandbox.
Each provider is tested with 100 iterations per run. Benchmarks run automatically via GitHub Actions on a recurring schedule. All results are committed to the public benchmarks repo.
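The measurement described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the benchmark's actual harness: the `Sandbox` and `Compute` interfaces are hypothetical minimal shapes built around the two calls named in the text, `runCommand()` is assumed to reject on failure, and the nearest-rank percentile is one common choice since the exact percentile method is not stated.

```typescript
// Hypothetical minimal shapes for illustration; the real SDK types may differ.
interface Sandbox {
  runCommand(command: string): Promise<unknown>;
}
interface Compute {
  sandbox: { create(): Promise<Sandbox> };
}

// TTI in milliseconds: from the create() call to the first successful runCommand().
async function measureTti(compute: Compute): Promise<number> {
  const start = performance.now();
  const sandbox = await compute.sandbox.create();
  await sandbox.runCommand("echo ready"); // assumed to reject on failure
  return performance.now() - start;
}

// Nearest-rank percentile (an assumption; the benchmark's method is not stated).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) return NaN;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

interface RunSummary {
  medianMs: number;
  p95Ms: number;
  p99Ms: number;
  successRate: number; // fraction of iterations that became interactive, 0 to 1
}

// Runs the iterations one at a time and summarizes the successful samples.
async function runBenchmark(compute: Compute, iterations = 100): Promise<RunSummary> {
  const samples: number[] = [];
  for (let i = 0; i < iterations; i++) {
    try {
      samples.push(await measureTti(compute));
    } catch {
      // A failed create() or runCommand() counts against the success rate.
    }
  }
  return {
    medianMs: percentile(samples, 50),
    p95Ms: percentile(samples, 95),
    p99Ms: percentile(samples, 99),
    successRate: samples.length / iterations,
  };
}
```

Failed iterations are left out of the timing samples here and only lower `successRate`; whether the real benchmark treats failures the same way is not stated.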
Sequential Test: Sandboxes are launched one at a time, waiting for each to become interactive before starting the next.
Staggered Test: Sandboxes are launched with 200ms delays between each.
Burst Test: All sandboxes are launched concurrently in a single burst.
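The three launch patterns differ only in when each launch starts. A minimal sketch, assuming a hypothetical `launch` callback that creates one sandbox and resolves once it is interactive (the helper names `settle`, `runSequential`, `runStaggered`, and `runBurst` are illustrative, not part of any SDK):

```typescript
type Launch<T> = () => Promise<T>;
type Outcome<T> = { ok: true; value: T } | { ok: false; error: unknown };

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Attach handlers immediately so a failed launch becomes a recorded outcome
// instead of an unhandled rejection.
function settle<T>(promise: Promise<T>): Promise<Outcome<T>> {
  return promise.then(
    (value): Outcome<T> => ({ ok: true, value }),
    (error): Outcome<T> => ({ ok: false, error }),
  );
}

// Sequential: wait for each launch to finish before starting the next.
async function runSequential<T>(launch: Launch<T>, count: number): Promise<Outcome<T>[]> {
  const outcomes: Outcome<T>[] = [];
  for (let i = 0; i < count; i++) {
    outcomes.push(await settle(launch()));
  }
  return outcomes;
}

// Staggered: start a launch every delayMs without waiting for earlier ones.
async function runStaggered<T>(
  launch: Launch<T>,
  count: number,
  delayMs = 200,
): Promise<Outcome<T>[]> {
  const inFlight: Promise<Outcome<T>>[] = [];
  for (let i = 0; i < count; i++) {
    if (i > 0) await sleep(delayMs);
    inFlight.push(settle(launch()));
  }
  return Promise.all(inFlight);
}

// Burst: start every launch at once.
function runBurst<T>(launch: Launch<T>, count: number): Promise<Outcome<T>[]> {
  return Promise.all(Array.from({ length: count }, () => settle(launch())));
}
```

Each function returns one outcome per launch, so a success rate can be computed the same way for all three patterns.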
How We Score
The Composite Score is a weighted blend of three timing metrics (median, P95, and P99), multiplied by the success rate. Each metric is scored against a fixed 10-second ceiling: 100 × (1 − value / 10,000 ms), floored at 0, so a 200 ms median scores 98 and anything at or above 10 s scores 0.
The three metric scores are combined using the weights listed below, and the result is multiplied by the success rate (0–1), so providers that fail frequently are penalized proportionally.
- Median: 60%, the primary signal for typical experience
- P95: 25%, tail latency and consistency
- P99: 15%, extreme tail latency
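The scoring rule and weights above can be written out as a short sketch. The function and field names (`metricScore`, `compositeScore`, `medianMs`, and so on) are illustrative assumptions; only the formula, the 10-second ceiling, and the weights come from the description.

```typescript
const CEILING_MS = 10000; // fixed 10-second ceiling
const WEIGHTS = { median: 0.6, p95: 0.25, p99: 0.15 };

interface Metrics {
  medianMs: number;
  p95Ms: number;
  p99Ms: number;
  successRate: number; // 0 to 1
}

// Score one timing metric on a 0 to 100 scale; values at or above the ceiling score 0.
function metricScore(valueMs: number): number {
  return Math.max(0, 100 * (1 - valueMs / CEILING_MS));
}

// Weighted blend of the three timing scores, scaled by the success rate.
function compositeScore(m: Metrics): number {
  const timing =
    WEIGHTS.median * metricScore(m.medianMs) +
    WEIGHTS.p95 * metricScore(m.p95Ms) +
    WEIGHTS.p99 * metricScore(m.p99Ms);
  return timing * m.successRate;
}

// Daytona row from the table: 0.11s, 0.28s, 0.29s, 100% success.
const daytona = compositeScore({ medianMs: 110, p95Ms: 280, p99Ms: 290, successRate: 1 });
// Modal row from the table: 2.66s, 11.78s, 22.39s, 98% success.
const modal = compositeScore({ medianMs: 2660, p95Ms: 11780, p99Ms: 22390, successRate: 0.98 });
console.log(daytona.toFixed(1), modal.toFixed(1));
```

With the table's rounded inputs, this gives roughly 98.2 for Daytona and 43.2 for Modal, which matches the published scores. Modal's P95 and P99 both exceed the ceiling, so only its median contributes to the timing score.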