Free on-call practice for SREs and DevOps engineers

The Production Incident Simulator
for On-Call Engineers

Hands-on production incident simulations for SREs, DevOps engineers, and technical founders.

Drop into a realistic terminal with a ticking clock and a system on fire. Run commands, find the root cause, and fix it before time runs out. No setup required.

Start Playing Free
Free account - 10 seconds with GitHub or Google

Runs in your browser. Takes 5 minutes. The only production incident simulator you can start in seconds.

418+ incidents simulated

It's Cyber Monday. Payments just stopped. You're on call.

Live 3D architecture, real-time logs, real commands, and a ticking clock. Can you fix it before the money runs out?

Play the Situation Room


Trending: 685K+ views - covered by Tom's Hardware, Hacker News, and more

An AI agent ran terraform destroy on production.
2.5 years of data - gone. Can you recover it?

Based on the real DataTalksClub incident that hit the front page of Hacker News. Play through an authentic Claude Code split-panel interface - the same kind of setup that caused the original disaster.

Sign Up to Play
Free account required - 10 seconds with GitHub

New: Now playable inside Claude and ChatGPT

Connect your AI assistant and solve production incidents right from the chat. Full scoring, XP, and badges sync to your profile.

Not a Tutorial. Not a Quiz.
A Real-Time Incident Simulation.

10+ scenarios based on incidents that actually took down production. New ones added every 2 weeks.

Real Terminal

kubectl, logs, db queries - actual commands

Live Metrics

Streaming logs, error spikes, gauges

Ticking Clock

Revenue drops, PagerDuty fires, pressure builds

Full Debrief

Root cause, optimal path, what you missed

Play More. Unlock More.

Every scenario you complete earns XP. Hit milestones to unlock pro scenarios for free.

  • Free - Start Free: The Mysterious Timeout, The Expired Certificate, The AI That Ate Production
  • 300 XP - Unlock 1 Scenario: choose an intermediate scenario
  • 600 XP - Unlock 2 Scenarios: choose another scenario
  • 1000 XP - Unlock 3 Scenarios: choose an advanced scenario

Or skip the grind - Pro unlocks all scenarios instantly.
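
For readers who think in code, the milestone ladder above can be sketched roughly as follows. This is only an illustration, not how the site computes unlocks: the function names are hypothetical, and reading the scenario counts as cumulative totals (rather than per-milestone additions) is an assumption.

```python
# XP thresholds and unlock counts as listed on this page.
# Assumption: each count is the cumulative total of pro scenarios unlocked.
MILESTONES = [(300, 1), (600, 2), (1000, 3)]


def unlocked_scenarios(xp: int) -> int:
    """Return how many pro scenarios a given XP total would unlock."""
    unlocked = 0
    for threshold, count in MILESTONES:
        if xp >= threshold:
            unlocked = count
    return unlocked


def xp_to_next_milestone(xp: int):
    """Return the XP still needed for the next milestone, or None if all are reached."""
    for threshold, _ in MILESTONES:
        if xp < threshold:
            return threshold - xp
    return None
```

Under these assumptions, a player at 450 XP would have one scenario unlocked and be 150 XP short of the second milestone.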

What You Get - Free

Sign up with GitHub or Google - takes 10 seconds

  • The viral Terraform scenario everyone is talking about (685K+ views)
  • 10+ incidents across databases, Kubernetes, cloud, and security (growing)
  • Leaderboard ranking against other engineers
  • Score breakdown and solution walkthroughs

Your first production incident shouldn't be your worst one.

Most engineers and technical founders get paged cold with zero prior experience handling a real incident. Reading runbooks doesn't build on-call instincts. YouBrokeProd drops you into realistic incident simulations so when the real page comes in at 3 AM, you've already been there.

On-Call Skills That Actually Stick

10+ scenarios across beginner, intermediate, and advanced. New ones every 2 weeks.

Triage Database Failures Fast

Read Postgres error states, diagnose connection pool saturation, and fix replication issues without guessing.

Debug Kubernetes Under Pressure

Diagnose crashloops, OOMKills, and networking failures systematically instead of opening Stack Overflow.

Spot Security Issues Before They Escalate

Recognize credential exposure patterns, suspicious traffic, and misconfigurations that lead to real breaches.

How It Works

Each scenario is a real-time simulation running in your browser. No setup. Just you, a terminal, and a production incident to solve.

1. Get Paged

Pick a scenario and difficulty. You get a briefing with symptoms, a simulated terminal, and a ticking clock.

2. Investigate

Run real commands in the terminal - check logs, query metrics, inspect configs. Built-in hints if you get stuck.

3. Diagnose & Fix

Submit your root cause diagnosis, then apply the fix command. Scored on speed, accuracy, and efficiency.

4. Debrief

See what you got right, what you missed, and the optimal diagnostic path. Compare your score on the leaderboard.

On-call training for your whole team?

Run the same incident simulation across your SRE, platform, or founding engineering team. Compare scores, identify skill gaps, reduce MTTR, and build shared muscle memory for when the real pages come in. Manager reports and team leaderboards included.

See Team Plans

The Next Incident Won't Wait.
Will You Be Ready?

Sign up free and start your first incident simulation in under a minute.

Start Your First Simulation