Amazon apologises to customers affected by huge AWS outage
Amazon Web Services (AWS) has apologised to customers after a major outage on Monday disrupted more than 1,000 sites and services, including Snapchat, Reddit and Lloyds Bank.
The disruption was traced to AWS’s US-EAST-1 region in Northern Virginia, where internal errors prevented systems from correctly mapping websites to the IP addresses computers use to find them.
In a post-incident summary, Amazon said the failure began when critical processes in a DNS-related database fell out of sync, exposing a “latent race condition” whose effects then cascaded through automated systems.
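For readers unfamiliar with the term, a race condition is a bug whose outcome depends on the order in which concurrent processes happen to run. The sketch below is a generic illustration only, not a reconstruction of AWS's systems: the `AddressBook` class, the `Record` type, the version numbers and the `service.example` name are all hypothetical. It shows how a delayed writer can overwrite a newer record with a stale one, and how a simple version check prevents that.

```python
from dataclasses import dataclass


@dataclass
class Record:
    version: int      # hypothetical monotonically increasing plan number
    addresses: list   # IP addresses the name should resolve to


class AddressBook:
    """Hypothetical name-to-address store shared by several writers."""

    def __init__(self):
        self.records = {}

    def apply_unchecked(self, name, update):
        # Last writer wins, regardless of how old its update is.
        self.records[name] = update
        return True

    def apply_checked(self, name, update):
        # Refuse any update that is not newer than what is already stored.
        current = self.records.get(name)
        if current is not None and update.version <= current.version:
            return False
        self.records[name] = update
        return True


def replay_delayed_writer(checked):
    """Replay one unlucky ordering: the writer holding the older update is
    delayed and applies it after the newer update has already landed."""
    book = AddressBook()
    apply = book.apply_checked if checked else book.apply_unchecked
    older = Record(version=1, addresses=["192.0.2.1"])
    newer = Record(version=2, addresses=["192.0.2.10"])
    apply("service.example", newer)   # the fast writer finishes first
    apply("service.example", older)   # the delayed writer arrives late
    return book.records["service.example"]


if __name__ == "__main__":
    print(replay_delayed_writer(checked=False).version)  # stale update wins
    print(replay_delayed_writer(checked=True).version)   # newer update kept
```

The bug is "latent" in the sense that the unchecked version behaves correctly as long as writers finish in the expected order; it only surfaces when one of them is unusually slow.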
“We apologise for the impact this event caused our customers,” AWS said, adding it understands “how critical our services are” and would “do everything we can” to learn from the incident and improve availability.
While platforms such as Roblox and Fortnite recovered within hours, others experienced extended downtime. Lloyds Bank customers reported issues into mid-afternoon, and US payments app Venmo and social media site Reddit were also affected.
The outage had unusual knock-on effects beyond apps and banking. Eight Sleep said some of its internet-connected “pods” overheated or remained stuck in an inclined position, prompting the company to pledge to “outage-proof” its mattresses.
Experts said the episode underscored the tech sector’s dependence on a handful of cloud providers. Dr Junade Ali, a software engineer and fellow at the IET, described “faulty automation” as the core of the problem. He urged firms to build resilience by diversifying their cloud infrastructure, so that services can fail over when a single region or provider suffers an incident.
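In its simplest form, failover of the kind described means trying a second region or provider when the first does not respond. The following is a minimal sketch under stated assumptions rather than a recommended design: the endpoint names are placeholders, and `fake_request` is a hypothetical stand-in for a real network call.

```python
from typing import Callable, Sequence


def call_with_failover(endpoints: Sequence[str],
                       request: Callable[[str], str]) -> str:
    """Try each endpoint in order and return the first successful response."""
    errors = []
    for endpoint in endpoints:
        try:
            return request(endpoint)
        except ConnectionError as exc:
            # Record the failure and move on to the next region or provider.
            errors.append(f"{endpoint}: {exc}")
    raise RuntimeError("all endpoints failed: " + "; ".join(errors))


def fake_request(endpoint: str) -> str:
    """Hypothetical stand-in for a network call; the primary is 'down'."""
    if endpoint == "primary.example":
        raise ConnectionError("region unavailable")
    return f"ok from {endpoint}"


if __name__ == "__main__":
    print(call_with_failover(["primary.example", "secondary.example"],
                             fake_request))
```

Real deployments also have to keep data consistent across the alternatives and detect failures quickly, which is where most of the cost and complexity of diversification lies.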
AWS’s investigation found that a delay in one process early on Monday triggered a sequence that broke internal “address book” systems relied upon by other services in the region, halting normal operations until synchronisation was restored.