macOS: workerd crashes with dual-stack "happy eyeballs" HTTP requests #4623

@AaronO

Description

Tested on workerd-darwin-arm64@v1.20250726.0

Symptom:

workerd running on macOS crashes when receiving HTTP requests from clients that use a "happy eyeballs" (RFC 8305) connection strategy.
bun is a common "offender": it races dual-stack connections to localhost, which surfaces this issue.

Using [::1] or 127.0.0.1 as the host string doesn't trigger the issue, since the client then attempts a single connection (either IPv4 or IPv6) instead of both.
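
To illustrate why the host string matters, here is a minimal sketch that counts the IPv4 and IPv6 candidates getaddrinfo() returns for a host. The helper name count_candidates is hypothetical and is not taken from bun or workerd. It only shows how many address families a client has available to race. It does not reproduce bun's actual connection logic.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper: count the IPv4 and IPv6 candidates that getaddrinfo()
// returns for a host/port pair. Returns 0 on success, or the getaddrinfo()
// error code on failure.
int count_candidates(const char *host, const char *port, int *v4, int *v6) {
  struct addrinfo hints, *res, *p;

  memset(&hints, 0, sizeof hints);
  hints.ai_family = AF_UNSPEC;      // allow both families, as the stress client does
  hints.ai_socktype = SOCK_STREAM;  // TCP only, so each address is listed once

  *v4 = 0;
  *v6 = 0;

  int rv = getaddrinfo(host, port, &hints, &res);
  if (rv != 0) {
    return rv;
  }

  // Walk the result list and tally each address family.
  for (p = res; p != NULL; p = p->ai_next) {
    if (p->ai_family == AF_INET) {
      (*v4)++;
    } else if (p->ai_family == AF_INET6) {
      (*v6)++;
    }
  }

  freeaddrinfo(res);
  return 0;
}
```

A literal such as 127.0.0.1 or ::1 resolves to a single family, so there is nothing to race. A name such as localhost can resolve to both families, depending on the machine's resolver configuration, and that is the case where a dual-stack client may open two connections and abort one.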

Related issues: fixing this also fixes oven-sh/bun#12730 and cloudflare/workers-sdk#9328

Root Cause:

TLDR: macOS' accept() may return aborted connections (handshake then RST). The kernel returns a valid fd but sets addrlen == 0, instead of failing with ECONNABORTED as one might expect (or simply skipping that conn in kernel-land).
(Technically, it's a broader issue with gracefully handling aborted connections, not just dual-stack races)

The crash is caused by an uncaught kj::Exception originating from capnproto/kj/:

  1. bun initiates two connections (e.g., IPv4 and IPv6).
  2. One connection succeeds, and bun immediately closes/resets the other.
  3. The workerd server calls accept() on the listening socket.
  4. On macOS, the kernel may return a valid file descriptor for the connection that was just aborted by the client, but it will report the peer address length as 0.
  5. kj/async-io-unix.c++ doesn't handle this and passes a zero addrlen to NetworkFilter::shouldAllow, which then trips an assert.

This results in the following fatal uncaught exception, which crashes the process:

Fatal uncaught kj::Exception: kj/async-io.c++:3120: failed: expected addrlen >= sizeof(addr->sa_family) [0 >= 1]

Impact:

This bug makes workerd/wrangler unstable for local development on macOS, causing surprising failures when used with bun.

Resolution:

The fix needs to be applied in the upstream capnproto/capnproto repository (c++/src/kj/async-io-unix.c++) to handle the addrlen == 0 case gracefully.
I'm creating this issue to track the problem within workerd and signal the root issue to the other repos.
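
As a rough illustration of what handling addrlen == 0 gracefully could look like, here is a minimal sketch in plain C. It is not a patch to kj, and the helper names peer_addr_usable and accept_skip_aborted are hypothetical. It assumes the right response to an empty peer address is to close the fd and keep accepting. The actual upstream fix may choose differently.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical predicate: is the reported peer address long enough to hold
// an address family? This mirrors the condition in the failing assertion
// (addrlen >= sizeof(addr->sa_family)).
int peer_addr_usable(socklen_t addrlen) {
  return (size_t)addrlen >= sizeof(((struct sockaddr *)0)->sa_family);
}

// Hypothetical wrapper around accept(): returns a connected fd whose peer
// address is usable, or -1 with errno set on a non-retryable error.
// Connections that were aborted before accept() returned are skipped.
int accept_skip_aborted(int listen_fd, struct sockaddr_storage *addr,
                        socklen_t *addrlen) {
  for (;;) {
    *addrlen = sizeof(*addr);
    int fd = accept(listen_fd, (struct sockaddr *)addr, addrlen);

    if (fd < 0) {
      // Retry on interruption, and on kernels that do report the abort.
      if (errno == EINTR || errno == ECONNABORTED) {
        continue;
      }
      return -1;
    }

    if (!peer_addr_usable(*addrlen)) {
      // Valid fd but empty peer address: treat it as an aborted connection,
      // release the fd, and wait for the next one instead of asserting.
      close(fd);
      continue;
    }

    return fd;
  }
}
```

The design choice here is to treat the zero-length address the same way as ECONNABORTED, so callers never see a connection they cannot filter or log by peer address.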

Minimal reproduction:

Run any worker with workerd serve on macOS (reproduced with the hello.js and hello.capnp from the README).

❯ ./workerd-darwin-arm64 serve hello.capnp

Then run bun:

❯ bun -e "console.log(await (await fetch('http://localhost:8080')).text())"
error: The socket connection was closed unexpectedly. For more information, pass `verbose: true` in the second argument to fetch()
  path: "http://localhost:8080/",
 errno: 0,
  code: "ECONNRESET"


Bun v1.2.19 (macOS arm64)

The workerd process will now have crashed:

❯ ./workerd-darwin-arm64 serve hello.capnp
*** Fatal uncaught kj::Exception: kj/async-io.c++:3120: failed: expected addrlen >= sizeof(addr->sa_family) [0 >= 1]
stack: 104566aa7 10458ca4f 1023a32bf 1023a3763 1023a7aaf 1023a824f 1023a8cc7 1023398bb 1045aea2f 1045aed37 1045ad77f 1045ad4eb 102328803 19eaa2b97

Calling bun using [::1] or 127.0.0.1 will not trigger the crash, as expected:

❯ bun -e "console.log(await (await fetch('http://127.0.0.1:8080')).text())"
Hello World
❯ bun -e "console.log(await (await fetch('http://[::1]:8080')).text())"
Hello World

Stress client:

Here's a minimal stress_client.c that reproduces this error by opening and aborting many connections:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

int main(int argc, char *argv[]) {
  struct addrinfo hints, *servinfo, *p;
  int rv;
  int sockfd;
  long count = 0;

  if (argc != 3) {
    fprintf(stderr, "usage: %s hostname port\n", argv[0]);
    exit(1);
  }

  memset(&hints, 0, sizeof hints);
  hints.ai_family = AF_UNSPEC;
  hints.ai_socktype = SOCK_STREAM;

  if ((rv = getaddrinfo(argv[1], argv[2], &hints, &servinfo)) != 0) {
    fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rv));
    return 1;
  }

  // Use the first result from getaddrinfo
  p = servinfo;

  printf("Stressing server %s on port %s. Press Ctrl+C to stop.\n", argv[1], argv[2]);

  for (;;) {
    sockfd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
    if (sockfd == -1) {
      continue;
    }

    // Set SO_LINGER to {1, 0} to send RST on close() instead of FIN.
    // This makes the connection abort more abrupt.
    struct linger so_linger;
    so_linger.l_onoff = 1;
    so_linger.l_linger = 0;
    setsockopt(sockfd, SOL_SOCKET, SO_LINGER, &so_linger, sizeof(so_linger));

    if (connect(sockfd, p->ai_addr, p->ai_addrlen) != -1) {
      count++;
      if ((count % 1000) == 0) {
        printf(".");
        fflush(stdout);
      }
    }

    close(sockfd);
  }

  freeaddrinfo(servinfo);
  return 0;
}

Evidence:

Inspecting the traffic with Wireshark (or similar tools), we see the following:

No.	Time	Source	Destination	Protocol	Length	Info
1	0.000000	::1	::1	TCP	88	63739 → 8080 [SYN] Seq=0 Win=65535 ...
2	0.000034	127.0.0.1	127.0.0.1	TCP	68	63740 → 8080 [SYN] Seq=0 Win=65535 ...
3	0.000096	::1	::1	TCP	88	8080 → 63739 [SYN, ACK] Seq=0 Ack=1 Win=65535 ...
4	0.000132	127.0.0.1	127.0.0.1	TCP	68	8080 → 63740 [SYN, ACK] Seq=0 Ack=1 Win=65535 ...SACK_PERM
5	0.000150	::1	::1	TCP	76	63739 → 8080 [ACK] Seq=1 Ack=1 Win=407808 ...
6	0.000157	127.0.0.1	127.0.0.1	TCP	56	63740 → 8080 [ACK] Seq=1 Ack=1 Win=408320 ...
7	0.000170	::1	::1	TCP	76	[TCP Window Update] 8080 → 63739 [ACK] Seq=1 Ack=1 Win=407808 ...
8	0.000176	127.0.0.1	127.0.0.1	TCP	56	[TCP Window Update] 8080 → 63740 [ACK] Seq=1 Ack=1 Win=408320 ...
9	0.000198	127.0.0.1	127.0.0.1	TCP	44	63740 → 8080 [RST, ACK] Seq=1 Ack=1 Win=408320 ...
10	0.001179	::1	::1	HTTP	219	GET / HTTP/1.1 
11	0.001217	::1	::1	TCP	76	8080 → 63739 [ACK] Seq=1 Ack=144 Win=407680 ...
12	0.006752	::1	::1	TCP	76	8080 → 63739 [FIN, ACK] Seq=1 Ack=144 Win=407680 ...
13	0.006801	::1	::1	TCP	76	63739 → 8080 [ACK] Seq=144 Ack=2 Win=407808 ...
14	0.007110	::1	::1	TCP	64	63739 → 8080 [RST, ACK] Seq=144 Ack=2 Win=407808 ...

(We see bun completing the handshake over both IPv4 and IPv6, then sending an RST to abort the "slower" IPv4 connection. The IPv6 connection sends the HTTP request packet, but the server crashes before responding.)

Sanity checks:

This bug appears to be macOS-specific (possibly also affecting other BSDs). I couldn't reproduce it on Linux, and if we check their respective syscall implementations we see that:
