
Fix remaining use-after-free in dynamic worker loading (worker_loaders) #6553

Merged: kentonv merged 1 commit into cloudflare:main from airhorns:harry/fix-arm64-dyn-worker-segfault on Apr 10, 2026

@airhorns

Contributor

Follow-up to #6547, which fixed the deferred startup path but missed two additional crash vectors for the same root cause (#6441).

#6547 fixed the [this, ...] capture in SubrequestChannelImpl::startRequest() for the case where isolate->service == kj::none (async startup not yet complete). However, the crash reported in #6441 also reproduces on the synchronous startup path, and the same raw [this] capture pattern appears in ActorClassImpl::whenReady().

The core problem: when JS code chains temporary objects like

    loader.get(name, getCode).getEntrypoint().evaluate(args)

V8 can GC the Fetcher mid-request. This destroys the SubrequestChannelImpl, which releases its Rc<WorkerStubImpl>, which triggers WorkerStubImpl::unlink() → WorkerService::unlink(), clearing the LinkedIoChannels. The child worker's IoContext still holds raw pointers (via NullDisposer) to the WorkerService as its IoChannelFactory and LimitEnforcer, so the next I/O operation (e.g. an RPC callback to the parent) dereferences freed memory → SIGSEGV or SIGBUS.
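
The lifetime hazard described above can be sketched in plain C++. This is a simplified model, not workerd code: `Service`, `Channel`, and `Request` are hypothetical stand-ins for WorkerService, SubrequestChannelImpl, and the child's in-flight request. A `std::weak_ptr` plays the role of the raw NullDisposer pointer so that the dangling state can be observed without actually touching freed memory.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for WorkerService.
struct Service {
  std::string name = "child-worker";
};

// Hypothetical stand-in for SubrequestChannelImpl: the handle owned by the
// JS-visible Fetcher. Destroying it releases the only strong reference.
struct Channel {
  std::shared_ptr<Service> service = std::make_shared<Service>();
};

// Hypothetical stand-in for the child's in-flight request. The weak_ptr models
// a non-owning pointer: it does not keep the Service alive.
struct Request {
  std::weak_ptr<Service> service;
  bool serviceStillAlive() const { return !service.expired(); }
};

// Models the unfixed path: the request only observes the service.
Request startRequestUnsafe(const Channel& channel) {
  return Request{channel.service};
}

// Returns whether the service outlives the handle while a request is pending.
bool requestSurvivesHandleDrop() {
  auto channel = std::make_unique<Channel>();
  Request request = startRequestUnsafe(*channel);
  channel.reset();  // models V8 collecting the temporary Fetcher mid-request
  return request.serviceStillAlive();
}
```

In this model `requestSurvivesHandleDrop()` returns `false`: once the handle is gone, nothing keeps the service alive, which is the state in which a real raw pointer would dangle.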

This remains 100% reproducible on current main using the reproduction from #6441 (@cloudflare/codemode DynamicWorkerExecutor).

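Two additional fixes, both in WorkerLoaderNamespace:

- SubrequestChannelImpl::startRequestImpl(): Attach kj::addRef(*this) to the returned WorkerInterface, keeping the SubrequestChannelImpl (and thus WorkerStubImpl and WorkerService) alive for the full request duration. This is the fix for the synchronous startup path that #6547 did not address.

- ActorClassImpl::whenReady(): Replace the raw [this] capture with [self = kj::addRef(*this)]. This is the same pattern as the SubrequestChannelImpl fix from #6547, applied to the actor class deferred startup path.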

Reproduction

Requires @cloudflare/codemode and wrangler:

    // package.json
    { "dependencies": { "@cloudflare/codemode": "^0.3.2", "wrangler": "^4.77.0" } }

    // wrangler.jsonc
    {
      "name": "repro",
      "main": "src/index.ts",
      "compatibility_date": "2025-06-01",
      "compatibility_flags": ["nodejs_compat"],
      "worker_loaders": [{ "binding": "LOADER" }]
    }

    // src/index.ts
    import { DynamicWorkerExecutor, resolveProvider } from '@cloudflare/codemode';
    interface Env {
      LOADER: ConstructorParameters<typeof DynamicWorkerExecutor>[0]['loader'];
    }
    export default {
      async fetch(request: Request, env: Env) {
        const executor = new DynamicWorkerExecutor({ loader: env.LOADER, timeout: 30_000 });
        const tools = {
          get_items: async () =>
            Array.from({ length: 112 }, (_, i) => ({
              id: `item_${i}`, name: `Item ${i}`, memo: 'x'.repeat(220),
            })),
        };
        for (let i = 0; i < 6; i++) {
          const result = await executor.execute(
            `async () => { return await codemode.get_items(); }`,
            [resolveProvider({ name: 'codemode', tools })]
          );
          if (result.error) return Response.json({ round: i, error: result.error }, { status: 500 });
        }
        return Response.json({ ok: true });
      },
    };

Then: wrangler dev and curl http://localhost:8787 → segfault every time.

To test a local workerd build against this reproduction:

    MINIFLARE_WORKERD_PATH=bazel-bin/src/workerd/server/workerd wrangler dev

@airhorns airhorns requested review from a team as code owners April 10, 2026 14:52
@github-actions

github-actions Bot commented Apr 10, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

Follow-up to cloudflare#6547, which fixed the deferred startup path but missed
two additional crash vectors for the same root cause (cloudflare#6441).

cloudflare#6547 fixed the `[this, ...]` capture in
`SubrequestChannelImpl::startRequest()` for the case where `isolate->service == kj::none`
(async startup not yet complete). However, the crash reported in cloudflare#6441
also reproduces on the synchronous startup path, and with the same
pattern on `ActorClassImpl::whenReady()`.

The core problem: when JS code chains temporary objects like

    loader.get(name, getCode).getEntrypoint().evaluate(args)

V8 can GC the Fetcher mid-request. This destroys the
SubrequestChannelImpl, which releases its Rc<WorkerStubImpl>, which
triggers WorkerStubImpl::unlink() → WorkerService::unlink(), clearing
the LinkedIoChannels. The child worker's IoContext still holds raw
pointers (via NullDisposer) to the WorkerService as its
IoChannelFactory and LimitEnforcer, so the next I/O operation (e.g.
an RPC callback to the parent) dereferences freed memory → SIGSEGV
or SIGBUS.

This remains 100% reproducible on current main using the reproduction
from cloudflare#6441 (@cloudflare/codemode DynamicWorkerExecutor).

Two additional fixes, both in WorkerLoaderNamespace:

- SubrequestChannelImpl::startRequestImpl(): Attach
  kj::addRef(*this) to the returned WorkerInterface, keeping the
  SubrequestChannelImpl (and thus WorkerStubImpl and WorkerService)
  alive for the full request duration. This is the fix for the
  synchronous startup path that cloudflare#6547 did not address.

- ActorClassImpl::whenReady(): Replace raw `[this]` capture with
  `[self = kj::addRef(*this)]` — same pattern as the
  SubrequestChannelImpl fix from cloudflare#6547, applied to the actor class
  deferred startup path.
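
As a rough illustration of the two fixes listed above, the sketch below uses `std::shared_ptr` and `std::enable_shared_from_this` as stand-ins for `kj::Rc` and `kj::addRef(*this)`. All type and function names are hypothetical and the real workerd signatures differ; only the ownership pattern is the point.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical stand-in for WorkerService.
struct Service {
  int requestsServed = 0;
};

struct Channel;

// Hypothetical stand-in for the returned WorkerInterface. It carries a strong
// reference to the channel, so the raw Service pointer stays valid for as
// long as the request object exists (analog of attaching kj::addRef(*this)).
struct Request {
  std::shared_ptr<Channel> keepAlive;
  Service* service;
  int run() { return ++service->requestsServed; }
};

// Hypothetical stand-in for SubrequestChannelImpl / ActorClassImpl.
struct Channel : std::enable_shared_from_this<Channel> {
  std::shared_ptr<Service> service = std::make_shared<Service>();

  // Fix 1 analog: the returned request keeps the channel alive.
  Request startRequest() { return Request{shared_from_this(), service.get()}; }

  // Fix 2 analog: the deferred callback captures a strong self reference
  // instead of a raw `this`.
  std::function<int()> whenReady() {
    return [self = shared_from_this()]() { return ++self->service->requestsServed; };
  }
};

// Drops the caller's handle while a request and a callback are still pending,
// then uses both.
int runAfterHandleDrop() {
  auto channel = std::make_shared<Channel>();
  Request request = channel->startRequest();
  std::function<int()> ready = channel->whenReady();
  channel.reset();  // models the Fetcher being collected mid-request
  int first = request.run();
  int second = ready();
  return first + second;
}
```

Both the request and the callback remain usable after the handle is dropped, because each holds its own strong reference; `runAfterHandleDrop()` returns 3 in this model.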

Reproduction

Requires `@cloudflare/codemode` and `wrangler`:

```json
// package.json
{ "dependencies": { "@cloudflare/codemode": "^0.3.2", "wrangler": "^4.77.0" } }
```

```jsonc
// wrangler.jsonc
{
  "name": "repro",
  "main": "src/index.ts",
  "compatibility_date": "2025-06-01",
  "compatibility_flags": ["nodejs_compat"],
  "worker_loaders": [{ "binding": "LOADER" }]
}
```

```ts
// src/index.ts
import { DynamicWorkerExecutor, resolveProvider } from '@cloudflare/codemode';
interface Env {
  LOADER: ConstructorParameters<typeof DynamicWorkerExecutor>[0]['loader'];
}
export default {
  async fetch(request: Request, env: Env) {
    const executor = new DynamicWorkerExecutor({ loader: env.LOADER, timeout: 30_000 });
    const tools = {
      get_items: async () =>
        Array.from({ length: 112 }, (_, i) => ({
          id: `item_${i}`, name: `Item ${i}`, memo: 'x'.repeat(220),
        })),
    };
    for (let i = 0; i < 6; i++) {
      const result = await executor.execute(
        `async () => { return await codemode.get_items(); }`,
        [resolveProvider({ name: 'codemode', tools })]
      );
      if (result.error) return Response.json({ round: i, error: result.error }, { status: 500 });
    }
    return Response.json({ ok: true });
  },
};
```

Then: `wrangler dev` and `curl http://localhost:8787` → segfault every time.

To test a local workerd build against this reproduction:

    MINIFLARE_WORKERD_PATH=bazel-bin/src/workerd/server/workerd wrangler dev

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@airhorns airhorns force-pushed the harry/fix-arm64-dyn-worker-segfault branch from 3767f5c to 908ada3 Compare April 10, 2026 14:53
@airhorns

Contributor Author

I have read the CLA Document and I hereby sign the CLA

github-actions Bot added a commit that referenced this pull request Apr 10, 2026

@kentonv kentonv left a comment

Member

Thanks!

@kentonv

kentonv commented Apr 10, 2026

Member

Will merge when tests pass.

FWIW, the internal build is irrelevant, as this code is not part of the production implementation.

@kentonv kentonv merged commit 696113e into cloudflare:main Apr 10, 2026
19 of 20 checks passed
@jrowny

jrowny commented Apr 10, 2026

FWIW, I built locally and this did indeed fix the segfault, thanks!

3 participants