Fix remaining use-after-free in dynamic worker loading (worker_loaders)#6553
Merged
kentonv merged 1 commit intoApr 10, 2026
Merged
Conversation
|
All contributors have signed the CLA ✍️ ✅ |
Follow-up to cloudflare#6547, which fixed the deferred startup path but missed two additional crash vectors for the same root cause (cloudflare#6441). startRequest()` for the case where `isolate->service == kj::none` (async startup not yet complete). However, the crash reported in cloudflare#6441 also reproduces on the synchronous startup path, and with the same pattern on `ActorClassImpl::whenReady()`. The core problem: when JS code chains temporary objects like loader.get(name, getCode).getEntrypoint().evaluate(args) V8 can GC the Fetcher mid-request. This destroys the SubrequestChannelImpl, which releases its Rc<WorkerStubImpl>, which triggers WorkerStubImpl::unlink() → WorkerService::unlink(), clearing the LinkedIoChannels. The child worker's IoContext still holds raw pointers (via NullDisposer) to the WorkerService as its IoChannelFactory and LimitEnforcer, so the next I/O operation (e.g. an RPC callback to the parent) dereferences freed memory → SIGSEGV or SIGBUS. This remains 100% reproducible on current main using the reproduction from cloudflare#6441 (@cloudflare/codemode DynamicWorkerExecutor). Two additional fixes, both in WorkerLoaderNamespace: - SubrequestChannelImpl::startRequestImpl(): Attach kj::addRef(*this) to the returned WorkerInterface, keeping the SubrequestChannelImpl (and thus WorkerStubImpl and WorkerService) alive for the full request duration. This is the fix for the synchronous startup path that cloudflare#6547 did not address. - ActorClassImpl::whenReady(): Replace raw `[this]` capture with `[self = kj::addRef(*this)]` — same pattern as the SubrequestChannelImpl fix from cloudflare#6547, applied to the actor class deferred startup path. Requires `@cloudflare/codemode` and `wrangler`: ```json // package.json { "dependencies": { "@cloudflare/codemode": "^0.3.2", "wrangler": "^4.77.0" } } ``` ```jsonc // wrangler.jsonc { "name": "repro", "main": "src/index.ts", "compatibility_date": "2025-06-01", "compatibility_flags": ["nodejs_compat"], "worker_loaders": [{ "binding": "LOADER" }] } ``` ```ts // src/index.ts import { DynamicWorkerExecutor, resolveProvider } from '@cloudflare/codemode'; interface Env { LOADER: ConstructorParameters<typeof DynamicWorkerExecutor>[0]['loader']; } export default { async fetch(request: Request, env: Env) { const executor = new DynamicWorkerExecutor({ loader: env.LOADER, timeout: 30_000 }); const tools = { get_items: async () => Array.from({ length: 112 }, (_, i) => ({ id: `item_${i}`, name: `Item ${i}`, memo: 'x'.repeat(220), })), }; for (let i = 0; i < 6; i++) { const result = await executor.execute( `async () => { return await codemode.get_items(); }`, [resolveProvider({ name: 'codemode', tools })] ); if (result.error) return Response.json({ round: i, error: result.error }, { status: 500 }); } return Response.json({ ok: true }); }, }; ``` Then: `wrangler dev` and `curl http://localhost:8787` → segfault every time. To test a local workerd build against this reproduction: MINIFLARE_WORKERD_PATH=bazel-bin/src/workerd/server/workerd wrangler dev Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3767f5c to
908ada3
Compare
Contributor
Author
|
I have read the CLA Document and I hereby sign the CLA |
Member
|
Will merge when tests pass. FWIW internal build is irrelevant as this code is not part of the production implementation. |
|
fwiw, I built locally and this indeed fixed sigfaulting, thanks ! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Follow-up to #6547, which fixed the deferred startup path but missed two additional crash vectors for the same root cause (#6441).
#6547 fixed the
[this, ...]capture inSubrequestChannelImpl:: startRequest()for the case whereisolate->service == kj::none(async startup not yet complete). However, the crash reported in #6441 also reproduces on the synchronous startup path, and with the same pattern onActorClassImpl::whenReady().The core problem: when JS code chains temporary objects like
V8 can GC the Fetcher mid-request. This destroys the SubrequestChannelImpl, which releases its Rc, which triggers WorkerStubImpl::unlink() → WorkerService::unlink(), clearing the LinkedIoChannels. The child worker's IoContext still holds raw pointers (via NullDisposer) to the WorkerService as its IoChannelFactory and LimitEnforcer, so the next I/O operation (e.g. an RPC callback to the parent) dereferences freed memory → SIGSEGV or SIGBUS.
This remains 100% reproducible on current main using the reproduction from #6441 (@cloudflare/codemode DynamicWorkerExecutor).
Two additional fixes, both in WorkerLoaderNamespace:
SubrequestChannelImpl::startRequestImpl(): Attach kj::addRef(*this) to the returned WorkerInterface, keeping the SubrequestChannelImpl (and thus WorkerStubImpl and WorkerService) alive for the full request duration. This is the fix for the synchronous startup path that ensure worker stub subrequest channel is kept alive until internal startRequest call #6547 did not address.
ActorClassImpl::whenReady(): Replace raw
[this]capture with[self = kj::addRef(*this)]— same pattern as the SubrequestChannelImpl fix from ensure worker stub subrequest channel is kept alive until internal startRequest call #6547, applied to the actor class deferred startup path.Reproduction
Requires
@cloudflare/codemodeandwrangler:Then:
wrangler devandcurl http://localhost:8787→ segfault every time.To test a local workerd build against this reproduction: