Skip to content

Refactor how IoContext aborts are implemented#4165

Merged
kentonv merged 13 commits intomainfrom
kenton/refactor-abort
May 23, 2025
Merged

Refactor how IoContext aborts are implemented#4165
kentonv merged 13 commits intomainfrom
kenton/refactor-abort

Conversation

@kentonv
Copy link
Member

@kentonv kentonv commented May 18, 2025

IoContext abort logic evolved gradually and, much like evolution, the result was a mess.

This PR attempts to improve this.

Note that this is based on top of #4161; please review that first, and I'll keep this one as a draft in the meantime.

Aside from cleaning things up, this fixes two bugs:

  • IoContext::abortException was only set for aborts caused by someone explicitly calling IoContext::abort(), but many kinds of aborts (such as input gate and output gate failures) did not come in through this path. So, the previous attempt to make run() rethrow the abort exception if called post-abort did not always work. Now, all kinds of aborts call abort(), so the exception propagates properly.
  • In actors, aborting a request did not actually cancel I/O tasks, bit it did cancel the drain() thus causing all IncomingRequests to end. If someone continued to hold open a stub to the actor, I/O tasks could keep running and then generate spurious expected !incomingRequests.empty() errors later when they tried to re-enter the isolate.
  • There were various race where an aborted context might throw some exception at the client that was not the abort exception, such as throwing "This promise will never resolve" or "The script will never generate a response". Many of these should be fixed now to actually propagate the abort exception.

@kentonv kentonv requested a review from ObsidianMinor May 18, 2025 19:32
Base automatically changed from kenton/reentry-callback to main May 20, 2025 01:11
@kentonv kentonv force-pushed the kenton/refactor-abort branch from f8f8b6a to 482bf4d Compare May 20, 2025 01:33
@kentonv kentonv marked this pull request as ready for review May 20, 2025 01:34
@kentonv kentonv requested review from a team as code owners May 20, 2025 01:34
@kentonv
Copy link
Member Author

kentonv commented May 20, 2025

The internal build will fail until internal PR 10559 is merged to improve the output gate test. That PR can be merged first; it doesn't depend on this PR, it just enables this PR.

@kentonv kentonv force-pushed the kenton/refactor-abort branch from 482bf4d to 6f4aa13 Compare May 20, 2025 16:01
@kentonv kentonv force-pushed the kenton/refactor-abort branch from 6f4aa13 to c7df745 Compare May 20, 2025 23:03
@kentonv
Copy link
Member Author

kentonv commented May 20, 2025

Just rebased. Unfortunately there's one test in the internal codebase that does depend on that error message so I had to go open a PR for that...

Copy link
Collaborator

@irvinebroque irvinebroque left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were various race where an aborted context might throw some exception at the client that was not the abort exception, such as throwing "This promise will never resolve" or "The script will never generate a response". Many of these should be fixed now to actually propagate the abort exception.

This is awesome — is there a basic example we could use in a changelog entry?

Want to highlight to customers that we are fixing things that result in opaque errors for them.

@kentonv kentonv force-pushed the kenton/refactor-abort branch 2 times, most recently from 5b82ef4 to b7b0431 Compare May 21, 2025 22:07
kentonv added 9 commits May 22, 2025 18:10
Prior to this, `abort()` was just one one way of fulfilling the abort promise, but other ways were joined into the promise in the constructor. Confusingly, this meant that `abortException` was not set correctly when aborting for any of those other reasons.

Now the constructor calls `abortWhen()` to wait on each of those promises that were previously joined in. When any promise passed to `abortWhen()` eventually rejects, `abort()` is called.
I can't think of any reason why this was done in a different place, so let's consolidate.
I've always wanted to do this. Node's process.exit() already figured it out, so this copies its technique.

Relatedly, we can have `runImpl()` rethrow the correct abort exception, yay.
This means they don't need the `finalize()` mechanism anymore.

Also this means we don't need to separately join with `onAbort()` in `WorkerEntrypoint::request()`, becaues the thing `ServiceWorkerGlobalScope::request()` returns is already an `awaitJs()` promise.
Note: This has nothing to do with GC finalization.

This was sort of an alternative abort mechanism where we could cause all promises exported from the IoContext to reject when we detected a hang via PendingEvents. We can just use `abort()` instead.
It doesn't really make sense that we'd null it out specifically after a hang but not in other cases.

Also remove the seemingly-irrelevant nulling out of `promiseContextTag` after a hang.
kentonv added 4 commits May 22, 2025 18:10
This should prevent lingering tasks from failing later due to there being no IncomingRequest. I'm pretty sure the only way that happens is when the actor was aborted.
There are now two methods:

* js.terminateNextExecution(), which has the previous behavior.
* js.terminateExecutionNow(), which forces the termination to happen synchronously and always throws.

This lets us take the hack that had proliferated to three different places and keep it in one place.
@kentonv kentonv force-pushed the kenton/refactor-abort branch from b7b0431 to 44d3004 Compare May 22, 2025 23:10
@kentonv kentonv merged commit cc0d789 into main May 23, 2025
18 checks passed
@kentonv kentonv deleted the kenton/refactor-abort branch May 23, 2025 01:28

// HACK: This has been observed to reliably make V8 check the termination flag and raise the
// uncatchable termination exception.
jsg::check(v8::JSON::Stringify(v8Context(), str()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an interesting (and weird, as always in V8) finding. Is there any V8 doc or discussion about it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. It was just the first thing that worked and we didn't spend too much time thinking about it. Probably there are many V8 APIs we could call which should cause it to check the termination flag.

Of course, it would be better to add an explicit API to V8 to ask it to check the flag but that takes time and there's so much to do. :) If this stops working the tests will tell us.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, having an explicit API would definitely be ideal, though I imagine it wouldn’t be easy to get into V8. Still, I’m impressed you guys found this workaround — thanks for sharing it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants