Refactor actor handling in workerd, including some bugfixes#4041
Refactor actor handling in workerd, including some bugfixes#4041
Conversation
147c1fb to
22e635e
Compare
The map values are type `Own<ActorContainer>`. If we're in `ActorContainer`'s destructor, then the map entry must already have been erased, obviously.
AFAICT this IIFE didn't accomplish anything as its result was immediately returned.
We can create the ActorContainer and even return an existing Actor without taking the isolate lock, as we aren't executing any JS code in the target isolate.
Since we have to make a copy of the ID anyway, capture the copy upfront.
Keeping a whole separate table for these, keyed on the same string, is a waste. It would be pretty bad if they got out-of-sync anyway.
Pure code move, no changes.
Turns out this is no longer used.
…ets. I want to make sure I don't break this with subsequent changes.
In the next commit I'll be removing ActorContainerRef and just make ActorContainer itself refcounted. This lays some groundwork for that.
This seems more straightforward.
The body of `start()` is entirely moved code with no logic changes.
`getActor()` is the only public interface now.
In the process, get rid of getActorImpl() and GetActorResult.
An RPC call on an already-broken stub should throw the brokenness reason, but instead it threw an "internal error" which was a "PromiseFulfiller not fulfilled" error under the hood. I'm fixing that here because I need it fixed for a test I'm writing in a later commit.
In production, when an actor breaks, any stubs pointing at it are permanently broken. Using them again will repeatedly rethrow the broken reason. In workerd, though, the stub would "repair itself", switching to a new DO on subsequent requests. This is incorrect! This commit makes workerd match production.
This API wasn't acutally aborting the actors, it was just leaving them unreachable. With the recent changes, the existing stubs would continue pointing at the old instances of the objects, which still worked. But even before those changes, there would have been a problem if the actors were doing work in the background -- that work would keep running even after they were supposedly aborted, even as new instances could start up in parallel, leading to split brain.
In a Durable Object, the type of `ctx.id` is supposed to be of type `DurableObjectId`, and in production, it is. Way way back in #605, a bug was introduced such that `ctx.id` became a string for Durable Objects in workerd. But `ctx.id` is only supposed to be a string for "ephemeral objects" (aka colo-local actors). Somehow, nobody ever noticed? Probably because the only thing most people would do with it anyway is stringify it, which JavaScript is really good at doing automatically. Anyway, this fixes the bug, restoring the proper type. (Also this tidies up ownership of the key strings in the actor map. The key is now owned by the ActorContainer.)
22e635e to
4f936ed
Compare
|
(Rebased on main in hopes of fixing internal build.) |
The commit that eliminated ActorContainerRef accidentally removed the code to update the last-access time. This restores it in the appropriate places. I'm tacking this on the end of the PR as going back and rewriting history would be tedious at this point.
|
The internal build had passed, but after my last commit it broke again, not because of anything I changed, but because it always uses the latest version of the internal codebase, which has had breaking changes since the commit that my workerd PR is based on. I'm not going to rebase again for now -- let's just assume the internal build is not actually broken by my change. |
|
@MellowYarker ping? |
|
@kentonv sorry, didn't see this at all, I actually just saw the internal PR, was interested, and then followed the trail of breadcrumbs and found this PR. Will take a look some time tomorrow. |
MellowYarker
left a comment
There was a problem hiding this comment.
Pretty much LGTM, though there's a lot of changes so I want to take another pass before approving. Thanks for cleaning this code up, and sorry for not reviewing earlier. I need to clean up my github notifications...
MellowYarker
left a comment
There was a problem hiding this comment.
Thanks for the cleanup!
|
For some reason GitHub isn't accepting @MellowYarker as an approver. This is obviously wrong, so I am going to use my powers to bypass the merge check. |
This started with me refactoring the actor handling in
server.c++to make it more amenable to changes I want to make.Along the way, I realized there were actual bugs here, and fixed a few:
ctx.idin a Durable Object is supposed to be the specialDurableObjectIdtype, but due to a bug introduced years ago, in workerd it was being filled in as a plain string.There are a lot of commits here because I tried to keep each one small and easy to review. They should go fast.
Keep in mind
server.c++only applies to workerd -- this code is not used in production.I believe @MellowYarker is most familiar with this code so should be the main reviewer. Adding @shrima-cf because I think she recently hit these bugs while writing tests.