Conversation
| // Although inspector I/O must happen on the InspectorService thread (to make sure breakpoints | ||
| // don't block inspector I/O), inspector messages must be actually dispatched on the Isolate | ||
| // thread. So, we run the dispatch loop on the Isolate thread. | ||
| auto dispatchLoopPromise = isolateThreadExecutor->executeAsync([this]() { |
There was a problem hiding this comment.
nit: I would move the executeAsync into the dispatchLoop method, it's a bit clearer for future readers imho.
otherwise lgtm
There was a problem hiding this comment.
I felt the same intuition, but wasn't confident the result was actually more readable, given the infamous footgunnery of lambda coroutines. The main choices that I see are:
- Rename
dispatchLoop()todispatchLoopImpl()and make a newdispatchLoop()whose body is a call todispatchLoopImpl()inside a lambda passed toexecuteAsync(). - Or, move
dispatchLoop()'s body into a constexpr lambda which acceptsincomingQueueNotifierandchannelas parameters. (Alternatively, it could accept aWebSocketIoHandler& self parameter, andself.qualify the member accesses.) - Or, move
dispatchLoop()'s body into a non-constexpr lambda which capturesthis, and wrap the lambda inkj::coCapture(). - Or, convert
dispatchLoop()to a recursive.then()function, which I think would also require a separatedispatchLoopImpl(), since we only want to callexecuteAsync()a single time. - Or, the PR as written.
There was a problem hiding this comment.
Not sure what you mean, seeing as how this (which is just a ptr) should outlive the promise returned by executeAsync there should be no difference between the current implementation and
// Must be called on the Isolate thread.
kj::Promise<void> dispatchLoop() {
return isolateThreadExecutor->executeAsync([this]() {
for (;;) {
co_await incomingQueueNotifier->awaitNotification();
KJ_IF_SOME(c, channel) {
co_await c.dispatchProtocolMessages(this->incomingQueue);
}
}
});
}The whole coro lambda fiasco is about local stack owned variables captured in lambdas, this isn't such a variable.
There was a problem hiding this comment.
Ah, you are right that that alternative actually is safe, because executeAsync() saves the lambda for the duration of the promise it returns, which is a guarantee I forgot it provides: https://github.com/capnproto/capnproto/blob/6cb55ac7d523c24047e6bbef0679027561f3c0d4/c%2B%2B/src/kj/async.h#L1040-L1041
To make sure we're on the same page, though: my understanding is that this being a pointer does not make this inherently safe. That is, my understanding is that it's the lifetime of the captured pointers/references (i.e., the lifetime of the lambda), not the lifetime of the pointed-to objects (e.g. *this), that matters.
My fear is that someone might see a this-capturing lambda coroutine and assume it is a generally safe pattern to use, and pass such a thing to, say, Promise<T>::then(), which does not preserve the lambda's lifetime. Because of this, I'd much prefer a blanket ban against non-constexpr, non-coCapture()ed coroutine lambdas, even if some usages like this one can be assumed to be safe.
|
👏 |
|
Added |
When breakpoint debugging was introduced, it added a new thread, the InspectorService thread. This thread is necessary in order to continue performing WebSocket I/O while the Isolate is paused at a breakpoint. The feature _also_ moved inspector message dispatching onto this same InspectorService thread. For most inspector messages, this turns out to be benign, but not for CPU profiling start/stop messages: those must be dispatched on the thread which we desire to profile. The end result was that breakpoint debugging caused us to profile JavaScript on the InspectorService thread (e.g., anything executed on the Chrome Devtools console) instead of the Isolate thread, as originally intended. This turns out to be relatively easy to fix, because breakpoint debugging would already dispatch inspector messages on the Isolate thread in a specific case: when the Isolate was paused at a breakpoint, meaning there already exists some machinery to pass inspector messages across threads. This commit simply makes it so that all messages are passed to the Isolate thread for dispatching, instead of only when paused at a breakpoint. We (obviously) do not have any tests for CPU profiling, or else we would have caught this issue much earlier. This commit does not add any such tests. Since we'd like to fix this issue sooner than later, my intent is to land this commit first and add tests later -- we will not close related tickets until such tests are implemented. I have smoke tested both profiling and breakpoint debugging; both work with the change in this commit.
4fdcaae to
19c8ed5
Compare
Obviously in retrospect, we already do. I modified the implementation which our internal codebase uses so that no internal change is required anymore (this change is effectively a no-op from its perspective), smoke tested profiling and breakpoint debugging once more on both codebases, and removed the Once @danlapid / @mar-cf give me a thumbs up on the latest diff, let's merge. |
LGTM |
|
If we ship this and cut a workerd release, then can get the ball rolling on testing out in Wrangler/Miniflare, both with the current fork of Chrome Devtools and an updated one. |
|
Is this in the latest release of workerd? I tried CPU profiling with whichever workerd is distributed with |
|
Ah, thanks @irvinebroque – I wonder if we should add the workerd version to |
This commit adds a regression test for the CPU profiling bug reported in #1754 and fixed in #2497. It is based on @mrbbot's original reproduction. I backported the test to the commit just prior to #2497, and confirmed that the test caught the original breakage. I wrote the test in JavaScript, because it seemed to have the richest ecosystem of tools for working with the Chrome Devtool Protocol. I originally intended to use Playwright to initiate a CDP connection to workerd, but it seemed to make too many assumptions that it was connecting to a browser. I tried the [chrome-remote-interface](https://github.com/cyrus-and/chrome-remote-interface) library next, and it seemed to work well. Fixes #1754.
This commit adds a regression test for the CPU profiling bug reported in #1754 and fixed in #2497. It is based on @mrbbot's original reproduction. I backported the test to the commit just prior to #2497, and confirmed that the test caught the original breakage. I wrote the test in JavaScript, because it seemed to have the richest ecosystem of tools for working with the Chrome Devtool Protocol. I originally intended to use Playwright to initiate a CDP connection to workerd, but it seemed to make too many assumptions that it was connecting to a browser. I tried the [chrome-remote-interface](https://github.com/cyrus-and/chrome-remote-interface) library next, and it seemed to work well. Fixes #1754.
This commit adds a regression test for the CPU profiling bug reported in #1754 and fixed in #2497. It is based on @mrbbot's original reproduction. I backported the test to the commit just prior to #2497, and confirmed that the test caught the original breakage. I wrote the test in JavaScript, because it seemed to have the richest ecosystem of tools for working with the Chrome Devtool Protocol. I originally intended to use Playwright to initiate a CDP connection to workerd, but it seemed to make too many assumptions that it was connecting to a browser. I tried the [chrome-remote-interface](https://github.com/cyrus-and/chrome-remote-interface) library next, and it seemed to work well. Fixes #1754.
This commit adds a regression test for the CPU profiling bug reported in #1754 and fixed in #2497. It is based on @mrbbot's original reproduction. I backported the test to the commit just prior to #2497, and confirmed that the test caught the original breakage. I wrote the test in JavaScript, because it seemed to have the richest ecosystem of tools for working with the Chrome Devtool Protocol. I originally intended to use Playwright to initiate a CDP connection to workerd, but it seemed to make too many assumptions that it was connecting to a browser. I tried the [chrome-remote-interface](https://github.com/cyrus-and/chrome-remote-interface) library next, and it seemed to work well. Fixes #1754.

When breakpoint debugging was introduced, it added a new thread, the InspectorService thread. This thread is necessary in order to continue performing WebSocket I/O while the Isolate is paused at a breakpoint. The feature also moved inspector message dispatching onto this same InspectorService thread. For most inspector messages, this turns out to be benign, but not for CPU profiling start/stop messages: those must be dispatched on the thread which we desire to profile. The end result was that breakpoint debugging caused us to profile JavaScript on the InspectorService thread (e.g., anything executed on the Chrome Devtools console) instead of the Isolate thread, as originally intended.
This turns out to be relatively easy to fix, because breakpoint debugging would already dispatch inspector messages on the Isolate thread in a specific case: when the Isolate was paused at a breakpoint, meaning there are already exists some machinery to pass inspector messages across threads. This commit simply makes it so that all messages are passed to the Isolate thread for dispatching, instead of only when paused at a breakpoint.
We (obviously) do not have any tests for CPU profiling, or else we would have caught this issue much earlier. This commit does not add any such tests. Since we'd like to fix this issue sooner than later, my intent is to land this commit first and add tests later -- we will not close related tickets until such tests are implemented.
I have smoke tested both profiling and breakpoint debugging; both work with the change in this commit.