Fix synchronization between main thread and worker#14541
Fix synchronization between main thread and worker#14541nicolo-ribaudo merged 10 commits intobabel:mainfrom
Conversation
| ); | ||
|
|
||
| #signal = new Int32Array(new SharedArrayBuffer(4)); | ||
| #signal = new Int32Array(new SharedArrayBuffer(8)); |
There was a problem hiding this comment.
Why do we need doubling the byte length?
There was a problem hiding this comment.
if (resp) break;
Atomics.wait(this.#signal, 1, 0, 30);For convenience, I increased the byte length for sleep.
|
I will re-run CI test for a few more times. Hopefully the last flaky test is fixed. |
|
Ha ha! |
nicolo-ribaudo
left a comment
There was a problem hiding this comment.
- Is it needed to run the loop 100 times? Isn't 2 enough?
- When the worker main thread is woken up but we receive undefined, does the worker wake up us again?
- Has this bug ever been reproduced outside of our CI? If not, we could consider making the fix a test-only fix rather than adding it to the published code.
I do not know.
I don't quite understand what you mean here.
There are currently no reports on this issue. |
|
@liuxingbaoyu I pushed a commit with some logs to understand better what's causing the bug. |
ce563b8 to
7584140
Compare
|
Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/51995/ |
f64aa11 to
d696a81
Compare
I may now understand a little what this means.
I've also tried The above explanation may not be completely accurate because nodejs is too complicated. |
|
^ @addaleax (if you have time to answer) Is it expected that using // main.js
const signal = new Int32Array();
const subChannel = new MessageChannel();
// Tell the worker to start doing its job
worker.postMessage({ signal, port: subChannel.port1 }, [subChannel.port1]);
// Wait until the worker is done
Atomics.wait(signal, 0, 0);
// Read the worker response
const response = receiveMessageOnPort(subChannel.port2);// worker.js
parentPort.addListener("message", async ({ signal, port }) => {
let response = await computeResponse();
// Send the response to the main thread
port.postMessage(response);
port.close();
// Wake up the main thread
Atomics.store(signal, 0, 1);
Atomics.notify(signal, 0);
});And sometime |
|
@nicolo-ribaudo Are you sure this code is doing what you want it to do? I see that there are calls to multiple babel/eslint/babel-eslint-parser/src/parse.cjs Lines 39 to 40 in 885e1e0 These would result in multiple #signal[0] is still 0), which means that postMessage() may not have run yet when it returns.
Am I missing something? |
|
Thanks for your reply! always resets the signal to0 before calling postMessage. The main thread is fully synchronous, so there shouldn't be any race condition between these client.* calls.
The confusing thing is that the code works ~90% of the times. |
d696a81 to
89bb6ab
Compare
|
Ah yeah, that one I was indeed missing. 🙂 Do you have a standalone reproduction? Can you compare the output of a failing/a successful run when running node as (I think technically you should be using |
|
I cannot even reproduce the error locally 😂 I'll continue tweaking this PR, feel free to unsubscribe from notifications and I'll ping you if I have more info. |
|
@addaleax @nicolo-ribaudo |
89bb6ab to
d1d1281
Compare
|
The errors are just random at this point, now they are about some unrelated functions not being callable. |
|
Using a new SAB every times seems to have fixed it? Now the logs shows that it always works at the first |
|
Maybe we should enable all tests and try again? |
|
The exception for |
|
Yup, that's what I was seeing when I commented #14541 (comment) 😬 It looks like v8's internal memory is getting corrupted somehow, and variable values are randomly changed. Let's pretend we didn't see that failure. |
|
The new error is on the |
|
There is nothing worse than this.😰😰😰 |
|
I can reproduce the new error locally 🥳 |
|
|
6a1cd01 to
5794b3b
Compare
|
This now fixes the I'm going to merge this PR for now. |
|
I may have found the cause of that error. The reason is that E.g |
This seems to be a bug in nodejs.
But it's hard to reproduce and we want to be compatible with older versions of nodejs.
So fixed in babel.
This PR also fixes unstable CI.