Description
Currently, AsyncWorker uses the global pthread pool for scheduling its own tasks, taking threads as they become available. This works fine if AsyncWorker is the exclusive user of multithreading, but not when the addon is linked with other libraries that also rely on threads.
Here's the general problematic scenario:
- addon `A` has `async_method` that spawns tasks onto an `AsyncWorker`
- it links with library `B` and invokes `B.some_method()`, which uses up to `N` threads itself
- logically, the user builds the addon with `PTHREAD_POOL_SIZE=N+1` (`N` threads for the library + 1 for the async op in the addon itself)
Now, if the user invokes `A.async_method()` once, addon A takes one thread from the pool for the worker, `N` threads remain, the library takes those `N` threads and finishes its work, and everything's fine.
But now suppose another user invokes `async_method()` on various inputs in parallel, e.g. via `Promise.all`:

```js
Promise.all([input1, input2].map(input => A.async_method(input)))
```

Now addon A takes 2 threads (one per `async_method()` invocation) to create the `AsyncWorker`s. Each of those workers invokes `B.some_method()`, and each of those parallel `B.some_method()` invocations tries to allocate `N` threads, `2*N` in total, but the pool only has `(N+1)-2 == N-1` threads left at this point! So each of them deadlocks.
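The pool-exhaustion arithmetic above can be reproduced without any real threads. Here's a small sketch that models the pthread pool as an async semaphore in plain JavaScript; `Semaphore`, `demo`, and the `N = 3` value are illustrative, not emnapi or Emscripten APIs:

```js
// Minimal counting semaphore standing in for the Emscripten pthread pool.
class Semaphore {
  constructor(permits) { this.permits = permits; this.waiters = []; }
  acquire() {
    if (this.permits > 0) { this.permits--; return Promise.resolve(); }
    return new Promise(resolve => this.waiters.push(resolve));
  }
  release() {
    const next = this.waiters.shift();
    if (next) next(); else this.permits++;
  }
}

// Model of A.async_method(): 1 "thread" for the AsyncWorker itself,
// then B.some_method() grabs N more.
function makeAsyncMethod(pool, n) {
  return async function asyncMethod() {
    await pool.acquire();
    await Promise.all(Array.from({ length: n }, () => pool.acquire()));
    for (let i = 0; i <= n; i++) pool.release();
    return 'done';
  };
}

// Run `calls` invocations in parallel against a fresh pool of N + 1 permits;
// report 'deadlock' if they don't all finish within 200 ms.
async function demo(calls) {
  const N = 3;
  const pool = new Semaphore(N + 1); // PTHREAD_POOL_SIZE = N + 1
  const asyncMethod = makeAsyncMethod(pool, N);
  const timeout = new Promise(resolve => setTimeout(() => resolve('deadlock'), 200));
  return Promise.race([
    Promise.all(Array.from({ length: calls }, () => asyncMethod())).then(() => 'done'),
    timeout,
  ]);
}

demo(1).then(r => console.log('1 call: ', r)); // 'done'
demo(2).then(r => console.log('2 calls:', r)); // 'deadlock': each call holds 1
// permit and waits for N more, but only N - 1 remain, so neither ever finishes.
```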
The user could increase `PTHREAD_POOL_SIZE` to `2*(N+1)`, but then the same failure occurs as soon as they happen to run 3 invocations of the library in parallel, and so on.
I actually observed this deadlock behaviour in a real-world addon I'm cross-compiling to Wasm with emnapi right now, and it took some time to figure out what was happening.
In my case I fixed this by monkey-patching the async methods to queue them one by one when compiled to Wasm, but, ideally, this should be handled at the emnapi level.
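For reference, the monkey-patching workaround can be sketched like this; `serialize` and the wrapped method names are illustrative, not part of emnapi:

```js
// Chain every call onto the previous one, so async ops run strictly
// one at a time regardless of how "in parallel" callers invoke them.
let tail = Promise.resolve();

function serialize(fn) {
  return (...args) => {
    const result = tail.then(() => fn(...args));
    tail = result.catch(() => {}); // keep the chain alive after rejections
    return result;
  };
}

// Hypothetical usage: A.async_method = serialize(A.async_method.bind(A));
// Demo: three "parallel" calls never actually overlap.
let active = 0, maxActive = 0;
const task = serialize(async (id) => {
  active++; maxActive = Math.max(maxActive, active);
  await new Promise(resolve => setTimeout(resolve, 10));
  active--;
  return id;
});

Promise.all([task(1), task(2), task(3)]).then(results => {
  console.log(results, 'max concurrency:', maxActive); // [ 1, 2, 3 ] max concurrency: 1
});
```

This trades all parallelism for safety, which is why it's only a stopgap.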
I think the only reliable solution here is for emnapi to have a separate, exclusive thread pool, instead of taking as many threads as it wants from the global Emscripten pool. It could then use the same JS logic, or even a C++ work-stealing implementation, to execute at most `EMNAPI_WORKER_POOL_SIZE` tasks in parallel and no more. This would allow users to configure `EMNAPI_WORKER_POOL_SIZE` to the desired number of concurrent tasks and `PTHREAD_POOL_SIZE` to `EMNAPI_WORKER_POOL_SIZE * (N + 1)` in advance, avoiding the risk of deadlocks altogether.
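The proposed bounded pool could look roughly like the following sketch; `WorkerPool` is a hypothetical shape, and `EMNAPI_WORKER_POOL_SIZE` is the suggested option, not an existing one:

```js
// A FIFO queue that runs at most `size` submitted tasks concurrently;
// excess submissions wait instead of grabbing extra pool threads.
class WorkerPool {
  constructor(size) { this.size = size; this.running = 0; this.queue = []; }
  submit(task) {
    return new Promise((resolve, reject) => {
      this.queue.push({ task, resolve, reject });
      this.drain();
    });
  }
  drain() {
    while (this.running < this.size && this.queue.length > 0) {
      const { task, resolve, reject } = this.queue.shift();
      this.running++;
      Promise.resolve()
        .then(task)
        .then(resolve, reject)
        .finally(() => { this.running--; this.drain(); });
    }
  }
}

// Demo: 5 tasks submitted at once, but never more than 2 in flight.
const pool = new WorkerPool(2); // EMNAPI_WORKER_POOL_SIZE = 2
let inFlight = 0, peak = 0;
const jobs = [1, 2, 3, 4, 5].map(id => pool.submit(async () => {
  inFlight++; peak = Math.max(peak, inFlight);
  await new Promise(resolve => setTimeout(resolve, 10));
  inFlight--;
  return id;
}));

Promise.all(jobs).then(results => {
  console.log(results, 'peak concurrency:', peak); // [ 1, 2, 3, 4, 5 ] peak concurrency: 2
});
```

With such a cap, each of the at most `EMNAPI_WORKER_POOL_SIZE` workers can safely claim its `N + 1` pthreads from a pool sized accordingly.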
And then, users who know they're not linking against any multithreaded libraries will be able to set both `PTHREAD_POOL_SIZE` and `EMNAPI_WORKER_POOL_SIZE` to the number of CPU cores and get maximal parallelisation, as they can today.