BUG: Prevent deadlock due to downstream importing NumPy in dlopen#31303
Conversation
Downstream should probably never import numpy (i.e. activate the C-API) during dlopen (rather than the Python module init function). But if it does so, this check introduced a deadlock regression because dlopen has a recursive lock, but this breaks if the matmul is called and causes threads to spawn that also need to dlopen. So, as a hot-fix, just only run the check on MacOS, since we are not currently aware of reports on other system (possible or not), although this did happen with certain OpenBLAS versions (and not just accelerate). (I cannot fully rule out that there won't be regressions on some linux setups, but they were not yet reported.) Closes numpygh-31284
|
How do we know people won’t dlopen on macos? |
|
I don't (unless Mac doesn't have such a mutex, but I'll doubt that). OTOH, we never had issues on Mac, where similar code always ran... So basically, we can be confident that this cannot be a regression w.r.t. the deadlock (and at that point, I may lean towards considering this a downstream issue, as I think the init is technically at the wrong spot). @rgommers might have a thought... |
rgommers
left a comment
There was a problem hiding this comment.
LGTM, thanks @seberg. I agree this is safe. The only other consideration is whether this can/should be further reduced to macOS arm64, because there's no problem on x86-64. It's conceivable that some users are still running MKL 2023 on macOS x86-64, although the chance of that intersection with a package like csp seems remote. Optional before merge I think, I'm mappy either way.
How do we know people won’t dlopen on macos?
Given that we only have to worry about BLAS libraries here, I think we can rely on our knowledge that the SME problem is Accelerate-specific (🤞🏼 it stays that way) and dlopen usage is MKL-specific (that's very unlikely to change).
We already have another MacOS bug check (although I think we could remove it). So I don't think there is any chance of regression. I guess we have to try this and put this in for now. It seems unlikely a non Mac Arm systems run into this right now. And if it happens to start, we might have to say that the |
|
(Ignoring those test failures, but it seems some compiler warnings might have changed? We may have to fix that up soon.) |
…1303) Downstream should probably never import numpy (i.e. activate the C-API) during dlopen (rather than the Python module init function). But if it does so, this check introduced a deadlock regression because dlopen has a recursive lock, but this breaks if the matmul is called and causes threads to spawn that also need to dlopen. So, as a hot-fix, just only run the check on MacOS, since we are not currently aware of reports on other system (possible or not), although this did happen with certain OpenBLAS versions (and not just accelerate). (I cannot fully rule out that there won't be regressions on some linux setups, but they were not yet reported. -- The "limit to 1 thread" solution is an alternative as well.)
BUG: Prevent deadlock due to downstream importing NumPy in dlopen (#31303)
…mpy#31303) Downstream should probably never import numpy (i.e. activate the C-API) during dlopen (rather than the Python module init function). But if it does so, this check introduced a deadlock regression because dlopen has a recursive lock, but this breaks if the matmul is called and causes threads to spawn that also need to dlopen. So, as a hot-fix, just only run the check on MacOS, since we are not currently aware of reports on other system (possible or not), although this did happen with certain OpenBLAS versions (and not just accelerate). (I cannot fully rule out that there won't be regressions on some linux setups, but they were not yet reported. -- The "limit to 1 thread" solution is an alternative as well.)
Downstream should probably never import numpy (i.e. activate the C-API) during dlopen (rather than the Python module init function).
But if it does so, this check introduced a deadlock regression because dlopen has a recursive lock, but this breaks if the matmul is called and causes threads to spawn that also need to dlopen.
So, as a hot-fix, just only run the check on MacOS, since we are not currently aware of reports on other system (possible or not), although this did happen with certain OpenBLAS versions (and not just accelerate).
(I cannot fully rule out that there won't be regressions on some linux setups, but they were not yet reported. -- The "limit to 1 thread" solution is an alternative as well.)
EDIT: addresses gh-31284. Should be backported for 2.4.5