ENH: spatial: ensure thread-safety for KDTree#20655
ENH: spatial: ensure thread-safety for KDTree#20655andfoy wants to merge 9 commits intoscipy:mainfrom
Conversation
|
I am a bit uncertain if property |
|
In context of No-GIL CPython, there are several things that are not thread-safe. For example:
|
|
Thanks for the pointers @sturlamolden. I have the following questions and comments:
|
|
The issue with |
|
The more serious issue is that |
|
Now I understand everything, so, basically we should lock on |
|
Query functions must wait for the atomic flag set by |
|
Thanks for the helpful discussion @sturlamolden! |
|
No problem. When I wrote the code I depended on the GIL for serialization. For example it ensures that |
|
Actually it is already broken as |
|
Great! Then this PR will fix the concurrency issues in KDTree, let me add the changes and update the title |
|
Note that my comments refer to |
Yes, in free-threaded python when the GIL is disabled then something needs to explicitly lock access to prevent data races. If the data race happens in python-level code, then a |
|
We already make use of the |
|
Right, C-level code will need to use C-level locking primitives. |
|
Use the primitives in |
|
Also you cannot use a |
254971c to
14a0751
Compare
rgommers
left a comment
There was a problem hiding this comment.
The locking of cKDTree.tree looks good to go. I don't think the added test currently tests the right thing though - it calls .query_ball_point, but that doesn't call .tree right? It seems to me that a second test, or a second for-loop in the same test, should try to directly access .tree from inside threads.
It would also be good to add a summary of how the added test fails under free-threaded CPython if the cKDTree code does not get a lock (what's the failure mode, is the test failing 100% of the time or intermittently).
14a0751 to
41f8698
Compare
|
@andfoy is this ready again from your perspective? |
|
I think we need a lock in all the query methods as well. The pickle methods also need serialization. Currently it only puts a lock on the initialization, which is far from sufficient. |
|
@rgommers I need to take @sturlamolden considerations into account, also the added test needs some fixing/debug |
e356425 to
901f468
Compare
901f468 to
bf189a5
Compare
bf189a5 to
c9a80e0
Compare
|
I undertook a detailed look throughout all the query methods in C++ and I couldn't find any instruction that could mutate the tree state, therefore those methods should not require any locking. |
The query methods do not mutate the tree state themselves, but oher methods may while the queries are running. This can cause segfaults or errorneous results. The queries muct therefore lock the kd-tree for self-preservation. Your comment indicate a lack of understanding of concurrent systems. If you do not understand something this simple, please do not contribute code that uses or controls concurrency. And I am sorry for using direct language, I am Scandinavian. |
|
@sturlamolden, sorry, but you shouldn't assume the level of knowledge of contributors, neither impose conditions about who should contribute to which topics and not. For your information, I dedicated part of my studies to study concurrency, parallelism and distribution. I'm sorry, but being Scandinavian doesn't exclude you from respecting people. I'm going to apply the changes, given that you know the codebase better. Just for clarity, the comment was made assuming that users do not interleave mutating calls from queries to the tree, which delegates concurrency handling to them as opposed to the library trying to anticipate use-cases, given that locking causes performance drops under parallelism. |
|
@sturlamolden please don't tell others not to contribute, your comment seems a little insulting for no reason. You're effectively one of the few maintainers of
It would be helpful to identify those methods. The docs may be incomplete; there are only 6 documented methods and they all look safe. If there are public methods (or direct attribute access) to mutate the data structure, then the optimal approach is probably somewhere in the middle: some sort of locking is needed, but it is possible to run several query methods in parallel when those don't mutate themselves. This can be done with a "multiple readers single writer" lock for example, e.g. |
That is in effect what I suggested for the query functions.
will wait for the writer (the setup functions) to finish, while not blocking out readers (the query functions). This must be the first thing the query functions do. |
|
Hello, any update on this? |
|
I opened #24799 which supersedes this PR. See my PR description for why I didn't start with the code here. |
Reference issue
See gh-20669
What does this implement/fix?
This PR adds a quick check to ensure that concurrent calls to
KDTreedo not cause any unexpected errors. This is specially handy in the context of No-GIL CPython.Additional information
In principle, concurrent access to
KDTreeis allowed out-of-the-box, since there are no methods that can mutate the state of the tree besides the constructor.