Describe the bug
Hi, I have found another data race issue in index_gt::update.
|
// The trickiest part of update-heavy workloads is mitigating dead-locks |
|
// in connected nodes during traversal. A "good enough" solution would be |
|
// to skip concurrent access, assuming the other "close" node is gonna add |
|
// this one when forming reverse connections. |
|
bool failed_to_acquire = false; |
|
node_conditional_lock_t candidate_lock = |
|
node_try_conditional_lock_(candidate_slot, updated_slot != candidate_slot, failed_to_acquire); |
|
if (failed_to_acquire) |
|
continue; |
index_gt::update uses conditional lock(node_try_conditional_lock_) instead of the hard lock(node_lock_) that index_gt::add use. When candidate_slot == updated_slot, the function deliberately skips atomic_set, but it still returns a node_conditional_lock_t whose destructor unconditionally calls atomic_reset.
|
struct node_conditional_lock_t { |
|
nodes_mutexes_t& mutexes; |
|
std::size_t slot; |
|
inline ~node_conditional_lock_t() noexcept { |
|
if (slot != std::numeric_limits<std::size_t>::max()) |
|
mutexes.atomic_reset(slot); |
|
} |
|
}; |
|
|
|
inline node_conditional_lock_t node_try_conditional_lock_(std::size_t slot, bool condition, |
|
bool& failed_to_acquire) const noexcept { |
|
failed_to_acquire = condition ? nodes_mutexes_.atomic_set(slot) : false; |
|
return {nodes_mutexes_, failed_to_acquire ? std::numeric_limits<std::size_t>::max() : slot}; |
|
} |
It clears the bit for a slot we never locked and can even unlock another thread’s live lock on the same node.
Eventually, two threads think they own the slot, they both mutate the same neighbors_ref_t buffer. it is data race.
For reference, index_gt::add never happen same problem because it always uses node_lock_, so only update suffers.
I’ve tried resolving this issue and submitted a bugfix PR.
I would appreciate it if you could take a look. :)
Steps to reproduce
- Add 10,000 nodes (
index_gt::add).
- Update the same 10,000 nodes in parallel, modifying only their vector values (
index_gt::update).
The issue was reproduced in my environment with 16 threads, M = 32, and efConstruction = 128.
- TSan sometimes detects a data race (non-deterministic, timing-dependent).
Expected behavior
index_gt::update is thread-safe.
USearch version
v2.21.2
Operating System
Red Hat Enterprise Linux release 8.6
Hardware architecture
x86
Which interface are you using?
C++ implementation
Contact Details
kys910524@gmail.com
Are you open to being tagged as a contributor?
Is there an existing issue for this?
Code of Conduct
Describe the bug
Hi, I have found another data race issue in index_gt::update.
USearch/include/usearch/index.hpp
Lines 4146 to 4154 in 13c3fd9
index_gt::updateuses conditional lock(node_try_conditional_lock_) instead of the hard lock(node_lock_) thatindex_gt::adduse. Whencandidate_slot == updated_slot, the function deliberately skips atomic_set, but it still returns a node_conditional_lock_t whose destructor unconditionally calls atomic_reset.USearch/include/usearch/index.hpp
Lines 3828 to 3841 in 13c3fd9
It clears the bit for a slot we never locked and can even unlock another thread’s live lock on the same node.
Eventually, two threads think they own the slot, they both mutate the same neighbors_ref_t buffer. it is data race.
For reference,
index_gt::addnever happen same problem because it always usesnode_lock_, so only update suffers.I’ve tried resolving this issue and submitted a bugfix PR.
I would appreciate it if you could take a look. :)
Steps to reproduce
index_gt::add).index_gt::update).The issue was reproduced in my environment with 16 threads, M = 32, and efConstruction = 128.
Expected behavior
index_gt::updateis thread-safe.USearch version
v2.21.2
Operating System
Red Hat Enterprise Linux release 8.6
Hardware architecture
x86
Which interface are you using?
C++ implementation
Contact Details
kys910524@gmail.com
Are you open to being tagged as a contributor?
.githistory as a contributorIs there an existing issue for this?
Code of Conduct