Thread safe reads for aggregators in IncrementalIndex#3956
Thread safe reads for aggregators in IncrementalIndex#3956nishantmonu51 wants to merge 2 commits intoapache:masterfrom
Conversation
Aggregators are *NOT* thread safe and if two threads concurrently try to read/write to the aggregator the reader may read absurd values since the aggregate method is not *atomic*. In case of IncrementalIndex the writes are protected by a sync block but the reads are unprotected, so its possible for the queries to read absurd values in aggregator.get(). This PR adds a test that can reproduce that behavior by wrapping Aggregators inside a ThreadSafetyAssertionAggregator. TODO: test any performance impacts.
| public void aggregate() | ||
| { | ||
| delegate1.aggregate(); | ||
| Thread.yield(); |
There was a problem hiding this comment.
If we replace yield with Thread.sleep(1), the issue is reproduced more frequently but this slows down the test considerably as it is done for every aggregate call.
|
See also #3578 |
|
Ah, the docs for HLL say its thread safe so i was wondering if that might cause issues there. |
|
Well, it's definitely sketchy to be calling aggregate and get concurrently. HyperLogLogCollector isn't thread safe. It's possible that you'll get bizarre values from time to time, like if the offset of an HLLC is in the process of being incremented in one thread while it's being read in another thread. So I think this PR has value. |
leventov
left a comment
There was a problem hiding this comment.
Synchronization should be done inside aggregators, because simple aggregators could use cheaper atomics instead of intrinsic locks. If in some use cases aggregators don't need synchronization at all, we can add methods like doAggregateConcurrent() and getConcurrent().
|
Also if there is just one writing thread, synchronization inside simple aggregators (long/double/float) is not needed at all. |
|
I was just looking at this issue again after the conversations on the mailing list about sketch synchronization: https://lists.apache.org/thread.html/9899aa790a7eb561ab66f47b35c8f66ffe695432719251351339521a@%3Cdev.druid.apache.org%3E I was wondering, does it make more sense for thread-safety here to be handled systematically (at the IncrementalIndex) or for each aggregator to be thread safe? Currently we do different approaches: the sketch aggregators endeavor to be thread-safe on their own. The primitive aggregators don't bother to even try, and they're probably fine, since they're primitives. HyperLogLogAggregator tries a little bit -- it at least makes sure the different calls use different buffer objects -- but I bet it has a bug where "get" could potentially read something weird and corrupt in some rare situations. (Like if the offset is being updated while a "get" is going on.) |
|
Dealing with this issue systematically means taking the most conservative approach - synchronization, while some aggregators could definitely do better (lock-free) |
I agree. This is what we hope to do now in sketches-core, add concurrent (thread-safe) sketches that use lightweight synchronization. The first step is adding concurrent theta sketch, which can be followed by a concurrent union implementation. Later additional concurrent sketches can be added to the library (we already have an implementation of a concurrent quantile sketch). |
|
In general, I agree with @leventov here because different aggregators can handle concurrency with varying degree of efficiency. That said, we need synchronization only for realtime indexing code path and historical nodes pay the penalty of thread safety unnecessarily. If we could do something systematic to change the two code paths in some way that allows historicals not paying for thread safety, that would be good. |
|
The The idea would be to avoid using synchronization for cases where it's not necessary, like usage of aggregators by query engines. |
|
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@druid.apache.org list. Thank you for your contributions. |
|
This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
|
This pull request/issue is no longer marked as stale. |
|
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions. |
|
This pull request/issue has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
Aggregators are NOT thread safe and if two threads concurrently try
to read/write to the aggregator the reader may read absurd values since
the aggregate method is not atomic.
In case of IncrementalIndex the writes are protected by a sync block
but the reads are unprotected, so its possible for the queries to read
absurd values in aggregator.get().
This PR adds a test that can reproduce that behavior by wrapping
Aggregators inside a ThreadSafetyAssertionAggregator.
TODO: test any performance impacts.