HLL: Avoid some allocations when possible.#3314
Conversation
|
👍 @navis do you mind reviewing this one as well? |
There was a problem hiding this comment.
This shouldn't be allocating a new byte buffer. duplicate() simply makes a new buffer view into the underlying data. Are you sure this one affects performance?
There was a problem hiding this comment.
Actually, I believe it. Even for "small" allocations, since this is in aggregate it is going to be called again and again and again.
There was a problem hiding this comment.
I didn't test Cardinality, but I did test HyperUnique, and the performance boost from avoiding the duplicate() of the left-hand side is substantial (a significant chunk of the overall ~19% boost).
There was a problem hiding this comment.
btw, by "allocating a new ByteBuffer" I really meant "allocating a new ByteBuffer object pointing to the existing data". The ByteBuffer object has substantial overhead (8 fields!) and apparently the JVM has trouble optimizing that even for short lived allocations.
fwiw- I also tried adjusting things such that the HyperLogLogCollector was re-used too, but saw zero performance boost from that. I thought that was cool, and figured that was because it only has 3 fields, which made it easier for the JVM to optimize.
There was a problem hiding this comment.
Can you modify the comment to make a clearer distinction between bytebuffer object allocation and the buffer allocation itself?
There was a problem hiding this comment.
Updated the comment to say "ByteBuffer object" instead of "ByteBuffer"
066ae07 to
75bbc89
Compare
|
@drcrallen I updated the |
|
Minor doc comment at https://github.com/druid-io/druid/pull/3314/files#r73394157 but 👍 after doc change and travis |
- HLLC.fold avoids duplicating the other buffer by saving and restoring its position. - HLLC.makeCollector(buffer) no longer duplicates incoming BBs. - Updated call sites where appropriate to duplicate BBs passed to HLLC.
|
@gianm Looks good. I think I should have dig in more on that. |
|
@navis possibly, to be honest I did not look that closely for more possible opportunities. I just made the most obvious modifications. |
|
I support small incremental improvement/changes. |
|
ok, I think I can do that after this. 👍 from me. |
Despite the non-thread-safety of HyperLogLogCollector, it is actually currently used by multiple threads during realtime indexing. HyperUniquesAggregator's "aggregate" and "get" methods can be called simultaneously by OnheapIncrementalIndex, since its "doAggregate" and "getMetricObjectValue" methods are not synchronized. This means that the optimization of HyperLogLogCollector.fold in apache#3314 (saving and restoring position rather than duplicating the storage buffer of the right-hand side) could cause corruption in the face of concurrent writes. This patch works around the issue by duplicating the storage buffer in "get" before returning a collector. The returned collector still shares data with the original one, but the situation is no worse than before apache#3314. In the future we may want to consider making a thread safe version of HLLC that avoids these kinds of problems in realtime indexing. But for now I thought it was best to do a small change that restored the old behavior.
Despite the non-thread-safety of HyperLogLogCollector, it is actually currently used by multiple threads during realtime indexing. HyperUniquesAggregator's "aggregate" and "get" methods can be called simultaneously by OnheapIncrementalIndex, since its "doAggregate" and "getMetricObjectValue" methods are not synchronized. This means that the optimization of HyperLogLogCollector.fold in #3314 (saving and restoring position rather than duplicating the storage buffer of the right-hand side) could cause corruption in the face of concurrent writes. This patch works around the issue by duplicating the storage buffer in "get" before returning a collector. The returned collector still shares data with the original one, but the situation is no worse than before #3314. In the future we may want to consider making a thread safe version of HLLC that avoids these kinds of problems in realtime indexing. But for now I thought it was best to do a small change that restored the old behavior.
Despite the non-thread-safety of HyperLogLogCollector, it is actually currently used by multiple threads during realtime indexing. HyperUniquesAggregator's "aggregate" and "get" methods can be called simultaneously by OnheapIncrementalIndex, since its "doAggregate" and "getMetricObjectValue" methods are not synchronized. This means that the optimization of HyperLogLogCollector.fold in apache#3314 (saving and restoring position rather than duplicating the storage buffer of the right-hand side) could cause corruption in the face of concurrent writes. This patch works around the issue by duplicating the storage buffer in "get" before returning a collector. The returned collector still shares data with the original one, but the situation is no worse than before apache#3314. In the future we may want to consider making a thread safe version of HLLC that avoids these kinds of problems in realtime indexing. But for now I thought it was best to do a small change that restored the old behavior.
Despite the non-thread-safety of HyperLogLogCollector, it is actually currently used by multiple threads during realtime indexing. HyperUniquesAggregator's "aggregate" and "get" methods can be called simultaneously by OnheapIncrementalIndex, since its "doAggregate" and "getMetricObjectValue" methods are not synchronized. This means that the optimization of HyperLogLogCollector.fold in #3314 (saving and restoring position rather than duplicating the storage buffer of the right-hand side) could cause corruption in the face of concurrent writes. This patch works around the issue by duplicating the storage buffer in "get" before returning a collector. The returned collector still shares data with the original one, but the situation is no worse than before #3314. In the future we may want to consider making a thread safe version of HLLC that avoids these kinds of problems in realtime indexing. But for now I thought it was best to do a small change that restored the old behavior.
Despite the non-thread-safety of HyperLogLogCollector, it is actually currently used by multiple threads during realtime indexing. HyperUniquesAggregator's "aggregate" and "get" methods can be called simultaneously by OnheapIncrementalIndex, since its "doAggregate" and "getMetricObjectValue" methods are not synchronized. This means that the optimization of HyperLogLogCollector.fold in apache#3314 (saving and restoring position rather than duplicating the storage buffer of the right-hand side) could cause corruption in the face of concurrent writes. This patch works around the issue by duplicating the storage buffer in "get" before returning a collector. The returned collector still shares data with the original one, but the situation is no worse than before apache#3314. In the future we may want to consider making a thread safe version of HLLC that avoids these kinds of problems in realtime indexing. But for now I thought it was best to do a small change that restored the old behavior.
Inspired by @navis's comment in #3111 (comment) (although I didn't go as far 😄 ).
Benchmarks, jdk1.8.0_60 (19–30% speedup depending on the test):