HLL: Avoid some allocations when possible. by gianm · Pull Request #3314 · apache/druid

gianm · 2016-08-03T01:15:40Z

HLLC.fold avoids duplicating the other buffer by saving and restoring its position.
HLLC.makeCollector(buffer) no longer duplicates incoming BBs.
Updated call sites where appropriate to duplicate BBs passed to HLLC.
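
For readers unfamiliar with the pattern, here is a minimal sketch of the difference between reading through a `duplicate()` and reading with a saved and restored position. It is not the actual `HyperLogLogCollector.fold` code: the class and method names are hypothetical, and summing bytes stands in for the real merge logic.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; not the Druid HyperLogLogCollector code.
final class PositionRestoreSketch {
    // Reads the remaining bytes through a duplicate. This allocates a new
    // ByteBuffer object (a view) and never touches the caller's position.
    public static int sumWithDuplicate(ByteBuffer buf) {
        ByteBuffer view = buf.duplicate();
        int sum = 0;
        while (view.hasRemaining()) {
            sum += view.get();
        }
        return sum;
    }

    // Same read without the extra object: advance the caller's buffer,
    // then put its position back where it was.
    public static int sumWithRestore(ByteBuffer buf) {
        final int savedPosition = buf.position();
        try {
            int sum = 0;
            while (buf.hasRemaining()) {
                sum += buf.get();
            }
            return sum;
        } finally {
            buf.position(savedPosition);
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(new byte[]{1, 2, 3, 4});
        buf.position(1);
        System.out.println(sumWithDuplicate(buf) + " at position " + buf.position());
        System.out.println(sumWithRestore(buf) + " at position " + buf.position());
    }
}
```

Both methods leave the caller's position unchanged on return, but the second one temporarily mutates the shared buffer, which only works if nobody else reads it at the same time.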

Inspired by @navis's comment in #3111 (comment) (although I didn't go as far 😄 ).

Benchmarks, jdk1.8.0_60 (19–30% speedup depending on the test):

topN - master + sorting on hyperUniquesMet

Benchmark                                  (numSegments)  (rowsPerSegment)        (schemaAndQuery)  (threshold)  Mode  Cnt        Score       Error  Units
TopNBenchmark.queryMultiQueryableIndex                 1            750000                 basic.A           10  avgt   25   193478.532 ±  5680.045  us/op
TopNBenchmark.querySingleIncrementalIndex              1            750000                 basic.A           10  avgt   25  1571360.141 ± 37579.656  us/op
TopNBenchmark.querySingleQueryableIndex                1            750000                 basic.A           10  avgt   25   192961.326 ±  5667.723  us/op

topN - patch + sorting on hyperUniquesMet 

Benchmark                                  (numSegments)  (rowsPerSegment)  (schemaAndQuery)  (threshold)  Mode  Cnt        Score       Error  Units
TopNBenchmark.queryMultiQueryableIndex                 1            750000           basic.A           10  avgt   25   158187.083 ±  4630.679  us/op
TopNBenchmark.querySingleIncrementalIndex              1            750000           basic.A           10  avgt   25  1082888.189 ± 33717.391  us/op
TopNBenchmark.querySingleQueryableIndex                1            750000           basic.A           10  avgt   25   155275.814 ±  4052.757  us/op
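
As a sanity check on the quoted range, the relative reduction in time per operation can be computed from the two tables. The helper below is a hypothetical sketch (the class and method names are not part of Druid or JMH); it should give roughly 18% to 31% across the three benchmarks, in line with the range quoted above.

```java
// Hypothetical helper for comparing the "master" and "patch" scores above.
final class SpeedupSketch {
    // Percentage reduction in time per operation going from before to after.
    public static double percentReduction(double before, double after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        // Scores in us/op, copied from the two benchmark tables.
        System.out.printf("queryMultiQueryableIndex:    %.1f%%%n",
            percentReduction(193478.532, 158187.083));
        System.out.printf("querySingleIncrementalIndex: %.1f%%%n",
            percentReduction(1571360.141, 1082888.189));
        System.out.printf("querySingleQueryableIndex:   %.1f%%%n",
            percentReduction(192961.326, 155275.814));
    }
}
```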

fjy · 2016-08-03T01:29:36Z

👍

@navis do you mind reviewing this one as well?

drcrallen · 2016-08-03T14:48:28Z

...essing/src/main/java/io/druid/query/aggregation/cardinality/CardinalityBufferAggregator.java

This shouldn't be allocating a new byte buffer. duplicate() simply makes a new buffer view into the underlying data. Are you sure this one affects performance?

Actually, I believe it. Even for "small" allocations, since this is in aggregate it is going to be called again and again and again.

I didn't test Cardinality, but I did test HyperUnique, and the performance boost from avoiding the duplicate() of the left-hand side is substantial (a significant chunk of the overall ~19% boost).

btw, by "allocating a new ByteBuffer" I really meant "allocating a new ByteBuffer object pointing to the existing data". The ByteBuffer object has substantial overhead (8 fields!) and apparently the JVM has trouble optimizing that even for short lived allocations.
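
To make the distinction concrete, a small sketch using only `java.nio.ByteBuffer` (the class and method names are hypothetical): `duplicate()` returns a distinct `ByteBuffer` object with its own position and limit, while the bytes themselves are shared with the original.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of what duplicate() does and does not copy.
final class DuplicateViewSketch {
    // A write through the duplicate is visible through the original,
    // even though the two ByteBuffer objects are distinct.
    public static boolean sharesContent() {
        ByteBuffer original = ByteBuffer.allocate(8);
        ByteBuffer view = original.duplicate();
        view.put(0, (byte) 42);
        return view != original && original.get(0) == 42;
    }

    // Moving the duplicate's position does not move the original's.
    public static boolean hasIndependentPosition() {
        ByteBuffer original = ByteBuffer.allocate(8);
        ByteBuffer view = original.duplicate();
        view.position(5);
        return original.position() == 0 && view.position() == 5;
    }

    public static void main(String[] args) {
        System.out.println("shares content: " + sharesContent());
        System.out.println("independent position: " + hasIndependentPosition());
    }
}
```

So the cost under discussion is the per-call wrapper object, not a copy of the underlying data.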

fwiw, I also tried adjusting things so that the HyperLogLogCollector was reused too, but saw zero performance boost from that. I thought that was cool, and figured it was because it only has 3 fields, which made it easier for the JVM to optimize.

Can you modify the comment to make a clearer distinction between allocating the ByteBuffer object and allocating the underlying buffer itself?

yes, definitely.

Updated the comment to say "ByteBuffer object" instead of "ByteBuffer"

gianm · 2016-08-03T18:30:16Z

@drcrallen I updated the makeCollector contract to say it's allowed to modify the position/limit of the incoming buffer, and updated call sites accordingly
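
A rough sketch of what that contract means for callers, under stated assumptions: `WrappingCollectorSketch`, `wrap`, and `readVersion` are hypothetical stand-ins, not the real `HyperLogLogCollector.makeCollector` API. The factory keeps the buffer it is given without a defensive `duplicate()`, so a caller that still needs its own position and limit passes a duplicate instead.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for a collector whose factory does not duplicate
// the incoming buffer and is allowed to move its position/limit.
final class WrappingCollectorSketch {
    private final ByteBuffer storage;

    private WrappingCollectorSketch(ByteBuffer storage) {
        this.storage = storage;
    }

    // Contract: the collector uses the buffer as-is; no defensive copy.
    public static WrappingCollectorSketch wrap(ByteBuffer buffer) {
        return new WrappingCollectorSketch(buffer);
    }

    // Relative read: advances the position of whatever buffer was passed in.
    public byte readVersion() {
        return storage.get();
    }

    public static void main(String[] args) {
        ByteBuffer shared = ByteBuffer.wrap(new byte[]{1, 7, 7});

        // Caller wants to keep its own position: hand over a duplicate.
        wrap(shared.duplicate()).readVersion();
        System.out.println("after duplicate: position = " + shared.position());

        // Caller does not care: hand over the buffer itself.
        wrap(shared).readVersion();
        System.out.println("after direct pass: position = " + shared.position());
    }
}
```

This moves the allocation decision to the call site, so hot paths that own their buffer can skip it.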

drcrallen · 2016-08-03T18:36:25Z

processing/src/main/java/io/druid/query/aggregation/hyperloglog/HyperLogLogCollector.java

drcrallen · 2016-08-03T18:36:55Z

Minor doc comment at https://github.com/druid-io/druid/pull/3314/files#r73394157 but 👍 after doc change and travis

navis · 2016-08-04T00:42:40Z

@gianm Looks good. I think I should have dug into that more.
Just one thing: isn't it possible to reduce the overhead of the "swap" part of the fold() operation in HyperLogLogCollector?

gianm · 2016-08-04T00:44:59Z

@navis possibly, to be honest I did not look that closely for more possible opportunities. I just made the most obvious modifications.

drcrallen · 2016-08-04T00:53:18Z

I support small incremental improvement/changes.

navis · 2016-08-04T01:05:09Z

ok, I think I can do that after this. 👍 from me.

Despite the non-thread-safety of HyperLogLogCollector, it is actually currently used by multiple threads during realtime indexing. HyperUniquesAggregator's "aggregate" and "get" methods can be called simultaneously by OnheapIncrementalIndex, since its "doAggregate" and "getMetricObjectValue" methods are not synchronized. This means that the optimization of HyperLogLogCollector.fold in apache#3314 (saving and restoring position rather than duplicating the storage buffer of the right-hand side) could cause corruption in the face of concurrent writes. This patch works around the issue by duplicating the storage buffer in "get" before returning a collector. The returned collector still shares data with the original one, but the situation is no worse than before apache#3314. In the future we may want to consider making a thread safe version of HLLC that avoids these kinds of problems in realtime indexing. But for now I thought it was best to do a small change that restored the old behavior.
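
The hazard can be illustrated with a deterministic, single-threaded simulation of one bad interleaving. This is a sketch, not the Druid code: the class and method names are hypothetical, and a real race would be nondeterministic. "Thread A" is midway through a save-and-restore read and has temporarily moved the shared position; "thread B" then reads at its current position.

```java
import java.nio.ByteBuffer;

// Hypothetical single-threaded simulation of one unlucky interleaving.
final class SharedPositionHazardSketch {
    // A temporarily moves shared's position, B reads the byte at readerView's
    // current position, then A restores the position it saved.
    public static byte readWhileOtherIsMidFold(ByteBuffer shared, ByteBuffer readerView) {
        final int saved = shared.position();
        shared.position(shared.limit() - 1);               // A: mid-fold
        byte seen = readerView.get(readerView.position()); // B: reads "first" byte
        shared.position(saved);                            // A: restore
        return seen;
    }

    public static void main(String[] args) {
        ByteBuffer shared = ByteBuffer.wrap(new byte[]{10, 20, 30});

        // B uses the very same ByteBuffer object as A: it sees A's temporary position.
        System.out.println("same object: " + readWhileOtherIsMidFold(shared, shared));

        // B was handed a duplicate beforehand: its position is its own.
        System.out.println("duplicate:   " + readWhileOtherIsMidFold(shared, shared.duplicate()));
    }
}
```

With a duplicate, the reader's position is isolated from the fold in progress. The content is still shared, which matches the note above that the situation is no worse than before the optimization rather than fully thread safe.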

gianm added the Performance label Aug 3, 2016

gianm added this to the 0.9.2 milestone Aug 3, 2016

drcrallen reviewed Aug 3, 2016

gianm force-pushed the hll-chill-out branch 2 times, most recently from 066ae07 to 75bbc89 Compare August 3, 2016 18:29

drcrallen reviewed Aug 3, 2016

HLL: Avoid some allocations when possible.

41a2a26

gianm force-pushed the hll-chill-out branch from 75bbc89 to 41a2a26 Compare August 3, 2016 18:37

drcrallen merged commit 9437a7a into apache:master Aug 4, 2016

navis mentioned this pull request Aug 4, 2016

HLL: More avoiding allocations #3321

Closed

gianm deleted the hll-chill-out branch August 4, 2016 17:15

gianm mentioned this pull request Oct 11, 2016

BufferUnderflowException in HyperLogLogCollector.fold #3560

Closed

gianm mentioned this pull request Oct 15, 2016

Workaround non-thread-safe use of HLL aggregators. #3578

Merged

gianm mentioned this pull request Dec 4, 2016

Druid 0.9.2 release notes #3503

Closed

seoeun25 pushed a commit to seoeun25/incubator-druid that referenced this pull request Jan 10, 2020

Backport of apache#3314 (avoid allocation if possible for HLL)

0a0ffc9
