Skip to content

Workaround non-thread-safe use of HLL aggregators.#3578

Merged
fjy merged 1 commit intoapache:masterfrom
gianm:fix-3560
Oct 17, 2016
Merged

Workaround non-thread-safe use of HLL aggregators.#3578
fjy merged 1 commit intoapache:masterfrom
gianm:fix-3560

Conversation

@gianm
Copy link
Contributor

@gianm gianm commented Oct 15, 2016

Despite the non-thread-safety of HyperLogLogCollector, it is actually currently used
by multiple threads during realtime indexing. HyperUniquesAggregator's "aggregate" and
"get" methods can be called simultaneously by OnheapIncrementalIndex, since its
"doAggregate" and "getMetricObjectValue" methods are not synchronized.

This means that the optimization of HyperLogLogCollector.fold in #3314 (saving and
restoring position rather than duplicating the storage buffer of the right-hand side)
could cause corruption in the face of concurrent writes.

This patch works around the issue by duplicating the storage buffer in "get" before
returning a collector. The returned collector still shares data with the original one,
but the situation is no worse than before #3314. In the future we may want to consider
making a thread safe version of HLLC that avoids these kinds of problems in realtime
indexing. But for now I thought it was best to do a small change that restored the old
behavior.

Fixes #3560.

Despite the non-thread-safety of HyperLogLogCollector, it is actually currently used
by multiple threads during realtime indexing. HyperUniquesAggregator's "aggregate" and
"get" methods can be called simultaneously by OnheapIncrementalIndex, since its
"doAggregate" and "getMetricObjectValue" methods are not synchronized.

This means that the optimization of HyperLogLogCollector.fold in apache#3314 (saving and
restoring position rather than duplicating the storage buffer of the right-hand side)
could cause corruption in the face of concurrent writes.

This patch works around the issue by duplicating the storage buffer in "get" before
returning a collector. The returned collector still shares data with the original one,
but the situation is no worse than before apache#3314. In the future we may want to consider
making a thread safe version of HLLC that avoids these kinds of problems in realtime
indexing. But for now I thought it was best to do a small change that restored the old
behavior.
@gianm gianm added the Bug label Oct 15, 2016
@gianm gianm added this to the 0.9.2 milestone Oct 15, 2016
@gianm gianm closed this Oct 15, 2016
@gianm gianm reopened this Oct 15, 2016
{
return collector;
// Workaround for OnheapIncrementalIndex's penchant for calling "aggregate" and "get" simultaneously.
return HyperLogLogCollector.makeCollector(collector.getStorageBuffer().duplicate());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HLL is already super expensive to calculate, can you get some benchmarks on how this effects both ingestion and querying?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure; do you have any feedback other than that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@drcrallen posted benchmarks; looks pretty similar before and after. This makes sense to me since get is not called very often compared to aggregate.

@gianm
Copy link
Contributor Author

gianm commented Oct 15, 2016

Query benchmarks, including a benchmark with an extraction dim spec to force the DimExtractionTopNAlgorithm (which uses Aggregator rather than BufferAggregator):

fix-3560: topN on HLL

Benchmark                                  (numSegments)  (rowsPerSegment)  (schemaAndQuery)  (threshold)  Mode  Cnt        Score       Error  Units
TopNBenchmark.queryMultiQueryableIndex                 1            750000           basic.A           10  avgt   25   118507.045 ±  4529.905  us/op
TopNBenchmark.querySingleIncrementalIndex              1            750000           basic.A           10  avgt   25  1045553.387 ± 49598.109  us/op
TopNBenchmark.querySingleQueryableIndex                1            750000           basic.A           10  avgt   25   103220.257 ±  4797.120  us/op

fix-3560: topN on HLL + extraction dimension

Benchmark                                  (numSegments)  (rowsPerSegment)  (schemaAndQuery)  (threshold)  Mode  Cnt        Score       Error  Units
TopNBenchmark.queryMultiQueryableIndex                 1            750000           basic.A           10  avgt   25   104743.297 ±  4584.774  us/op
TopNBenchmark.querySingleIncrementalIndex              1            750000           basic.A           10  avgt   25  1051282.036 ± 42161.144  us/op
TopNBenchmark.querySingleQueryableIndex                1            750000           basic.A           10  avgt   25   119610.054 ±  4121.940  us/op

0.9.2: topN on HLL

Benchmark                                  (numSegments)  (rowsPerSegment)  (schemaAndQuery)  (threshold)  Mode  Cnt        Score       Error  Units
TopNBenchmark.queryMultiQueryableIndex                 1            750000           basic.A           10  avgt   25   115235.941 ±  5724.145  us/op
TopNBenchmark.querySingleIncrementalIndex              1            750000           basic.A           10  avgt   25  1039008.121 ± 41863.208  us/op
TopNBenchmark.querySingleQueryableIndex                1            750000           basic.A           10  avgt   25   113842.340 ±  4576.450  us/op

0.9.2: topN on HLL + extraction dimension

Benchmark                                  (numSegments)  (rowsPerSegment)  (schemaAndQuery)  (threshold)  Mode  Cnt        Score       Error  Units
TopNBenchmark.queryMultiQueryableIndex                 1            750000           basic.A           10  avgt   25   114115.535 ± 16587.212  us/op
TopNBenchmark.querySingleIncrementalIndex              1            750000           basic.A           10  avgt   25  1047492.787 ± 43123.218  us/op
TopNBenchmark.querySingleQueryableIndex                1            750000           basic.A           10  avgt   25   114383.215 ±  4976.573  us/op

@gianm
Copy link
Contributor Author

gianm commented Oct 15, 2016

Index persist benchmarks:

fix-3560: 

Benchmark                        (rollup)  (rowsPerSegment)  (schema)  Mode  Cnt        Score       Error  Units
IndexPersistBenchmark.persistV9      true             75000     basic  avgt   25  1083560.470 ± 41714.967  us/op
IndexPersistBenchmark.persistV9     false             75000     basic  avgt   25  1118639.035 ± 39750.698  us/op

0.9.2:

Benchmark                        (rollup)  (rowsPerSegment)  (schema)  Mode  Cnt        Score       Error  Units
IndexPersistBenchmark.persistV9      true             75000     basic  avgt   25  1097836.528 ± 43535.585  us/op
IndexPersistBenchmark.persistV9     false             75000     basic  avgt   25  1161545.577 ± 36522.730  us/op

I don't think other indexing benchmarks are needed since get is only called during persisting.

@fjy fjy closed this Oct 17, 2016
@fjy fjy reopened this Oct 17, 2016
@fjy
Copy link
Contributor

fjy commented Oct 17, 2016

👍

@drcrallen
Copy link
Contributor

👍

@drcrallen
Copy link
Contributor

Waiting for travis

@fjy fjy merged commit 285516b into apache:master Oct 17, 2016
gianm added a commit to gianm/druid that referenced this pull request Oct 17, 2016
Despite the non-thread-safety of HyperLogLogCollector, it is actually currently used
by multiple threads during realtime indexing. HyperUniquesAggregator's "aggregate" and
"get" methods can be called simultaneously by OnheapIncrementalIndex, since its
"doAggregate" and "getMetricObjectValue" methods are not synchronized.

This means that the optimization of HyperLogLogCollector.fold in apache#3314 (saving and
restoring position rather than duplicating the storage buffer of the right-hand side)
could cause corruption in the face of concurrent writes.

This patch works around the issue by duplicating the storage buffer in "get" before
returning a collector. The returned collector still shares data with the original one,
but the situation is no worse than before apache#3314. In the future we may want to consider
making a thread safe version of HLLC that avoids these kinds of problems in realtime
indexing. But for now I thought it was best to do a small change that restored the old
behavior.
fjy pushed a commit that referenced this pull request Oct 17, 2016
Despite the non-thread-safety of HyperLogLogCollector, it is actually currently used
by multiple threads during realtime indexing. HyperUniquesAggregator's "aggregate" and
"get" methods can be called simultaneously by OnheapIncrementalIndex, since its
"doAggregate" and "getMetricObjectValue" methods are not synchronized.

This means that the optimization of HyperLogLogCollector.fold in #3314 (saving and
restoring position rather than duplicating the storage buffer of the right-hand side)
could cause corruption in the face of concurrent writes.

This patch works around the issue by duplicating the storage buffer in "get" before
returning a collector. The returned collector still shares data with the original one,
but the situation is no worse than before #3314. In the future we may want to consider
making a thread safe version of HLLC that avoids these kinds of problems in realtime
indexing. But for now I thought it was best to do a small change that restored the old
behavior.
fundead pushed a commit to fundead/druid that referenced this pull request Dec 7, 2016
Despite the non-thread-safety of HyperLogLogCollector, it is actually currently used
by multiple threads during realtime indexing. HyperUniquesAggregator's "aggregate" and
"get" methods can be called simultaneously by OnheapIncrementalIndex, since its
"doAggregate" and "getMetricObjectValue" methods are not synchronized.

This means that the optimization of HyperLogLogCollector.fold in apache#3314 (saving and
restoring position rather than duplicating the storage buffer of the right-hand side)
could cause corruption in the face of concurrent writes.

This patch works around the issue by duplicating the storage buffer in "get" before
returning a collector. The returned collector still shares data with the original one,
but the situation is no worse than before apache#3314. In the future we may want to consider
making a thread safe version of HLLC that avoids these kinds of problems in realtime
indexing. But for now I thought it was best to do a small change that restored the old
behavior.
seoeun25 pushed a commit to seoeun25/incubator-druid that referenced this pull request Feb 25, 2022
@gianm gianm deleted the fix-3560 branch September 23, 2022 19:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants