Skip to content

[Backport] Workaround non-thread-safe use of HLL aggregators.#3583

Merged
fjy merged 1 commit intoapache:0.9.2from
gianm:backport-3578-to-0.9.2
Oct 17, 2016
Merged

[Backport] Workaround non-thread-safe use of HLL aggregators.#3583
fjy merged 1 commit intoapache:0.9.2from
gianm:backport-3578-to-0.9.2

Conversation

@gianm
Copy link
Contributor

@gianm gianm commented Oct 17, 2016

Backport of #3578 to 0.9.2.

Despite the non-thread-safety of HyperLogLogCollector, it is actually currently used
by multiple threads during realtime indexing. HyperUniquesAggregator's "aggregate" and
"get" methods can be called simultaneously by OnheapIncrementalIndex, since its
"doAggregate" and "getMetricObjectValue" methods are not synchronized.

This means that the optimization of HyperLogLogCollector.fold in apache#3314 (saving and
restoring position rather than duplicating the storage buffer of the right-hand side)
could cause corruption in the face of concurrent writes.

This patch works around the issue by duplicating the storage buffer in "get" before
returning a collector. The returned collector still shares data with the original one,
but the situation is no worse than before apache#3314. In the future we may want to consider
making a thread safe version of HLLC that avoids these kinds of problems in realtime
indexing. But for now I thought it was best to do a small change that restored the old
behavior.
@gianm gianm added the Backport label Oct 17, 2016
@gianm gianm added this to the 0.9.2 milestone Oct 17, 2016
@drcrallen
Copy link
Contributor

👍 after travis

@fjy fjy merged commit a7ac59d into apache:0.9.2 Oct 17, 2016
@gianm gianm deleted the backport-3578-to-0.9.2 branch October 18, 2016 04:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants