Save memory when string terms are not on top#57758
Conversation
This reworks string flavored implementations of the `terms` aggregation to save memory when it is under another bucket by dropping the usage of `asMultiBucketAggregator`.
|
Pinging @elastic/es-analytics-geo (:Analytics/Aggregations) |
|
In my tests this makes the |
|
I'm putting together a lower cardinality test case to see what that does. |
|
When the terms aggregation is inside another aggregation and cardinality is high (~5 million buckets collected), this is flat for performance for global ords and marginally better for the "map" strategy (before after). When the cardinality is lower (~30 thousand buckets collected) this is about 15% faster for global ords (before after). I expect this is because It uses much less memory for the "map" strategy when the terms agg is a child of a high cardinality agg because the bucket collection strategy can share the map of terms. @polyfractal had asked me about the performance difference for At some point we could indeed investigate using those strategies with the |
| return collectsFromSingleBucket ? new FromSingle(bigArrays) : new FromMany(bigArrays); | ||
| } | ||
|
|
||
| private BytesKeyedBucketOrds() {} |
There was a problem hiding this comment.
Oh, that reminds me, I should put a private default constructor on ValuesSourceConfig, thanks.
| this.collectionStrategy = new RemapGlobalOrds(collectsFromSingleBucket); | ||
| } else { | ||
| // Dense ords don't know how to collect from many buckets | ||
| assert collectsFromSingleBucket; |
There was a problem hiding this comment.
How confidant are we that this assert will never trip? I'm wondering if it makes sense to change the condition to read if (remapGlobalOrds || collectFromSingleBucket==false)
There was a problem hiding this comment.
I'm pretty confident. But I'll switch it to a hard check just for extra assurance.
|
run elasticsearch-ci/default-distro |
This reworks string flavored implementations of the `terms` aggregation to save memory when it is under another bucket by dropping the usage of `asMultiBucketAggregator`.
This reworks string flavored implementations of the
termsaggregationto save memory when it is under another bucket by dropping the usage of
asMultiBucketAggregator.