Long key bucket ords key iterator by not-napoleon · Pull Request #95809 · elastic/elasticsearch

not-napoleon · 2023-05-03T20:23:36Z

Relates to #89437

In order to merge aggregations using LongKeyedBucketOrds, we need a way to align the buckets by key. This adds an iterator to return all keys for a given owning bucket ordinal, in natural sorted order. This can then be used to merge without generating and sorting bucket objects directly.
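A minimal sketch of the merge this enables. It assumes two inputs that each yield distinct keys in ascending order, which is what `keyOrderedIterator` is described as returning for one owning bucket ordinal. `KeyAlignedMerge` and `mergeSortedKeys` are hypothetical names, not part of the PR. Because both sides are already sorted, the buckets can be aligned by key in a single linear pass without building and sorting bucket objects.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper: merges two iterators that each yield distinct keys in
// ascending order and emits every key once, in ascending order. A real reduce
// would combine the per-key buckets at the "same key on both sides" branch.
final class KeyAlignedMerge {
    static List<Long> mergeSortedKeys(Iterator<Long> left, Iterator<Long> right) {
        List<Long> merged = new ArrayList<>();
        Long l = left.hasNext() ? left.next() : null;
        Long r = right.hasNext() ? right.next() : null;
        while (l != null || r != null) {
            if (r == null || (l != null && l < r)) {
                merged.add(l); // key only present on the left side
                l = left.hasNext() ? left.next() : null;
            } else if (l == null || r < l) {
                merged.add(r); // key only present on the right side
                r = right.hasNext() ? right.next() : null;
            } else {
                merged.add(l); // same key on both sides: buckets would be combined here
                l = left.hasNext() ? left.next() : null;
                r = right.hasNext() ? right.next() : null;
            }
        }
        return merged;
    }
}
```

For example, merging keys `1, 3, 5` with `2, 3, 6` visits `1, 2, 3, 5, 6`, touching key `3` only once.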

This makes two big assumptions:

1 - The total set of keys is small enough that we don't need to put the key set in a BigArrays-backed data structure.
2 - Building the tree set of keys (at reduce time) will not be excessively expensive.

We can mitigate (1) by building a BigArrays-backed balanced binary tree, but I'd rather not do that unless we have to.

elasticsearchmachine · 2023-05-03T20:24:00Z

Pinging @elastic/es-analytics-geo (Team:Analytics)

not-napoleon · 2023-05-03T20:25:30Z

...er/src/main/java/org/elasticsearch/search/aggregations/bucket/terms/LongKeyedBucketOrds.java


+    public Iterator<Long> keyOrderedIterator(long owningBucketOrd) {
+        if (keySet == null) {
+            keySet = new TreeSet<>();


We could also build this as we're adding values, but since add is a very hot path, it seemed safer to keep this work out of that loop.

We also don't always call keyOrderedIterator.
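The trade-off described above (keep `add` cheap, build the sorted set lazily, and pay nothing if the iterator is never requested) can be sketched as follows. This is a hypothetical, heap-only stand-in: the real `LongKeyedBucketOrds` uses BigArrays-backed hash tables, and `ToyKeyedOrds` with its `HashMap` storage is an assumption for illustration. It also assumes one plausible reading of the single `keySet` field: a global, de-duplicated set of keys that is filtered per owning bucket.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for LongKeyedBucketOrds; a HashMap replaces the real
// BigArrays-backed hash table purely for illustration.
final class ToyKeyedOrds {
    private final Map<Long, Set<Long>> keysByOwner = new HashMap<>();
    private TreeSet<Long> keySet = null; // built lazily, never touched by add()

    // The hot path: only records the (owner, key) pair, no sorting work.
    void add(long owningBucketOrd, long key) {
        keysByOwner.computeIfAbsent(owningBucketOrd, o -> new HashSet<>()).add(key);
    }

    // Builds the sorted, de-duplicated key set on first use, then yields the
    // keys that belong to the requested owning bucket in ascending order.
    Iterator<Long> keyOrderedIterator(long owningBucketOrd) {
        if (keySet == null) {
            keySet = new TreeSet<>();
            for (Set<Long> keys : keysByOwner.values()) {
                keySet.addAll(keys);
            }
        }
        Set<Long> owned = keysByOwner.getOrDefault(owningBucketOrd, Set.of());
        return keySet.stream().filter(owned::contains).iterator();
    }
}
```

If no caller ever asks for `keyOrderedIterator`, the `TreeSet` is never allocated.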

nik9000

The approach feels fine to me. I recommended an array and a different iterator, but otherwise I'm good.

nik9000 · 2023-05-03T20:28:49Z

...er/src/main/java/org/elasticsearch/search/aggregations/bucket/terms/LongKeyedBucketOrds.java

     */
    public abstract BucketOrdsEnum ordsEnum(long owningBucketOrd);

+    public Iterator<Long> keyOrderedIterator(long owningBucketOrd) {


I wonder if PrimitiveIterator.OfLong is better. I think it's about the same amount of work to implement.

And could you add javadocs? I am still trying to digest what is going on here :)
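To illustrate the `PrimitiveIterator.OfLong` suggestion, here is a minimal sketch that assumes the keys are already held in a sorted, de-duplicated `long[]` (an assumption; the PR as written keeps them in a `TreeSet<Long>`). `SortedKeysIterator` is a hypothetical name. Implementing `hasNext()` and `nextLong()` is enough, and callers that use `nextLong()` avoid boxing each key. For a plain array, `Arrays.stream(keys).iterator()` also returns a `PrimitiveIterator.OfLong`.

```java
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;

// Hypothetical sketch: an unboxed iterator over an already sorted,
// de-duplicated long[] of keys.
final class SortedKeysIterator implements PrimitiveIterator.OfLong {
    private final long[] keys;
    private int next = 0;

    SortedKeysIterator(long[] keys) {
        this.keys = keys;
    }

    @Override
    public boolean hasNext() {
        return next < keys.length;
    }

    @Override
    public long nextLong() {
        if (next >= keys.length) {
            throw new NoSuchElementException();
        }
        return keys[next++];
    }
}
```

A consumer would loop with `while (it.hasNext()) { long key = it.nextLong(); ... }`; the inherited default `next()` still works but boxes.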

Yeah, sorry, I should have marked this as a draft. I was really just pushing it for some early feedback and CI testing. I'll add Javadoc today.

nik9000 · 2023-05-03T20:30:57Z

...er/src/main/java/org/elasticsearch/search/aggregations/bucket/terms/LongKeyedBucketOrds.java

        });
    }

+    private TreeSet<Long> keySet = null;


Once this is set you can't add any more things to it. It's probably worth asserting that.

I wonder if this is better as a long[] and then you sort it. That feels lighter than all of the tree stuff.

I thought about a long[], but we'd still need duplicate removal, which is doable but seemed like enough complexity to justify the TreeSet. Open to trying it that way, though.

To be clear, we need duplicate removal at insert time; otherwise the key set ends up being just as big as the ords array, basically by definition.
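A minimal sketch of the `long[]` alternative discussed above: copy the keys, sort, then compact the distinct values to the front in one pass. `SortedUniqueKeys` and `sortedUnique` are hypothetical names. Note that the input array must hold every key occurrence, duplicates included, before de-duplication happens, which is exactly the size concern raised in the reply.

```java
import java.util.Arrays;

// Hypothetical sketch of the "long[] and then sort" approach with
// de-duplication done after sorting rather than at insert time.
final class SortedUniqueKeys {
    static long[] sortedUnique(long[] keysWithDuplicates) {
        long[] keys = Arrays.copyOf(keysWithDuplicates, keysWithDuplicates.length);
        Arrays.sort(keys);
        int unique = 0;
        for (int i = 0; i < keys.length; i++) {
            // After sorting, duplicates are adjacent, so compare with the last kept value.
            if (unique == 0 || keys[i] != keys[unique - 1]) {
                keys[unique++] = keys[i];
            }
        }
        return Arrays.copyOf(keys, unique);
    }
}
```

The sort is O(n log n) over all occurrences, whereas a `TreeSet` de-duplicates on insert, so its size tracks the number of distinct keys.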

Maybe it would be good to set this field to null on release?
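The two review suggestions on the `keySet` field (assert that nothing is added once it exists, and drop it on release) could look roughly like this. `KeySetLifecycle` is hypothetical, a `List<Long>` stands in for the real ords storage, and a plain `AutoCloseable` stands in for Elasticsearch's own release mechanism. The Java `assert` only fires when the JVM runs with assertions enabled (`-ea`).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch of the keySet lifecycle suggested in review:
// (1) adding after the ordered key set is built would leave it stale, so assert;
// (2) release the cached set when the structure itself is released.
final class KeySetLifecycle implements AutoCloseable {
    private final List<Long> keys = new ArrayList<>(); // stand-in for the real ords storage
    private TreeSet<Long> keySet = null;

    void add(long key) {
        // Only checked when the JVM runs with -ea.
        assert keySet == null : "add() called after keyOrderedIterator() built the key set";
        keys.add(key);
    }

    Iterator<Long> keyOrderedIterator() {
        if (keySet == null) {
            keySet = new TreeSet<>(keys); // sorted and de-duplicated
        }
        return keySet.iterator();
    }

    @Override
    public void close() {
        keySet = null; // drop the cached TreeSet so it can be garbage collected
        keys.clear();
    }
}
```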

iverase · 2023-05-04T12:44:36Z

I think this is fine. We might realise later that we need to optimise it or make it safer, but as a POC for aggregation reduction it looks good to me.

not-napoleon · 2023-05-10T19:19:19Z

@iverase - yeah, I spent basically zero time on optimizing this. I did open a couple of tickets for low-hanging-fruit optimizations: #95961 and #95960.

There's definitely more we could do, but I don't think it's worth investing too much in it until we can see where the bottlenecks in the new reduce logic are.

nik9000 · 2023-05-10T20:10:05Z


iverase

Thanks @not-napoleon!

not-napoleon added 2 commits May 3, 2023 10:41

moved some methods for readability (7d00f27)

add an ordered keys iterator (143fa3a)

not-napoleon added >non-issue :Analytics/Aggregations Aggregations v8.9.0 labels May 3, 2023

elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 3, 2023

not-napoleon commented May 3, 2023


nik9000 reviewed May 3, 2023


spotless apply (1648606)

not-napoleon added 2 commits May 4, 2023 09:24

javadoc etc (9aea61d)

many but small test (3d3367b)

This was referenced May 9, 2023

Big Array Backed Balanced Binary Trees #95960 (Open)

Use Primitive Iterators for key set iterators #95961 (Open)

Enable Circuit Breaker tracking in more parts of the aggregations framework #89437 (Open)

not-napoleon added 2 commits May 9, 2023 12:59

spotless apply (bd2ea86)

Fixed major testing blunder. Javadoc (9c00c28)

not-napoleon requested review from iverase and nik9000 May 10, 2023 19:15

nik9000 approved these changes May 10, 2023


iverase approved these changes May 10, 2023


not-napoleon merged commit 4f4614a into elastic:main May 15, 2023

not-napoleon deleted the long-key-bucket-ords-key-iterator branch May 15, 2023 13:35
