Long key bucket ords key iterator #95809
Conversation
Pinging @elastic/es-analytics-geo (Team:Analytics)
```java
public Iterator<Long> keyOrderedIterator(long owningBucketOrd) {
    if (keySet == null) {
        keySet = new TreeSet<>();
```
We could also build this as we're adding values, but since add is very hot, it seemed safer to do it out of that loop.
We also don't always call keyOrderedIterator.
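To make the trade-off concrete, here is a minimal sketch of the lazy approach described above. The class and field names are illustrative stand-ins for the real `LongKeyedBucketOrds`, and the per-owning-ordinal bookkeeping is omitted for brevity:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

// Illustrative sketch only: a simplified stand-in for LongKeyedBucketOrds,
// showing a key set that is built lazily, outside the hot add() path.
class LazyKeyedOrds {
    // Hot path storage: just record keys, no sorting or dedup work here.
    private final List<Long> keys = new ArrayList<>();
    private TreeSet<Long> keySet = null;

    void add(long key) {
        keys.add(key); // duplicates are allowed on the hot path
    }

    // Cold path: build the sorted, de-duplicated key set on first use.
    Iterator<Long> keyOrderedIterator() {
        if (keySet == null) {
            keySet = new TreeSet<>(keys);
        }
        return keySet.iterator();
    }
}
```

The point of the split is that `add` stays a plain append, and all the tree-building cost is paid once, only by callers that actually need sorted keys.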
nik9000 left a comment:
The approach feels fine to me. I recommended an array and a different iterator, but otherwise I'm good.
```java
 */
public abstract BucketOrdsEnum ordsEnum(long owningBucketOrd);
```

```java
public Iterator<Long> keyOrderedIterator(long owningBucketOrd) {
```
I wonder if `PrimitiveIterator.OfLong` is better. I think it's about the same effort to implement.
And could you add javadocs? I am still trying to digest what is going on here :)
Yeah, sorry, I should have marked this "draft". I was really just pushing it for some early feedback and CI testing. I'll add javadoc today.
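For comparison, a `PrimitiveIterator.OfLong` really is about the same amount of code as an `Iterator<Long>`, and it avoids boxing each key. This sketch iterates an already-sorted `long[]`; the array and its setup are hypothetical, not code from this PR:

```java
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;

// Sketch: an unboxed key iterator over an already-sorted long[].
class SortedKeysIterator implements PrimitiveIterator.OfLong {
    private final long[] sortedKeys; // assumed sorted and de-duplicated
    private int pos = 0;

    SortedKeysIterator(long[] sortedKeys) {
        this.sortedKeys = sortedKeys;
    }

    @Override
    public boolean hasNext() {
        return pos < sortedKeys.length;
    }

    @Override
    public long nextLong() { // primitive variant, no Long boxing
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return sortedKeys[pos++];
    }
}
```

Callers that want boxed values still get `next()` for free, since `PrimitiveIterator.OfLong` extends `Iterator<Long>` with a default implementation that delegates to `nextLong()`.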
```java
    });
}
```

```java
private TreeSet<Long> keySet = null;
```
Once this is set you can't add any more things to it. It's probably worth asserting that.
I wonder if this is better as a `long[]` that you then sort. That feels lighter than all of the tree stuff.
I thought about a `long[]`, but we'd still need duplicate removal. That's doable, but it seemed like enough complexity to justify the `TreeSet`. Open to trying it that way, though.
To be clear, we need duplicate removal at insert time; otherwise the key set ends up just as big as the ords array, basically by definition.
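For reference, the sort-then-dedupe variant of the `long[]` idea would look roughly like this (a generic sketch, not code from this PR). Note that, per the comment above, this variant still materializes the full raw array before compacting, which is exactly the concern about insert-time duplicate removal:

```java
import java.util.Arrays;

class KeyDedup {
    // Sort a copy of the raw keys and compact out duplicates in one pass.
    static long[] sortedUnique(long[] raw) {
        if (raw.length == 0) {
            return raw;
        }
        long[] keys = Arrays.copyOf(raw, raw.length);
        Arrays.sort(keys);
        int out = 1; // keys[0] is always kept
        for (int i = 1; i < keys.length; i++) {
            if (keys[i] != keys[out - 1]) {
                keys[out++] = keys[i]; // shift each new distinct key left
            }
        }
        return Arrays.copyOf(keys, out); // trim to the distinct count
    }
}
```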
Maybe it would be good to nullify this variable during release?
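A small sketch of the release-time cleanup suggested here. `AutoCloseable` stands in for the real release hook, which this PR may wire up differently:

```java
import java.util.TreeSet;

// Sketch: null out the cached key set when the structure is released,
// so the TreeSet does not outlive the ords structure that built it.
class ReleasableOrds implements AutoCloseable {
    private TreeSet<Long> keySet = new TreeSet<>();

    @Override
    public void close() {
        keySet = null; // drop the reference during release
    }

    boolean isReleased() {
        return keySet == null;
    }
}
```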
I think this is fine. We might realise later that we need to optimise it / make it safer, but as a POC for aggregation reduction it looks good to me.
@iverase - yeah, I spent basically zero time on optimizing this. I did open a couple of tickets for low-hanging-fruit optimizations: #95961 and #95960. There's definitely more we could do, but I don't think it's worth investing too much into it until we can see where the bottlenecks in the new reduce logic are.
Relates to #89437
In order to merge aggregations using LongKeyedBucketOrds, we need a way to align the buckets by key. This adds an iterator that returns all keys for a given owning bucket ordinal, in natural sorted order. It can then be used to merge without generating and sorting bucket objects directly.

This makes two big assumptions:
1 - The total set of keys is small enough that we don't need to put the key set in a BigArrays-backed data structure.
2 - Building the tree set of keys (at reduce time) will not be excessively expensive.

We can mitigate 1 by building a BigArrays-backed balanced binary tree, but I didn't want to do that unless I have to.
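As an illustration of the merge this enables, two sorted, de-duplicated key streams can be aligned with a classic two-pointer merge, visiting each distinct key once, without materializing bucket objects. This is a hypothetical sketch of the idea, not the PR's reduce code:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class KeyMerge {
    // Merge two sorted, de-duplicated key iterators into one sorted
    // list of distinct keys. Each key is visited exactly once.
    static List<Long> mergeKeys(Iterator<Long> a, Iterator<Long> b) {
        List<Long> out = new ArrayList<>();
        Long ka = a.hasNext() ? a.next() : null;
        Long kb = b.hasNext() ? b.next() : null;
        while (ka != null || kb != null) {
            if (kb == null || (ka != null && ka < kb)) {
                out.add(ka);                        // key only in stream a
                ka = a.hasNext() ? a.next() : null;
            } else if (ka == null || kb < ka) {
                out.add(kb);                        // key only in stream b
                kb = b.hasNext() ? b.next() : null;
            } else {
                out.add(ka);                        // key in both: the two
                ka = a.hasNext() ? a.next() : null; // buckets for this key
                kb = b.hasNext() ? b.next() : null; // would be merged here
            }
        }
        return out;
    }
}
```

In the real reduce, the "key in both" branch is where the per-bucket merge would happen; everything else is just alignment.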