Speed up aggs with sub-aggregations#69806
Conversation
This allows many of the optimizations added in elastic#63643 and elastic#68871 to run on aggregations with sub-aggregations. This should: * Speed up `terms` aggregations on fields with less than 1000 values that also have sub-aggregations. Locally I see 2 second searches run in 1.2 seconds. * Applies that same speedup to `range` and `date_histogram` aggregations but it feels less impressive because the point range queries are a little slower to get up and go. * Massively speed up `filters` aggregations with sub-aggregations that don't have a `parent` aggregation or collect "other" buckets. Also save a ton of memory while collecting them.
|
I've opened this up as a draft so I can use jenkins to run all the tests in parallel. My desktop is busy at the moment.... |
Those are the rally reslts for this change. The 24% speed up on |
|
An interesting thing - this totally ignores |
| ); | ||
| if (false == filterByFilter.scoreMode().needsScores()) { | ||
| // Filter by filter won't produce the correct results if the sub-aggregators need scores | ||
| return filterByFilter; |
There was a problem hiding this comment.
This bit is pretty sad! But it isn't worth trying to figure this stuff out now. Maybe not never. I wish it weren't so janky to stop it.
There was a problem hiding this comment.
Seems a little error prone to have to check this on the caller's side, but I don't see a better option.
There was a problem hiding this comment.
Yeah, that scares me quite a bit. Can you add a comment explaining that this information is not readily available until we actually build aggregations. I feel like factories should have enough information to actually know this stuff.
There was a problem hiding this comment.
They might be able to tell, but it'd be a ton of plumbing. I looked at it and though "I think it'd be easier to make score work here". But I didn't want to do either one right now..... I'll add the comment!
There was a problem hiding this comment.
Yeah, I looked at it as well, and it didn't look practical to fix right now. My comment was more like a mental note to add this to the wish list if we ever get to refactor them.
|
Pinging @elastic/es-analytics-geo (Team:Analytics) |
|
I think I have the tests sorted. This brings a 24% speed up when a low cardinality |
| ); | ||
| if (false == filterByFilter.scoreMode().needsScores()) { | ||
| // Filter by filter won't produce the correct results if the sub-aggregators need scores | ||
| return filterByFilter; |
There was a problem hiding this comment.
Seems a little error prone to have to check this on the caller's side, but I don't see a better option.
| // There aren't any matches for this filter in this leaf | ||
| return 0; | ||
| } | ||
| return scorer.cost(); // TODO in another PR (please) change this to ScorerSupplier.cost |
There was a problem hiding this comment.
I thouhgt I just removed the "in another PR" part. Let me fix.
| ); | ||
| if (false == filterByFilter.scoreMode().needsScores()) { | ||
| // Filter by filter won't produce the correct results if the sub-aggregators need scores | ||
| return filterByFilter; |
There was a problem hiding this comment.
Yeah, that scares me quite a bit. Can you add a comment explaining that this information is not readily available until we actually build aggregations. I feel like factories should have enough information to actually know this stuff.
This allows many of the optimizations added in elastic#63643 and elastic#68871 to run on aggregations with sub-aggregations. This should: * Speed up `terms` aggregations on fields with less than 1000 values that also have sub-aggregations. Locally I see 2 second searches run in 1.2 seconds. * Applies that same speedup to `range` and `date_histogram` aggregations but it feels less impressive because the point range queries are a little slower to get up and go. * Massively speed up `filters` aggregations with sub-aggregations that don't have a `parent` aggregation or collect "other" buckets. Also save a ton of memory while collecting them.
This allows many of the optimizations added in #63643 and #68871 to run on aggregations with sub-aggregations. This should: * Speed up `terms` aggregations on fields with less than 1000 values that also have sub-aggregations. Locally I see 2 second searches run in 1.2 seconds. * Applies that same speedup to `range` and `date_histogram` aggregations but it feels less impressive because the point range queries are a little slower to get up and go. * Massively speed up `filters` aggregations with sub-aggregations that don't have a `parent` aggregation or collect "other" buckets. Also save a ton of memory while collecting them.
|
OK! Backport in. Time to update skips and such in master. |
Now that we've backported elastic#69806 we can test it in the bwc tests.
This allows many of the optimizations added in #63643 and #68871 to run
on aggregations with sub-aggregations. This should:
termsaggregations on fields with less than 1000 values thatalso have sub-aggregations. Locally I see 1.5 second searches run in 1.2
seconds.
rangeanddate_histogramaggregations butit feels less impressive because the point range queries are a little
slower to get up and go.
filtersaggregations with sub-aggregations thatdon't have a
parentaggregation or collect "other" buckets. Alsosave a ton of memory while collecting them. I've seen reports of this
being a 20x speed up. Even if
filtersis rare that's pretty darn good.