
Speed up aggs with sub-aggregations #69806

Merged
nik9000 merged 17 commits into elastic:master from nik9000:filters_sub_ok
Mar 3, 2021

Conversation

@nik9000
Member

@nik9000 nik9000 commented Mar 2, 2021

This allows many of the optimizations added in #63643 and #68871 to run
on aggregations with sub-aggregations. This should:

  • Speed up terms aggregations on fields with less than 1000 values that
    also have sub-aggregations. Locally I see 1.5 second searches run in 1.2
    seconds.
  • Apply that same speedup to range and date_histogram aggregations, though
    it feels less impressive because the point range queries are a little
    slower to get up and go.
  • Massively speed up filters aggregations with sub-aggregations that
    don't have a parent aggregation or collect "other" buckets. Also
    save a ton of memory while collecting them. I've seen reports of this
    being a 20x speed up. Even if filters is rare that's pretty darn good.
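For intuition, here is a minimal sketch in plain Python (not Elasticsearch internals) of the "filter by filter" idea these optimizations build on: instead of routing every matching document through every bucket's collector one at a time, run each bucket's filter as its own query and hand the sub-aggregation all of its matches at once. The data and helper names below are made up for illustration.

```python
def doc_by_doc(docs, filters, metric):
    """Classic collection: dispatch every doc to every bucket's collector."""
    buckets = {name: [] for name in filters}
    for doc in docs:
        for name, pred in filters.items():
            if pred(doc):
                buckets[name].append(doc)
    return {name: metric(hits) for name, hits in buckets.items()}

def filter_by_filter(docs, filters, metric):
    """Filter-by-filter: one pass per filter, no per-document dispatch."""
    return {name: metric([d for d in docs if pred(d)])
            for name, pred in filters.items()}

# Toy stand-ins for the weather-data example used later in this thread.
docs = [{"country": "US", "TMIN": -5},
        {"country": "FR", "TMIN": 2},
        {"country": "US", "TMIN": -9}]
filters = {"US": lambda d: d["country"] == "US",
           "FR": lambda d: d["country"] == "FR"}
tmin = lambda hits: min(d["TMIN"] for d in hits) if hits else None

# Both strategies produce identical buckets; the second just iterates
# in an order that is much friendlier to the underlying index.
assert doc_by_doc(docs, filters, tmin) == filter_by_filter(docs, filters, tmin)
print(filter_by_filter(docs, filters, tmin))  # → {'US': -9, 'FR': 2}
```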

@nik9000
Member Author

nik9000 commented Mar 2, 2021

I've opened this up as a draft so I can use jenkins to run all the tests in parallel. My desktop is busy at the moment....

@nik9000
Member Author

nik9000 commented Mar 2, 2021

| Metric                       | Task                              | Baseline | Contender | Diff     | Unit |
|------------------------------|-----------------------------------|----------|-----------|----------|------|
| 90th percentile service time | keyword-terms                     | 1666.85  | 1555.73   | -111.116 | ms   |
| 90th percentile service time | keyword-terms-low-cardinality     | 37.6112  | 38.016    | 0.40475  | ms   |
| 90th percentile service time | keyword-terms-min                 | 2875.04  | 2934.54   | 59.493   | ms   |
| 90th percentile service time | keyword-terms-low-cardinality-min | 1515.4   | 1151.45   | -363.957 | ms   |

Those are the rally results for this change. The 24% speed up on
keyword-terms-low-cardinality-min is expected. The speed up on
keyword-terms is not. Gremlins? Either way, this is
keyword-terms-low-cardinality-min:

POST weather-data-2016/_search
{
  "size": 0,
  "aggs": {
    "country": {
      "terms": {
        "field": "station.country",
        "size": 200
      },
      "aggs": {
        "tmin": {
          "min": {
            "field": "TMIN"
          }
        }
      }
    }
  }
}
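As a sanity check on the rally table above, the relative changes can be recomputed from the quoted numbers (assuming the two numeric columns are rally's baseline and contender service times):

```python
# 90th percentile service time in ms, (baseline, contender), copied from
# the rally comparison quoted above.
results = {
    "keyword-terms":                     (1666.85, 1555.73),
    "keyword-terms-low-cardinality":     (37.6112, 38.016),
    "keyword-terms-min":                 (2875.04, 2934.54),
    "keyword-terms-low-cardinality-min": (1515.4, 1151.45),
}

for task, (baseline, contender) in results.items():
    change = (contender - baseline) / baseline * 100
    print(f"{task}: {change:+.1f}%")
```

The last line prints -24.0% for keyword-terms-low-cardinality-min, matching the 24% speed up claimed above; keyword-terms comes out around -6.7%.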

@nik9000
Member Author

nik9000 commented Mar 2, 2021

An interesting thing - this totally ignores collect_mode right now. If you get into the "less than 1000 distinct values" branch we'll use filters agg and just collect immediately. This is maybe ok because usually you'll want an appreciable portion of those 1000 distinct values anyway. But it is probably worth revisiting collect_mode in light of this.
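To make that concrete, here is an illustrative request body (reusing the weather-data example from earlier in the thread) that sets collect_mode explicitly; per the note above, the "fewer than 1000 distinct values" branch would currently ignore this hint and collect eagerly via the filters path.

```python
# Illustrative only: a terms aggregation with an explicit collect_mode.
# collect_mode is a real terms-agg option, but on the new filter-by-filter
# path described above it is currently ignored.
request = {
    "size": 0,
    "aggs": {
        "country": {
            "terms": {
                "field": "station.country",
                "size": 200,
                "collect_mode": "breadth_first",  # ignored on the new path
            },
            "aggs": {"tmin": {"min": {"field": "TMIN"}}},
        },
    },
}

print(request["aggs"]["country"]["terms"]["collect_mode"])
```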

);
if (false == filterByFilter.scoreMode().needsScores()) {
// Filter by filter won't produce the correct results if the sub-aggregators need scores
return filterByFilter;
Member Author

This bit is pretty sad! But it isn't worth trying to figure this stuff out now. Maybe not ever. I wish it weren't so janky to stop it.

Member

Seems a little error prone to have to check this on the caller's side, but I don't see a better option.

Contributor

Yeah, that scares me quite a bit. Can you add a comment explaining that this information is not readily available until we actually build aggregations. I feel like factories should have enough information to actually know this stuff.

Member Author

They might be able to tell, but it'd be a ton of plumbing. I looked at it and thought "I think it'd be easier to make scores work here". But I didn't want to do either one right now. I'll add the comment!

Contributor

Yeah, I looked at it as well, and it didn't look practical to fix right now. My comment was more like a mental note to add this to the wish list if we ever get to refactor them.

@nik9000 nik9000 requested review from imotov and not-napoleon March 3, 2021 02:33
@nik9000 nik9000 marked this pull request as ready for review March 3, 2021 02:34
@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Mar 3, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@nik9000
Member Author

nik9000 commented Mar 3, 2021

I think I have the tests sorted. This brings a 24% speed up when a low cardinality terms aggregation is followed by a simple metric aggregation. And it makes it possible for us to enhance things like terms -> terms -> metrics and date_histogram -> terms -> metrics which are both fairly common.

Member

@not-napoleon not-napoleon left a comment

LGTM

);
if (false == filterByFilter.scoreMode().needsScores()) {
// Filter by filter won't produce the correct results if the sub-aggregators need scores
return filterByFilter;
Member

Seems a little error prone to have to check this on the caller's side, but I don't see a better option.

// There aren't any matches for this filter in this leaf
return 0;
}
return scorer.cost(); // TODO in another PR (please) change this to ScorerSupplier.cost
Member

Why did we lose this TODO?

Member Author

I thought I'd just removed the "in another PR" part. Let me fix.

Contributor

@imotov imotov left a comment

LGTM

);
if (false == filterByFilter.scoreMode().needsScores()) {
// Filter by filter won't produce the correct results if the sub-aggregators need scores
return filterByFilter;
Contributor

Yeah, that scares me quite a bit. Can you add a comment explaining that this information is not readily available until we actually build aggregations. I feel like factories should have enough information to actually know this stuff.

@nik9000 nik9000 merged commit 10e2f90 into elastic:master Mar 3, 2021
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Mar 3, 2021
nik9000 added a commit that referenced this pull request Mar 5, 2021
@nik9000
Member Author

nik9000 commented Mar 5, 2021

OK! Backport in. Time to update skips and such in master.

nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Mar 9, 2021
Now that we've backported elastic#69806 we can test it in the bwc tests.
nik9000 added a commit that referenced this pull request Mar 9, 2021

Labels

:Analytics/Aggregations, >enhancement, Team:Analytics, v7.13.0, v8.0.0-alpha1
