
Speed up aggs with sub-aggregations #69806

Merged
nik9000 merged 17 commits into elastic:master from nik9000:filters_sub_ok
Mar 3, 2021

Conversation

@nik9000
Member

@nik9000 nik9000 commented Mar 2, 2021

This allows many of the optimizations added in #63643 and #68871 to run
on aggregations with sub-aggregations. This should:

  • Speed up terms aggregations on fields with less than 1000 values that
    also have sub-aggregations. Locally I see 1.5 second searches run in 1.2
    seconds.
  • Apply that same speedup to range and date_histogram aggregations, though
    it feels less impressive because the point range queries are a little
    slower to get up and go.
  • Massively speed up filters aggregations with sub-aggregations that
    don't have a parent aggregation or collect "other" buckets. Also
    save a ton of memory while collecting them. I've seen reports of this
    being a 20x speed up. Even if filters is rare that's pretty darn good.
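For intuition, here is a minimal sketch in plain Python (not Elasticsearch internals) of the "filter by filter" idea these optimizations build on: instead of routing every matching document through every bucket's collector one at a time, run each bucket's filter as its own query and hand the sub-aggregation all of its matches at once. The data and helper names below are made up for illustration.

```python
def doc_by_doc(docs, filters, metric):
    """Classic collection: dispatch every doc to every bucket's collector."""
    buckets = {name: [] for name in filters}
    for doc in docs:
        for name, pred in filters.items():
            if pred(doc):
                buckets[name].append(doc)
    return {name: metric(hits) for name, hits in buckets.items()}

def filter_by_filter(docs, filters, metric):
    """Filter-by-filter: one pass per filter, no per-document dispatch."""
    return {name: metric([d for d in docs if pred(d)])
            for name, pred in filters.items()}

# Toy stand-ins for the weather-data example used later in this thread.
docs = [{"country": "US", "TMIN": -5},
        {"country": "FR", "TMIN": 2},
        {"country": "US", "TMIN": -9}]
filters = {"US": lambda d: d["country"] == "US",
           "FR": lambda d: d["country"] == "FR"}
tmin = lambda hits: min(d["TMIN"] for d in hits) if hits else None

# Both strategies produce identical buckets; the second just iterates
# in an order that is much friendlier to the underlying index.
assert doc_by_doc(docs, filters, tmin) == filter_by_filter(docs, filters, tmin)
print(filter_by_filter(docs, filters, tmin))  # → {'US': -9, 'FR': 2}
```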

@nik9000
Member Author

nik9000 commented Mar 2, 2021

I've opened this up as a draft so I can use jenkins to run all the tests in parallel. My desktop is busy at the moment....

@nik9000
Member Author

nik9000 commented Mar 2, 2021

| Metric                       | Task                              | Baseline | Contender | Diff     | Unit |
|------------------------------|-----------------------------------|----------|-----------|----------|------|
| 90th percentile service time | keyword-terms                     | 1666.85  | 1555.73   | -111.116 | ms   |
| 90th percentile service time | keyword-terms-low-cardinality     | 37.6112  | 38.016    | 0.40475  | ms   |
| 90th percentile service time | keyword-terms-min                 | 2875.04  | 2934.54   | 59.493   | ms   |
| 90th percentile service time | keyword-terms-low-cardinality-min | 1515.4   | 1151.45   | -363.957 | ms   |

Those are the rally results for this change. The 24% speed up on
keyword-terms-low-cardinality-min is expected. The speed up on
keyword-terms is not. Gremlins? Either way, this is
keyword-terms-low-cardinality-min:

POST weather-data-2016/_search
{
  "size": 0,
  "aggs": {
    "country": {
      "terms": {
        "field": "station.country",
        "size": 200
      },
      "aggs": {
        "tmin": {
          "min": {
            "field": "TMIN"
          }
        }
      }
    }
  }
}
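As a sanity check on the rally table above, the relative changes can be recomputed from the quoted numbers (assuming the two numeric columns are rally's baseline and contender service times):

```python
# 90th percentile service time in ms, (baseline, contender), copied from
# the rally comparison quoted above.
results = {
    "keyword-terms":                     (1666.85, 1555.73),
    "keyword-terms-low-cardinality":     (37.6112, 38.016),
    "keyword-terms-min":                 (2875.04, 2934.54),
    "keyword-terms-low-cardinality-min": (1515.4, 1151.45),
}

for task, (baseline, contender) in results.items():
    change = (contender - baseline) / baseline * 100
    print(f"{task}: {change:+.1f}%")
```

The last line prints -24.0% for keyword-terms-low-cardinality-min, matching the 24% speed up claimed above; keyword-terms comes out around -6.7%.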

@nik9000
Member Author

nik9000 commented Mar 2, 2021

An interesting thing - this totally ignores collect_mode right now. If you get into the "less than 1000 distinct values" branch we'll use filters agg and just collect immediately. This is maybe ok because usually you'll want an appreciable portion of those 1000 distinct values anyway. But it is probably worth revisiting collect_mode in light of this.
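To make that concrete, here is an illustrative request body (reusing the weather-data example from earlier in the thread) that sets collect_mode explicitly; per the note above, the "fewer than 1000 distinct values" branch would currently ignore this hint and collect eagerly via the filters path.

```python
# Illustrative only: a terms aggregation with an explicit collect_mode.
# collect_mode is a real terms-agg option, but on the new filter-by-filter
# path described above it is currently ignored.
request = {
    "size": 0,
    "aggs": {
        "country": {
            "terms": {
                "field": "station.country",
                "size": 200,
                "collect_mode": "breadth_first",  # ignored on the new path
            },
            "aggs": {"tmin": {"min": {"field": "TMIN"}}},
        },
    },
}

print(request["aggs"]["country"]["terms"]["collect_mode"])
```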

);
if (false == filterByFilter.scoreMode().needsScores()) {
// Filter by filter won't produce the correct results if the sub-aggregators need scores
return filterByFilter;
Member Author

This bit is pretty sad! But it isn't worth trying to figure this stuff out now. Maybe not ever. I wish it weren't so janky to stop it.

Member

Seems a little error prone to have to check this on the caller's side, but I don't see a better option.

Contributor

Yeah, that scares me quite a bit. Can you add a comment explaining that this information is not readily available until we actually build aggregations. I feel like factories should have enough information to actually know this stuff.

Member Author

They might be able to tell, but it'd be a ton of plumbing. I looked at it and thought "I think it'd be easier to make scores work here". But I didn't want to do either one right now. I'll add the comment!

Contributor

Yeah, I looked at it as well, and it didn't look practical to fix right now. My comment was more like a mental note to add this to the wish list if we ever get to refactor them.

@nik9000 nik9000 requested review from imotov and not-napoleon March 3, 2021 02:33
@nik9000 nik9000 marked this pull request as ready for review March 3, 2021 02:34
@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Mar 3, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@nik9000
Member Author

nik9000 commented Mar 3, 2021

I think I have the tests sorted. This brings a 24% speed up when a low cardinality terms aggregation is followed by a simple metric aggregation. And it makes it possible for us to enhance things like terms -> terms -> metrics and date_histogram -> terms -> metrics which are both fairly common.

Member

@not-napoleon not-napoleon left a comment

LGTM

);
if (false == filterByFilter.scoreMode().needsScores()) {
// Filter by filter won't produce the correct results if the sub-aggregators need scores
return filterByFilter;
Member

Seems a little error prone to have to check this on the caller's side, but I don't see a better option.

// There aren't any matches for this filter in this leaf
return 0;
}
return scorer.cost(); // TODO in another PR (please) change this to ScorerSupplier.cost
Member

Why did we lose this TODO?

Member Author

I thought I'd just removed the "in another PR" part. Let me fix.

Contributor

@imotov imotov left a comment

LGTM

);
if (false == filterByFilter.scoreMode().needsScores()) {
// Filter by filter won't produce the correct results if the sub-aggregators need scores
return filterByFilter;
Contributor

Yeah, that scares me quite a bit. Can you add a comment explaining that this information is not readily available until we actually build aggregations. I feel like factories should have enough information to actually know this stuff.

@nik9000 nik9000 merged commit 10e2f90 into elastic:master Mar 3, 2021
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Mar 3, 2021
nik9000 added a commit that referenced this pull request Mar 5, 2021
@nik9000
Member Author

nik9000 commented Mar 5, 2021

OK! Backport in. Time to update skips and such in master.

nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Mar 9, 2021
Now that we've backported elastic#69806 we can test it in the bwc tests.
nik9000 added a commit that referenced this pull request Mar 9, 2021

Labels

:Analytics/Aggregations, >enhancement, Team:Analytics, v7.13.0, v8.0.0-alpha1
