Hiya SIEM team. Over at Elasticsearch we've been looking into a few performance-related items, and some of the aggs that the SIEM dashboard uses caught our eye.
## Benchmarks?
Do we benchmark any of the dashboards? The Elasticsearch team uses Rally extensively; perhaps we could find a way to translate the dashboard requests into some kind of Rally track? It'd help both of us keep an eye on performance, make changes easier to reason about, and make collaboration easier since we'd have a shared dataset to look at.
## Usage of `filter` aggs
There seems to be widespread use of `filter` aggs, which is non-ideal. Filter aggs are relatively expensive, especially compared to filtering in the `query` component of a search request. Each individual filter agg needs to load the bitset of docs that contain that value and check each doc against it one-by-one (as opposed to `query` filters, which can use a leap-frog mechanism to minimize checks).
So the first thing would be trying to move `filter` aggs up into the `query` where possible, if they are being used to exclude documents.
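As an illustrative sketch (the field names and request shape here are assumptions for illustration, not taken from an actual dashboard query), a `filter` agg that is only being used to exclude documents:

```json
{
  "size": 0,
  "aggs": {
    "only_failures": {
      "filter": { "term": { "event.outcome": "failure" } },
      "aggs": {
        "by_host": { "terms": { "field": "host.name" } }
      }
    }
  }
}
```

can usually be rewritten with the filter moved up into the `query` section, where it benefits from the cheaper query-time filtering:

```json
{
  "size": 0,
  "query": { "term": { "event.outcome": "failure" } },
  "aggs": {
    "by_host": { "terms": { "field": "host.name" } }
  }
}
```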
If they are being used for counts (like here), there are some options:
- Try to rewrite some of those to operate as `terms` aggs. E.g. if multiple filters share the same field (`event.module` or something), a `terms` agg will give you doc counts for all the different event modules. `terms` is pretty aggressively optimized because it is so widely used. It's hard to say for sure if it would help, but from some informal testing (see the Rally test at the end) it tends to be noticeably faster.
- For fields that are non-overlapping and sparse, a `value_count` agg can be useful. E.g. if only a subset of docs have a certain field and you want to know how many there are, a `value_count` on that field will return the count without having to bucket them. A relatively niche usage here, but handy if applicable.
- Rewrite into an `msearch` and skip aggregating altogether. Each msearch clause will be a single search request filtering for exactly the criteria needed. With `size: 0` you don't incur fetch overhead, and with `track_total_hits: true` you can still get the total count.
  - If you don't need exact counts, setting `track_total_hits` to `false` (or to a numeric threshold) enables the new Block-Max WAND optimization and returns results very fast. With a threshold you can control when it stops counting, so you can report "> 100,000 results", etc.
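A hedged sketch of what the msearch variant might look like, sent to `POST /_msearch` as NDJSON (the index pattern `logs-*` and the `event.outcome` field are assumptions for illustration):

```json
{"index": "logs-*"}
{"size": 0, "track_total_hits": true, "query": {"term": {"event.outcome": "success"}}}
{"index": "logs-*"}
{"size": 0, "track_total_hits": true, "query": {"term": {"event.outcome": "failure"}}}
```

Each entry in the response's `responses` array carries its count in `hits.total.value`, so there's no agg output to parse at all.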
I ran a simple test comparing msearch (`"count"`), `filter`, `filters`, `terms`, and `value_count`. msearch was fastest by a large margin, followed by `terms` and `value_count`; `filter`/`filters` were generally slower.

## `terms` instead of `filter` for partitioning
Related to the `terms` suggestion above, if there is a scenario where you wish to partition the same field into multiple buckets, a `terms` agg will be faster (and a simpler query) than a series of `filter` aggs. For example, this request uses two `filter` aggs to create "success" and "failure" buckets.

Instead, a single `terms` agg on the field will produce both buckets and do it more cheaply. In addition, the child `filter` agg on `event.outcome: success` is unnecessary: by the nature of the parent bucket, all docs in that bucket are already success (or failure), so you can just grab the count from the bucket's `doc_count`.

If there are unrelated values in the field and you only want "success"/"failure", you can use the `include`/`exclude` functionality of the `terms` agg to restrict it to the terms you care about.
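Putting that together, a minimal sketch (the `event.outcome` field name is an assumption for illustration):

```json
{
  "size": 0,
  "aggs": {
    "outcomes": {
      "terms": {
        "field": "event.outcome",
        "include": ["success", "failure"]
      }
    }
  }
}
```

Each returned bucket's `doc_count` is the count for that outcome directly; no child `filter` aggs are needed.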
## `auto_date_histogram` and `minimum_interval`
There's some optimization work underway in ES (coming in 7.8/7.9) that will noticeably improve auto-date-histo speed. But in the meantime, specifying a minimum interval will help avoid extra work. E.g. auto-date-histo starts with second-level intervals and rounds up from there; when querying a 12h time range it almost never makes sense to look at second intervals, so that part of the rounding is wasted effort.
This does remove some of the "fire and forget" convenience of auto-date-histo, but it can translate into notable performance improvements. I'm not sure of the best option here, but if there's a way to intelligently set the minimum interval, it'd probably help.
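A minimal sketch of what that could look like (the `@timestamp` field, time range, and bucket target are assumptions for illustration); note the agg parameter is spelled `minimum_interval`:

```json
{
  "size": 0,
  "query": {
    "range": { "@timestamp": { "gte": "now-12h" } }
  },
  "aggs": {
    "timeline": {
      "auto_date_histogram": {
        "field": "@timestamp",
        "buckets": 30,
        "minimum_interval": "minute"
      }
    }
  }
}
```

With `minimum_interval: "minute"`, the agg skips the second-level rounding attempts entirely and starts at minute granularity.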
## Closing
Sorry for the long ticket! I decided to file this as a ticket instead of email/Slack/Google Doc/etc. because it seemed easier to work through on GitHub. Feel free to ping me if you have questions; happy to help out! It's hard to say for sure whether any of these suggestions will actually help (although the msearch case is very compelling given how it works), which is why I led with the question about benchmarks. Setting those up might be a good first step so we can quantitatively tweak the queries/aggs.