Problem description
When the user creates a log threshold alert with a "group by" field of large cardinality, the alert executor paginates through a large number of composite aggregation pages. This can consume, and possibly exhaust, the resources available in Elasticsearch and Kibana, and thereby negatively impact the availability of the service. Additionally, the alert execution might time out and miss alerts that should have fired.
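To make the pagination cost concrete, here is a minimal sketch of such a loop (the names and shapes are illustrative, not Kibana's actual implementation): each page of a composite aggregation returns up to a fixed number of buckets plus an `after_key`, and the caller must repeat the request with that key until none is returned.

```typescript
// Sketch of a composite-aggregation pagination loop. `SearchFn` stands in
// for a real Elasticsearch client call; the types are illustrative.

interface CompositeBucket {
  key: Record<string, string>;
  doc_count: number;
}

interface CompositePage {
  buckets: CompositeBucket[];
  after_key?: Record<string, string>; // absent on the last page
}

type SearchFn = (after?: Record<string, string>) => Promise<CompositePage>;

async function collectAllGroups(search: SearchFn): Promise<CompositeBucket[]> {
  const all: CompositeBucket[] = [];
  let after: Record<string, string> | undefined;
  do {
    // Each iteration is a full search round trip to Elasticsearch.
    const page = await search(after);
    all.push(...page.buckets);
    after = page.after_key;
  } while (after !== undefined);
  return all;
}
```

With N distinct groups and a page size of 1000, this issues ceil(N / 1000) sequential round trips and accumulates every bucket in Kibana memory, which is the failure mode described above.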
On top of that, the query used for checking the condition prioritizes correctness over performance by filtering out non-matching groups as late as possible. This makes it possible to check for zero-count thresholds, but prevents Elasticsearch from optimizing the query more aggressively.
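Illustratively (the field names `host.name` and `log.level` are assumptions, not taken from the actual rule), the "late filtering" shape applies the condition as a filter sub-aggregation, so every group is still enumerated and zero-count groups remain visible:

```json
{
  "size": 0,
  "aggs": {
    "groups": {
      "composite": {
        "sources": [{ "group": { "terms": { "field": "host.name" } } }]
      },
      "aggs": {
        "matching_docs": { "filter": { "term": { "log.level": "error" } } }
      }
    }
  }
}
```

Moving the same condition into the top-level query filter lets Elasticsearch discard non-matching documents before grouping, but groups with no matching documents then never appear in the response at all, which is why this faster shape cannot detect zero-count groups:

```json
{
  "size": 0,
  "query": { "bool": { "filter": [{ "term": { "log.level": "error" } }] } },
  "aggs": {
    "groups": {
      "composite": {
        "sources": [{ "group": { "terms": { "field": "host.name" } } }]
      }
    }
  }
}
```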
Possible solutions
- Check and warn about high cardinality of the grouping field when creating the job.
- Offer a setting on job creation to set an acceptable cardinality limit (as in "group by host.name up to 10000 groups").
- Check the cardinality on execution and fail early and loudly when the configured limit is exceeded.
- Special-case costly grouped "alert when less than" conditions and use more efficient queries for all other cases. (By moving the filter out of the composite aggregation into the global bool filter.)
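The execution-time cardinality check could be sketched as follows (the function names and error message are hypothetical; in practice the count would come from a `cardinality` aggregation, which is approximate):

```typescript
// Hypothetical guard run before the expensive grouped query: ask
// Elasticsearch for the approximate number of distinct values of the
// group-by field and fail loudly if it exceeds the configured limit.

type CardinalityFn = (field: string) => Promise<number>;

async function assertGroupCardinality(
  getCardinality: CardinalityFn, // would wrap a `cardinality` aggregation
  field: string,
  limit: number
): Promise<void> {
  const cardinality = await getCardinality(field);
  if (cardinality > limit) {
    throw new Error(
      `Cardinality of "${field}" (~${cardinality}) exceeds the configured ` +
        `limit of ${limit}; skipping alert evaluation.`
    );
  }
}
```

Because the cardinality aggregation is approximate, the limit check should be treated as a safety valve rather than an exact count, and the error surfaced to the user so the failure is loud rather than silent.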