Aggregate metric double use average#142135
Conversation
|
Hi @gmarouli, I've created a changelog YAML for you. |
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
|
Kind reminder to @felixbarny & @romseygeek can you take a look at this PR please? |
carsonip
left a comment
There was a problem hiding this comment.
did not look at the implementation (can someone else review the code please?), but approving based on the previous approval that it should not break apm based on apm's usage.
|
|
||
| private final EnumMap<Metric, NumberFieldMapper.NumberFieldType> metricFields; | ||
| // If there is only one metric configured for this field, we delegate to the field. | ||
| private final Metric singleMetric; |
There was a problem hiding this comment.
I wonder if this would be simpler to handle at the top level, ie if there's only a single metric configured then we just return that subfield as the top-level MappedFieldType? Or do we definitely always need a full AggregateMetricDoubleFieldType for the Mapper to work properly?
There was a problem hiding this comment.
That would be cool, it would reduce the conditionals that are scattered around. However, I am concerned about how we would preserve the semantics of the individual metric.
For example, when downsampling, if we see a NumberFieldType we will treat it differently than if we see an AggregateMetricDoubleFieldType even if it has a single metric. I think the same might be happening with aggregations. We still have that name of the field that could help my-aggregate.min but this is ambiguous since it could be either a sub-object or an aggregate metric double.
Am I missing anything?
There was a problem hiding this comment.
Am I missing anything?
More likely that I am, as I'm not that familiar with how downsampling interacts here! Let's keep things as they are for now and maybe investigate simplifying it in a follow-up.
There was a problem hiding this comment.
I will leave here some notes for reference:
Downsampling
The field value fetchers identify that a field, for example, my-metric is an aggregate_metric_double via its type:
if (fieldType instanceof AggregateMetricDoubleFieldMapper.AggregateMetricDoubleFieldType aggMetricFieldType) {
fetchers.addAll(AggregateMetricDoubleFieldValueFetcher.create(context, aggMetricFieldType, samplingMethod));
} else {
Then we create a dedicated downsampler for each metric and downsamples it accordingly. For example, for my-metric.min it will keep the min of the values, for my-metric.sum it will sum all the values and so on. This is how it works even if my-metric has only one metric.
Aggregations
Furthermore, an aggregate metric double only supports the min, max, avg, sum and count aggregations by using the equivalent metric. An aggregate metric double with only min and max, won't be able to calculate the sum for example. I think this behaviour will also change if we switch to using a NumberFieldType.
Conclusion
I am not saying we shouldn't do it, because maybe the above examples are not relevant in real word cases of a single metric. But as you said it's something that we should investigate on its own.
In this PR we replace the "derived metric" introduced in elastic#141877 with a script average or if the `aggregate_metric_double` has only one metric, with that metric. We support the following scenarios: - Single metric: we introduce the concept of a single metric, this is set only if the aggregate_metric_double has only one value. We saw this being used in tests when we aggregate using the min or max, but we did provide full support just in case it's used. - Average is available: when aggregate_metric_double has the sum and value_count then we calculate the average. This is used in scripts, sorting and filtering queries. Finally, we return no results in filtering, if none of the above apply. Part of elastic#128356
In this PR we replace the "derived metric" introduced in #141877 with a script average or if the
aggregate_metric_doublehas only one metric, with that metric.We support the following scenarios:
aggregate_metric_doublehas only one value. We saw this being used in tests when we aggregate using the min or max, but we did provide full support just in case it's used.aggregate_metric_doublehas thesumandvalue_countthen we calculate the average. This is used in scripts, sorting and filtering queries.Part of #128356