Skip to content

[ES|QL] Use avg metric for AMD default metric#141331

Merged
limotova merged 20 commits intoelastic:mainfrom
limotova:use-avg-metric-amd
Feb 28, 2026
Merged

[ES|QL] Use avg metric for AMD default metric#141331
limotova merged 20 commits intoelastic:mainfrom
limotova:use-avg-metric-amd

Conversation

@limotova
Copy link
Copy Markdown
Contributor

@limotova limotova commented Jan 27, 2026

This commit changes the behavior of the default metric for
aggregate_metric_double in ES|QL to always use the average regardless
of what the default metric is mapped to.

Note that this avg is not the same as the average you would get if you
ran STATS avg(agg_metric), because the aforementioned is the average
of all docs, but the avg metric is an average on a per-doc basis.

Part of #128356

This commit changes the behavior of the default metric for
aggregate_metric_double in ES|QL to always use the average regardless
of what the default metric is mapped to.

Note that this avg is not the same as the average you would get if you
ran `STATS avg(agg_metric)`, because the aformentioned is the average
of all docs, but the avg metric is an average on a per-doc basis.
@limotova limotova requested a review from gmarouli January 28, 2026 08:52
Copy link
Copy Markdown
Contributor

@gmarouli gmarouli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice :)

@limotova limotova marked this pull request as ready for review February 4, 2026 21:02
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Feb 4, 2026
@limotova limotova added the :StorageEngine/ES|QL Timeseries / metrics / PromQL / logsdb capabilities in ES|QL label Feb 4, 2026
@elasticsearchmachine elasticsearchmachine added Team:StorageEngine and removed needs:triage Requires assignment of a team area label labels Feb 4, 2026
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Hi @limotova, I've created a changelog YAML for you.

Copy link
Copy Markdown
Contributor

@gmarouli gmarouli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very promising, I added some questions that will help me better understand the code before I give the LGTM :)

return values;
}
var sortedValues = leafReader.getSortedNumericDocValues(fieldName);
return DocValues.unwrapSingleton(sortedValues);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not very familiar with the different doc values data structure, so it's not clear to me what we do here.

It looks like:

  1. We try to retrieve leafReader.getNumericDocValues(fieldName) first, why is that? Which cases do we have where this is null?
  2. Then if it's null, we move to leafReader.getSortedNumericDocValues(fieldName), I was under the impression that we used only this before.
  3. Then we DocValues.unwrapSingleton(sortedValues), this method returns null if the doc values are not singleton, are we sure that this will always be a singleton?

Can you elaborate a little bit, if it's just me then I will learn something new, if we think it's very specific for this case we can add some comments.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really have any answers here - I don't know the specifics of this myself, iirc i think it usually goes through the null route (SortedNumericDocValues), but i believe this was how it was written in the very very first iteration and I didn't change it
I think @martijnvg might have a better understanding..?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the original code, we used to use DocValues.getSortedNumeric(context.reader(), subfieldName(getFieldName(), metric)); (see reference)

I am wondering if we can use this directly in the reader, see comment.

Comment on lines +161 to +162
NumericDocValues sumValues = getNumericDocValues(sumFieldType, context.reader());
NumericDocValues valueCountValues = getNumericDocValues(countFieldType, context.reader());
Copy link
Copy Markdown
Contributor

@gmarouli gmarouli Feb 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More context in comment, I was thinking something like that:

Suggested change
NumericDocValues sumValues = getNumericDocValues(sumFieldType, context.reader());
NumericDocValues valueCountValues = getNumericDocValues(countFieldType, context.reader());
NumericDocValues sumValues = DocValues.getSortedNumeric(context.reader(), sumFieldType.name());
NumericDocValues valueCountValues = DocValues.getSortedNumeric(context.reader(), countFieldType.name());

@gmarouli
Copy link
Copy Markdown
Contributor

@limotova #141877 & #142135 are merged now, can you bring this up to date and see if we can finish this up too?

Copy link
Copy Markdown
Contributor

@gmarouli gmarouli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (assuming ci did not detect anything relevant), I only had a nit to suggest.

@limotova limotova merged commit e9e4844 into elastic:main Feb 28, 2026
35 checks passed
@limotova limotova deleted the use-avg-metric-amd branch February 28, 2026 02:51
tballison pushed a commit to tballison/elasticsearch that referenced this pull request Mar 3, 2026
This commit changes the behavior of the default metric for
aggregate_metric_double in ES|QL to always use the average regardless
of what the default metric is mapped to.

Note that this avg is not the same as the average you would get if you
ran `STATS avg(agg_metric)`, because the aforementioned is the average
of all docs, but the avg metric is an average on a per-doc basis.

Part of elastic#128356
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>enhancement :StorageEngine/ES|QL Timeseries / metrics / PromQL / logsdb capabilities in ES|QL Team:StorageEngine v9.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants