Skip to content

Downsample exponential histograms#138561

Merged
gmarouli merged 18 commits intoelastic:mainfrom
gmarouli:downsample-histograms
Dec 3, 2025
Merged

Downsample exponential histograms#138561
gmarouli merged 18 commits intoelastic:mainfrom
gmarouli:downsample-histograms

Conversation

@gmarouli
Copy link
Copy Markdown
Contributor

This PR implements downsampling for exponential histograms both as labels and as metrics. Marked as a non-issue because exponential histograms are behind a feature flag.

Closes #136259

@gmarouli gmarouli added >non-issue :StorageEngine/Downsampling Downsampling (replacement for rollups) - Turn fine-grained time-based data into coarser-grained data labels Nov 25, 2025
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Copy link
Copy Markdown
Contributor

@JonasKunz JonasKunz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

* Creates a new instance with the OpenTelemetry SDK default bucket limit of {@link ExponentialHistogramMerger#MAX_HISTOGRAM_BUCKETS}
* @param circuitBreaker the circuit breaker to use to limit memory allocations
*/
public static ExponentialHistogramMerger create(ExponentialHistogramCircuitBreaker circuitBreaker) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea to encode this default here!
I'll take care in a follow up to also use this in the ES|QL code.

case GAUGE -> NumericMetricFieldProducer.createFieldProducerForGauge(name(), samplingMethod);
case COUNTER -> LastValueFieldProducer.createForMetric(name());
case HISTOGRAM -> throw new IllegalArgumentException("Unsupported metric type [histogram] for downsampling, coming soon");
case HISTOGRAM -> ExponentialHistogramMetricFieldProducer.createMetricProducerForExponentialHistogram(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will we be differentiating between exp histogram, tdigest and legacy histogram?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will we be differentiating between exp histogram, tdigest and legacy histogram?

Before I answer this, I would like to double check if I understand correctly. We have two field types for histograms:

  • exponential_histogram which can only be an exponential histogram, and
  • histogram which supports tdigest and hdr,

Is that correct? If yes, we can continue:

We can distinguish an exponential histogram from a legacy histogram from the field type. So, a follow up PR would take the field type into account exponential_histogram vs histogram.

In the first phase of legacy histograms, we will only support tdigest as a metric. So, we have the following situation:

  • time_series_metric: histogram & type: exponential_histogram means merging to an exponential histogram.
  • time_series_metric: histogram & type: histogram means merging to a tdigest legacy histogram.
  • if the histogram is not a metric then it's downsampled with last value which does not need to know the algorithm.

If in the future we decide to support the other merging algorithm of legacy histograms we will need a way to distinguish between the two legacy histograms. It could be an index setting or a mapping setting, we would need to think what is a better way.

Does this answer your question?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, so we'll combine this with type information. Can we add the check in the current version, to restrict this only when time_series_metric: histogram & type: exponential_histogram and use last value otherwise?

Fwiw your understanding is correct - a slight correction is that histogram is not tdigest or hdr but whatever the user provided. We can just process it with these algorithms.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, so we'll combine this with type information. Can we add the check in the current version, to restrict this only when time_series_metric: histogram & type: exponential_histogram and use last value otherwise?

A time_series_metric: histogram only supports exponential histograms. If a user tries to set up the following combination time_series_metric: histogram and type: histogram in their mapping they will get an error. I can add time_series_metric: histogram & type: exponential_histogram to showcase how it's going to be, but the second condition will always be true.

Fwiw your understanding is correct - a slight correction is that histogram is not tdigest or hdr but whatever the user provided. We can just process it with these algorithms.

Clear! Thanks.

@gmarouli gmarouli added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Dec 3, 2025
@gmarouli
Copy link
Copy Markdown
Contributor Author

gmarouli commented Dec 3, 2025

CI passed but it looks like github did not pick it up. I will update with main just to be sure.

@gmarouli gmarouli merged commit bec10d7 into elastic:main Dec 3, 2025
33 of 34 checks passed
@gmarouli gmarouli deleted the downsample-histograms branch December 3, 2025 12:43
elasticsearchmachine pushed a commit that referenced this pull request Dec 5, 2025
#139033)

In #138561 we added
exponential histograms and currently we are working on adding legacy
histograms too. Every time we add a new doc value reader we need to add
new types of field producer and doc value reader pairs. This PR makes
that easier by making `AbstractDownsampleFieldProducer` generic.

This is extracted from the work to downsample legacy histograms to
facilitate reviewing.
mamazzol pushed a commit to mamazzol/elasticsearch that referenced this pull request Dec 5, 2025
elastic#139033)

In elastic#138561 we added
exponential histograms and currently we are working on adding legacy
histograms too. Every time we add a new doc value reader we need to add
new types of field producer and doc value reader pairs. This PR makes
that easier by making `AbstractDownsampleFieldProducer` generic.

This is extracted from the work to downsample legacy histograms to
facilitate reviewing.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >non-issue :StorageEngine/Downsampling Downsampling (replacement for rollups) - Turn fine-grained time-based data into coarser-grained data Team:StorageEngine v9.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Downsampling++] Downsample exponential histograms as a label and metric.

4 participants