Skip to content

promql: Set a specific CounterResetHint on histogramRate() returned histogram#17608

Merged
beorn7 merged 2 commits intoprometheus:mainfrom
tcp13equals2:return_explicit_counter_reset_hint_histogram_rate
Nov 26, 2025
Merged

promql: Set a specific CounterResetHint on histogramRate() returned histogram#17608
beorn7 merged 2 commits intoprometheus:mainfrom
tcp13equals2:return_explicit_counter_reset_hint_histogram_rate

Conversation

@tcp13equals2
Copy link
Contributor

@tcp13equals2 tcp13equals2 commented Nov 25, 2025

Which issue(s) does the PR fix:

This PR addresses an issue that occurs when Mimir shards a query and forwards the shards to the Prometheus PromQL engine.

We discovered that a query shaped like:

histogram_count(sum(rate(<histogram_metric>)))

can incorrectly emit the warning:

PromQL warning: conflicting counter resets during histogram aggregation

when evaluated via the Mimir query-frontend with query sharding enabled.

Background

In a normal, non-sharded Prometheus evaluation:

  • The presence of histogram_count() causes the PromQL engine to set SkipHistogramBuckets=true.
  • This enables the HistogramStatsIterator, which loads raw native-histogram samples.
  • These samples are passed through rate(), which produces histogram-shaped deltas.
  • The results then flow into sum().
  • During the sum() aggregation, the engine inspects the per-histogram CounterResetHint values. If the histograms being aggregated disagree about reset semantics, Prometheus emits the conflicting counter reset warning.

This behaviour is correct for unsharded evaluation.

The Problem Under Sharding

When the Mimir query-frontend shards the query, the transformed query becomes something like:

histogram_count(sum by (foo) (__embedded_queries__{__queries__="…"}))

Each embedded query looks like:

sum by (foo) (rate(metric{__query_shard__="1_of_8"}[1m1s]))

Each shard is evaluated separately by Prometheus.

Now, at the outer level:

  • The query still contains histogram_count(...),
  • So Prometheus again enables SkipHistogramBuckets=true,
  • And again uses the HistogramStatsIterator,
  • But now it is inspecting the shard-aggregated histograms, not the raw samples.
  • This causes the PromQL engine to re-interpret the shard-level aggregated histograms as if they were native histograms with counter semantics.
  • Since the shard boundaries distort counter reset detection, Prometheus may infer that the histograms have conflicting resets—even though the underlying query is valid.
  • Thus, the warning is incorrectly emitted.

Ways to Avoid the Incorrect Warning

There are two general approaches (aside from completely avoiding sharding of these queries):

  1. Disable SkipHistogramBuckets in the outer query

By preventing Prometheus from using the HistogramStatsIterator on the already-aggregated shard results, the incorrect reset inference does not occur.

  1. Assign an explicit CounterResetHint in the histogram produced by rate()

This matches the implementation of Mimir’s MQE engine.

In particular, if the histogram returned by rate() is marked:

CounterResetHint = histogram.GaugeType

then downstream operators treat it as a “gauge-type histogram”—one that does not participate in counter-reset logic.

With this hint, the sum aggregation no longer triggers the conflicting reset warning.

This seems semantically reasonable because rate() produces a new histogram representing per-second deltas, not a cumulative counter.

Does this PR introduce a user-facing change?

[BUGFIX] promql: Ensure that `rate`/`increase`/`delta` of histograms results in a gauge histogram.

Signed-off-by: Andrew Hall <andrew.hall@grafana.com>
Signed-off-by: Andrew Hall <andrew.hall@grafana.com>
@beorn7
Copy link
Member

beorn7 commented Nov 25, 2025

Thanks for the comprehensive explanation.

However, I believe even in vanilla Prometheus, the result of rate/delta/increase should always be marked as a gauge histogram. I'm pretty sure this was the case in the past, but I guess when we refined the counter-reset handling in addition and subtraction, we created the current behavior as a regression.

I think the code change is fine, reestablishing the desired behavior. @krajorama maybe you want to confirm.

I'm currently thinking if we can do the test within the testing framework.

@beorn7
Copy link
Member

beorn7 commented Nov 25, 2025

The testing framework allows to set a counter rest hint in the expectations, but it doesn't verify it. :(

Then the explicit test in this PR is fine. I will file an issue for the testing framework.

Copy link
Member

@beorn7 beorn7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving, but waiting until tomorrow in case @krajorama wants to chime in.

Note that I updated the PR description because I think this is a bugfix for Prometheus users.

@beorn7
Copy link
Member

beorn7 commented Nov 25, 2025

#17615 for the promqltest issue.

@tcp13equals2
Copy link
Contributor Author

Thank you @beorn7 !

@beorn7 beorn7 merged commit 7bb95d5 into prometheus:main Nov 26, 2025
30 checks passed
tcp13equals2 added a commit to grafana/mimir that referenced this pull request Dec 5, 2025
#### What this PR does

Do not merge until [upstream
vendoring](prometheus/prometheus#17608) is
vendored in.

This PR aligns the histogram counter reset conflict detection logic to
match the prometheus implementation for avg and sum aggregations. It
also fixes an issue where the histogram counter reset hint was not set
to Guage on SUB operations.

This PR also adds a unit test to `querysharding_test.go`
`TestConflictingCounterResets` which validates a fix for an issue with
query-frontends sharding queries to prometheus promql engine.

#### Which issue(s) this PR fixes or relates to

Fixes #<issue number>

#### Checklist

- [ x] Tests updated.
- [ ] Documentation added.
- [ x] `CHANGELOG.md` updated - the order of entries should be
`[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`. If changelog entry
is not needed, please add the `changelog-not-needed` label to the PR.
- [ ]
[`about-versioning.md`](https://github.com/grafana/mimir/blob/main/docs/sources/mimir/configure/about-versioning.md)
updated with experimental features.

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> <sup>[Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) is
generating a summary for commit
6114d0f. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
renovate bot added a commit to sdwilsh/ansible-playbooks that referenced this pull request Jan 10, 2026
##### [\`v3.9.0\`](https://github.com/prometheus/prometheus/releases/tag/v3.9.0)

#### Note for users of Native Histograms

In version 3.9, Native Histograms is no longer experimental, and the feature flag `native-histogram` has no effect.  You must now turn on
the config setting `scrape_native_histograms` to collect Native Histogram samples from exporters.

#### Changelog

- \[CHANGE] Native Histograms are no longer experimental! Make the `native-histogram` feature flag a no-op. Use `scrape_native_histograms` config option instead. [#17528](prometheus/prometheus#17528)
- \[CHANGE] API: Add maximum limit of 10,000 sets of statistics to TSDB status endpoint. [#17647](prometheus/prometheus#17647)
- \[FEATURE] API: Add /api/v1/features for clients to understand which features are supported. [#17427](prometheus/prometheus#17427)
- \[FEATURE] Promtool: Add `start_timestamp` field for unit tests. [#17636](prometheus/prometheus#17636)
- \[FEATURE] Promtool: Add `--format seriesjson` option to `tsdb dump` to output just series labels in JSON format. [#13409](prometheus/prometheus#13409)
- \[FEATURE] Add `--storage.tsdb.delay-compact-file.path` flag for better interoperability with Thanos. [#17435](prometheus/prometheus#17435)
- \[FEATURE] UI: Add an option on the query drop-down menu to duplicate that query panel. [#17714](prometheus/prometheus#17714)
- \[ENHANCEMENT]: TSDB: add flag `--storage.tsdb.block-reload-interval` to configure TSDB Block Reload Interval. [#16728](prometheus/prometheus#16728)
- \[ENHANCEMENT] UI: Add graph option to start the chart's Y axis at zero. [#17565](prometheus/prometheus#17565)
- \[ENHANCEMENT] Scraping: Classic protobuf format no longer requires the unit in the metric name. [#16834](prometheus/prometheus#16834)
- \[ENHANCEMENT] PromQL, Rules, SD, Scraping: Add native histograms to complement existing summaries. [#17374](prometheus/prometheus#17374)
- \[ENHANCEMENT] Notifications: Add a histogram `prometheus_notifications_latency_histogram_seconds` to complement the existing summary. [#16637](prometheus/prometheus#16637)
- \[ENHANCEMENT] Remote-write: Add custom scope support for AzureAD authentication. [#17483](prometheus/prometheus#17483)
- \[ENHANCEMENT] SD: add a `config` label with job name for most `prometheus_sd_refresh` metrics. [#17138](prometheus/prometheus#17138)
- \[ENHANCEMENT] TSDB: New histogram `prometheus_tsdb_sample_ooo_delta`, the distribution of out-of-order samples in seconds. Collected for all samples, accepted or not. [#17477](prometheus/prometheus#17477)
- \[ENHANCEMENT] Remote-read: Validate histograms received via remote-read. [#17561](prometheus/prometheus#17561)
- \[PERF] TSDB: Small optimizations to postings index. [#17439](prometheus/prometheus#17439)
- \[PERF] Scraping: Speed up relabelling of series. [#17530](prometheus/prometheus#17530)
- \[PERF] PromQL: Small optimisations in binary operators. [#17524](prometheus/prometheus#17524), [#17519](prometheus/prometheus#17519).
- \[BUGFIX] UI: PromQL autocomplete now shows the correct type and HELP text for OpenMetrics counters whose samples end in `_total`. [#17682](prometheus/prometheus#17682)
- \[BUGFIX] UI: Fixed codemirror-promql incorrectly showing label completion suggestions after the closing curly brace of a vector selector. [#17602](prometheus/prometheus#17602)
- \[BUGFIX] UI: Query editor no longer suggests a duration unit if one is already present after a number. [#17605](prometheus/prometheus#17605)
- \[BUGFIX] PromQL: Fix some "vector cannot contain metrics with the same labelset" errors when experimental delayed name removal is enabled. [#17678](prometheus/prometheus#17678)
- \[BUGFIX] PromQL: Fix possible corruption of PromQL text if the query had an empty `ignoring()` and non-empty grouping. [#17643](prometheus/prometheus#17643)
- \[BUGFIX] PromQL: Fix resets/changes to return empty results for anchored selectors when all samples are outside the range. [#17479](prometheus/prometheus#17479)
- \[BUGFIX] PromQL: Check more consistently for many-to-one matching in filter binary operators. [#17668](prometheus/prometheus#17668)
- \[BUGFIX] PromQL: Fix collision in unary negation with non-overlapping series. [#17708](prometheus/prometheus#17708)
- \[BUGFIX] PromQL: Fix collision in label\_join and label\_replace with non-overlapping series. [#17703](prometheus/prometheus#17703)
- \[BUGFIX] PromQL: Fix bug with inconsistent results for queries with OR expression when experimental delayed name removal is enabled. [#17161](prometheus/prometheus#17161)
- \[BUGFIX] PromQL: Ensure that `rate`/`increase`/`delta` of histograms results in a gauge histogram. [#17608](prometheus/prometheus#17608)
- \[BUGFIX] PromQL: Do not panic while iterating over invalid histograms. [#17559](prometheus/prometheus#17559)
- \[BUGFIX] TSDB: Reject chunk files whose encoded chunk length overflows int. [#17533](prometheus/prometheus#17533)
- \[BUGFIX] TSDB: Do not panic during resolution reduction of invalid histograms. [#17561](prometheus/prometheus#17561)
- \[BUGFIX] Remote-write Receive: Avoid duplicate labels when experimental type-and-unit-label feature is enabled. [#17546](prometheus/prometheus#17546)
- \[BUGFIX] OTLP Receiver: Only write metadata to disk when experimental metadata-wal-records feature is enabled. [#17472](prometheus/prometheus#17472)
renovate bot added a commit to sdwilsh/ansible-playbooks that referenced this pull request Jan 10, 2026
##### [\`v3.9.1\`](https://github.com/prometheus/prometheus/releases/tag/v3.9.1)

- \[BUGFIX] Agent: fix crash shortly after startup from invalid type of object. [#17802](prometheus/prometheus#17802)
- \[BUGFIX] Scraping: fix relabel keep/drop not working. [#17807](prometheus/prometheus#17807)

---
##### [\`v3.9.0\`](https://github.com/prometheus/prometheus/releases/tag/v3.9.0)

#### Note for users of Native Histograms

In version 3.9, Native Histograms is no longer experimental, and the feature flag `native-histogram` has no effect.  You must now turn on
the config setting `scrape_native_histograms` to collect Native Histogram samples from exporters.

#### Changelog

- \[CHANGE] Native Histograms are no longer experimental! Make the `native-histogram` feature flag a no-op. Use `scrape_native_histograms` config option instead. [#17528](prometheus/prometheus#17528)
- \[CHANGE] API: Add maximum limit of 10,000 sets of statistics to TSDB status endpoint. [#17647](prometheus/prometheus#17647)
- \[FEATURE] API: Add /api/v1/features for clients to understand which features are supported. [#17427](prometheus/prometheus#17427)
- \[FEATURE] Promtool: Add `start_timestamp` field for unit tests. [#17636](prometheus/prometheus#17636)
- \[FEATURE] Promtool: Add `--format seriesjson` option to `tsdb dump` to output just series labels in JSON format. [#13409](prometheus/prometheus#13409)
- \[FEATURE] Add `--storage.tsdb.delay-compact-file.path` flag for better interoperability with Thanos. [#17435](prometheus/prometheus#17435)
- \[FEATURE] UI: Add an option on the query drop-down menu to duplicate that query panel. [#17714](prometheus/prometheus#17714)
- \[ENHANCEMENT]: TSDB: add flag `--storage.tsdb.block-reload-interval` to configure TSDB Block Reload Interval. [#16728](prometheus/prometheus#16728)
- \[ENHANCEMENT] UI: Add graph option to start the chart's Y axis at zero. [#17565](prometheus/prometheus#17565)
- \[ENHANCEMENT] Scraping: Classic protobuf format no longer requires the unit in the metric name. [#16834](prometheus/prometheus#16834)
- \[ENHANCEMENT] PromQL, Rules, SD, Scraping: Add native histograms to complement existing summaries. [#17374](prometheus/prometheus#17374)
- \[ENHANCEMENT] Notifications: Add a histogram `prometheus_notifications_latency_histogram_seconds` to complement the existing summary. [#16637](prometheus/prometheus#16637)
- \[ENHANCEMENT] Remote-write: Add custom scope support for AzureAD authentication. [#17483](prometheus/prometheus#17483)
- \[ENHANCEMENT] SD: add a `config` label with job name for most `prometheus_sd_refresh` metrics. [#17138](prometheus/prometheus#17138)
- \[ENHANCEMENT] TSDB: New histogram `prometheus_tsdb_sample_ooo_delta`, the distribution of out-of-order samples in seconds. Collected for all samples, accepted or not. [#17477](prometheus/prometheus#17477)
- \[ENHANCEMENT] Remote-read: Validate histograms received via remote-read. [#17561](prometheus/prometheus#17561)
- \[PERF] TSDB: Small optimizations to postings index. [#17439](prometheus/prometheus#17439)
- \[PERF] Scraping: Speed up relabelling of series. [#17530](prometheus/prometheus#17530)
- \[PERF] PromQL: Small optimisations in binary operators. [#17524](prometheus/prometheus#17524), [#17519](prometheus/prometheus#17519).
- \[BUGFIX] UI: PromQL autocomplete now shows the correct type and HELP text for OpenMetrics counters whose samples end in `_total`. [#17682](prometheus/prometheus#17682)
- \[BUGFIX] UI: Fixed codemirror-promql incorrectly showing label completion suggestions after the closing curly brace of a vector selector. [#17602](prometheus/prometheus#17602)
- \[BUGFIX] UI: Query editor no longer suggests a duration unit if one is already present after a number. [#17605](prometheus/prometheus#17605)
- \[BUGFIX] PromQL: Fix some "vector cannot contain metrics with the same labelset" errors when experimental delayed name removal is enabled. [#17678](prometheus/prometheus#17678)
- \[BUGFIX] PromQL: Fix possible corruption of PromQL text if the query had an empty `ignoring()` and non-empty grouping. [#17643](prometheus/prometheus#17643)
- \[BUGFIX] PromQL: Fix resets/changes to return empty results for anchored selectors when all samples are outside the range. [#17479](prometheus/prometheus#17479)
- \[BUGFIX] PromQL: Check more consistently for many-to-one matching in filter binary operators. [#17668](prometheus/prometheus#17668)
- \[BUGFIX] PromQL: Fix collision in unary negation with non-overlapping series. [#17708](prometheus/prometheus#17708)
- \[BUGFIX] PromQL: Fix collision in label\_join and label\_replace with non-overlapping series. [#17703](prometheus/prometheus#17703)
- \[BUGFIX] PromQL: Fix bug with inconsistent results for queries with OR expression when experimental delayed name removal is enabled. [#17161](prometheus/prometheus#17161)
- \[BUGFIX] PromQL: Ensure that `rate`/`increase`/`delta` of histograms results in a gauge histogram. [#17608](prometheus/prometheus#17608)
- \[BUGFIX] PromQL: Do not panic while iterating over invalid histograms. [#17559](prometheus/prometheus#17559)
- \[BUGFIX] TSDB: Reject chunk files whose encoded chunk length overflows int. [#17533](prometheus/prometheus#17533)
- \[BUGFIX] TSDB: Do not panic during resolution reduction of invalid histograms. [#17561](prometheus/prometheus#17561)
- \[BUGFIX] Remote-write Receive: Avoid duplicate labels when experimental type-and-unit-label feature is enabled. [#17546](prometheus/prometheus#17546)
- \[BUGFIX] OTLP Receiver: Only write metadata to disk when experimental metadata-wal-records feature is enabled. [#17472](prometheus/prometheus#17472)
renovate bot added a commit to sdwilsh/ansible-playbooks that referenced this pull request Jan 10, 2026
##### [\`v3.9.1\`](https://github.com/prometheus/prometheus/releases/tag/v3.9.1)

- \[BUGFIX] Agent: fix crash shortly after startup from invalid type of object. [#17802](prometheus/prometheus#17802)
- \[BUGFIX] Scraping: fix relabel keep/drop not working. [#17807](prometheus/prometheus#17807)

---
##### [\`v3.9.0\`](https://github.com/prometheus/prometheus/releases/tag/v3.9.0)

#### Note for users of Native Histograms

In version 3.9, Native Histograms is no longer experimental, and the feature flag `native-histogram` has no effect.  You must now turn on
the config setting `scrape_native_histograms` to collect Native Histogram samples from exporters.

#### Changelog

- \[CHANGE] Native Histograms are no longer experimental! Make the `native-histogram` feature flag a no-op. Use `scrape_native_histograms` config option instead. [#17528](prometheus/prometheus#17528)
- \[CHANGE] API: Add maximum limit of 10,000 sets of statistics to TSDB status endpoint. [#17647](prometheus/prometheus#17647)
- \[FEATURE] API: Add /api/v1/features for clients to understand which features are supported. [#17427](prometheus/prometheus#17427)
- \[FEATURE] Promtool: Add `start_timestamp` field for unit tests. [#17636](prometheus/prometheus#17636)
- \[FEATURE] Promtool: Add `--format seriesjson` option to `tsdb dump` to output just series labels in JSON format. [#13409](prometheus/prometheus#13409)
- \[FEATURE] Add `--storage.tsdb.delay-compact-file.path` flag for better interoperability with Thanos. [#17435](prometheus/prometheus#17435)
- \[FEATURE] UI: Add an option on the query drop-down menu to duplicate that query panel. [#17714](prometheus/prometheus#17714)
- \[ENHANCEMENT]: TSDB: add flag `--storage.tsdb.block-reload-interval` to configure TSDB Block Reload Interval. [#16728](prometheus/prometheus#16728)
- \[ENHANCEMENT] UI: Add graph option to start the chart's Y axis at zero. [#17565](prometheus/prometheus#17565)
- \[ENHANCEMENT] Scraping: Classic protobuf format no longer requires the unit in the metric name. [#16834](prometheus/prometheus#16834)
- \[ENHANCEMENT] PromQL, Rules, SD, Scraping: Add native histograms to complement existing summaries. [#17374](prometheus/prometheus#17374)
- \[ENHANCEMENT] Notifications: Add a histogram `prometheus_notifications_latency_histogram_seconds` to complement the existing summary. [#16637](prometheus/prometheus#16637)
- \[ENHANCEMENT] Remote-write: Add custom scope support for AzureAD authentication. [#17483](prometheus/prometheus#17483)
- \[ENHANCEMENT] SD: add a `config` label with job name for most `prometheus_sd_refresh` metrics. [#17138](prometheus/prometheus#17138)
- \[ENHANCEMENT] TSDB: New histogram `prometheus_tsdb_sample_ooo_delta`, the distribution of out-of-order samples in seconds. Collected for all samples, accepted or not. [#17477](prometheus/prometheus#17477)
- \[ENHANCEMENT] Remote-read: Validate histograms received via remote-read. [#17561](prometheus/prometheus#17561)
- \[PERF] TSDB: Small optimizations to postings index. [#17439](prometheus/prometheus#17439)
- \[PERF] Scraping: Speed up relabelling of series. [#17530](prometheus/prometheus#17530)
- \[PERF] PromQL: Small optimisations in binary operators. [#17524](prometheus/prometheus#17524), [#17519](prometheus/prometheus#17519).
- \[BUGFIX] UI: PromQL autocomplete now shows the correct type and HELP text for OpenMetrics counters whose samples end in `_total`. [#17682](prometheus/prometheus#17682)
- \[BUGFIX] UI: Fixed codemirror-promql incorrectly showing label completion suggestions after the closing curly brace of a vector selector. [#17602](prometheus/prometheus#17602)
- \[BUGFIX] UI: Query editor no longer suggests a duration unit if one is already present after a number. [#17605](prometheus/prometheus#17605)
- \[BUGFIX] PromQL: Fix some "vector cannot contain metrics with the same labelset" errors when experimental delayed name removal is enabled. [#17678](prometheus/prometheus#17678)
- \[BUGFIX] PromQL: Fix possible corruption of PromQL text if the query had an empty `ignoring()` and non-empty grouping. [#17643](prometheus/prometheus#17643)
- \[BUGFIX] PromQL: Fix resets/changes to return empty results for anchored selectors when all samples are outside the range. [#17479](prometheus/prometheus#17479)
- \[BUGFIX] PromQL: Check more consistently for many-to-one matching in filter binary operators. [#17668](prometheus/prometheus#17668)
- \[BUGFIX] PromQL: Fix collision in unary negation with non-overlapping series. [#17708](prometheus/prometheus#17708)
- \[BUGFIX] PromQL: Fix collision in label\_join and label\_replace with non-overlapping series. [#17703](prometheus/prometheus#17703)
- \[BUGFIX] PromQL: Fix bug with inconsistent results for queries with OR expression when experimental delayed name removal is enabled. [#17161](prometheus/prometheus#17161)
- \[BUGFIX] PromQL: Ensure that `rate`/`increase`/`delta` of histograms results in a gauge histogram. [#17608](prometheus/prometheus#17608)
- \[BUGFIX] PromQL: Do not panic while iterating over invalid histograms. [#17559](prometheus/prometheus#17559)
- \[BUGFIX] TSDB: Reject chunk files whose encoded chunk length overflows int. [#17533](prometheus/prometheus#17533)
- \[BUGFIX] TSDB: Do not panic during resolution reduction of invalid histograms. [#17561](prometheus/prometheus#17561)
- \[BUGFIX] Remote-write Receive: Avoid duplicate labels when experimental type-and-unit-label feature is enabled. [#17546](prometheus/prometheus#17546)
- \[BUGFIX] OTLP Receiver: Only write metadata to disk when experimental metadata-wal-records feature is enabled. [#17472](prometheus/prometheus#17472)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants