PromQL: Fix bug with inconsistent results for queries with OR expression and EnableDelayedNameRemoval by zenador · Pull Request #17161 · prometheus/prometheus

zenador · 2025-09-08T17:10:34Z

There's a bug with inconsistent results with OR expressions when the experimental feature EnableDelayedNameRemoval is on.

With the added flaky tests in this PR, if you run the test once, it generally passes, but if you run it repeatedly, it sometimes fails.

Try it with `go test -v ./promql -run "TestEvaluations/testdata/name_label_dropping_2/line_" --count=100`

The order of series processed is not deterministic, and depending on the order, dropName may or may not be applied, so with the same query and data, sometimes the metric name is dropped and sometimes it's not.

This is because we use a map to produce the results in the output matrix for the binary operation, and native maps in Go do not guarantee any order when ranging over them. This matrix is then passed to the sum aggregation, where the order of the input determines the dropName applied for the grouping.
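To illustrate the mechanism described above, here is a minimal, self-contained sketch. It is not the engine code: the `sample` type and the `firstWins` and `collect` helpers are hypothetical names made up for this example. It only shows how a "first series decides" rule becomes non-deterministic once the input comes from ranging over a Go map.

```go
package main

import "fmt"

// sample is a hypothetical stand-in for one series in the output matrix.
type sample struct {
	name     string
	dropName bool
}

// firstWins mimics an order-dependent decision: the first series seen for a
// group decides whether the metric name of the group is dropped.
func firstWins(in []sample) bool {
	if len(in) == 0 {
		return false
	}
	return in[0].dropName
}

// collect ranges over a map, so the order of the returned slice is unspecified.
func collect(m map[uint64]sample) []sample {
	out := make([]sample, 0, len(m))
	for _, s := range m {
		out = append(out, s)
	}
	return out
}

func main() {
	m := map[uint64]sample{
		1: {name: "metric_total", dropName: false}, // e.g. from a plain selector
		2: {name: "metric_total", dropName: true},  // e.g. from rate(...)
	}
	seen := map[bool]int{}
	for i := 0; i < 100; i++ {
		seen[firstWins(collect(m))]++
	}
	// The two counts typically differ from run to run.
	fmt.Println("dropped:", seen[true], "kept:", seen[false])
}
```

With the same data, both outcomes usually show up across the 100 iterations, which matches the flakiness seen when the test is run repeatedly.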

This PR does introduce some small overhead in an extra slice to store the hashes in the original order. Not sure if this is small enough to be worth it to fix the inconsistent results or if we should find a better solution.
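The idea of keeping an extra slice of hashes can be sketched as follows. This is an illustration under assumptions, not the code from this PR: `orderedSamples`, `add`, and `inOrder` are hypothetical names, and the hashes are arbitrary numbers.

```go
package main

import "fmt"

// sample is a hypothetical stand-in for one output series.
type sample struct {
	labels   string
	dropName bool
}

// orderedSamples pairs the lookup map with a slice of hashes that remembers
// the order in which the hashes were first inserted.
type orderedSamples struct {
	byHash map[uint64]sample
	order  []uint64 // the extra slice: one uint64 per distinct hash
}

func newOrderedSamples() *orderedSamples {
	return &orderedSamples{byHash: make(map[uint64]sample)}
}

// add stores s under hash h unless h is already present, so the first
// insertion for a hash wins.
func (o *orderedSamples) add(h uint64, s sample) {
	if _, ok := o.byHash[h]; ok {
		return
	}
	o.byHash[h] = s
	o.order = append(o.order, h)
}

// inOrder returns the samples in insertion order instead of map order.
func (o *orderedSamples) inOrder() []sample {
	out := make([]sample, 0, len(o.order))
	for _, h := range o.order {
		out = append(out, o.byHash[h])
	}
	return out
}

func main() {
	o := newOrderedSamples()
	o.add(42, sample{labels: `{env="3"}`, dropName: false})
	o.add(7, sample{labels: `{env="2"}`, dropName: true})
	o.add(42, sample{labels: `{env="3"}`, dropName: true}) // duplicate hash, ignored
	for _, s := range o.inOrder() {
		fmt.Println(s.labels, s.dropName)
	}
}
```

The cost is one additional `uint64` per distinct hash, which is the overhead mentioned above. The output order becomes reproducible, but it still depends on which series was inserted first.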

Does this PR introduce a user-facing change?

[BUGFIX] PromQL: Fix bug with inconsistent results for queries with OR expression and the experimental EnableDelayedNameRemoval. #17161

promql/promqltest/testdata/name_label_dropping_2.test

beorn7 · 2025-10-02T17:23:59Z

Thanks for catching this.

Formal nit: Let's not create name_label_dropping_2.test but just append to the existing name_label_dropping.test.

About the real meat here: This is an interesting edge case you have discovered (although mostly an academic one: you have to aggregate by __name__, which is not very common, and you have to construct a case where a name marked for dropping is equal to a name not marked for dropping, which I would expect to happen only in contrived examples or by accident rather than in a query that actually serves any purpose). Nevertheless, we need to handle this case in a defined and reproducible way.

The approach here is reproducible, but I would say it is defined in a problematic way: IIUC, the decision whether the name is ultimately dropped or not is made by the series that happens to land on top of the list. Which is hard to predict by the user, if at all. It also doesn't seem to make a lot of sense (although there is maybe not much need to make a lot of sense for this rare edge case). Still, this worries me much more than the performance concerns you have already brought up.

The core of the problem here is that we aggregate (by name) a number of series with the same name where some of the names are marked as to be dropped and others are not. The fundamental question is what should we do with the name, drop or not? Which gives us two obvious ways, and then I also came up with a trickier way out:

1. In that case, we always drop the name.
2. In that case, we always keep the name.
3. We somehow consider a name marked for dropping to have a different identity than a name not marked for dropping, so we aggregate the series into two different groups after all, and since one of the groups gets its name dropped in the end, we don't even produce an ambiguity.

I believe (1) and (2) are easy to implement and fairly clear. I cannot honestly say which one I like better. It's hard to argue for one over the other, given that there is no real-world use case to judge them.

I like (3) most because it makes sense semantically and is close to the behavior we have without the delayed name dropping. I'm just concerned that it will have deeper implications and might not be trivial to implement.
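For illustration only, option (3) could be sketched like this. Nothing here is taken from the Prometheus engine: `groupKey`, `result`, and `sumByName` are hypothetical names, and the sketch ignores all labels other than the metric name.

```go
package main

import (
	"fmt"
	"sort"
)

// sample is a hypothetical input series with its delayed-removal marker.
type sample struct {
	name     string
	value    float64
	dropName bool
}

// groupKey treats a name marked for dropping as a different identity than
// the same name not marked for dropping.
type groupKey struct {
	name     string
	dropName bool
}

// result is one output series of the aggregation.
type result struct {
	name string
	sum  float64
}

// sumByName groups by (name, dropName) and applies the name removal only
// when emitting the result of each group.
func sumByName(in []sample) []result {
	groups := map[groupKey]float64{}
	for _, s := range in {
		groups[groupKey{name: s.name, dropName: s.dropName}] += s.value
	}
	out := make([]result, 0, len(groups))
	for k, v := range groups {
		name := k.name
		if k.dropName {
			name = "" // the marked group loses its name at the end
		}
		out = append(out, result{name: name, sum: v})
	}
	// Sort so that the output does not depend on map iteration order.
	sort.Slice(out, func(i, j int) bool { return out[i].name < out[j].name })
	return out
}

func main() {
	in := []sample{
		{name: "metric_total", value: 1, dropName: false},
		{name: "metric_total", value: 2, dropName: true},
		{name: "metric_total", value: 3, dropName: false},
	}
	for _, r := range sumByName(in) {
		fmt.Printf("%q %v\n", r.name, r.sum)
	}
}
```

The mixed input yields two groups, one that keeps the name and one without it, so no single decision between dropping and keeping has to be made. The sketch does not address what should happen when several differently named groups all end up with an empty name.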

What do others think? @jcreixell @juliusv @roidelapluie might be the people with the most context, but I'd be happy about anyone chiming in with insights.

zenador · 2025-10-06T12:01:23Z

> Formal nit: Let's not create name_label_dropping_2.test but just append to the existing name_label_dropping.test.

Updated.

> the decision whether the name is ultimately dropped or not is made by the series that happens to land on top of the list. Which is hard to predict by the user, if at all.

True, added some test cases to demonstrate this further. If the LHS of the OR expression has no series, we will use RHS to determine the name dropping, but otherwise we will use LHS. But this depends on the underlying data and not the input expression, and it seems strange to have the labels change unpredictably based on that.

Regarding your suggestions, personally I'd vote against 3 as it is added complexity for something which is currently just academic (unless someone thinks of a real-world use case where implementing 3 would actually be preferable).

I like either 1 or 2, with no preference as long as we apply it consistently. Just to be clear, when we have multiple series, we'd continue to drop the name if all series drop the name, and continue to keep the name if all series keep the name. But when not all the series behave the same way:

1. If at least 1 series drops the name, we drop the name for all series.
2. If at least 1 series keeps the name, we keep the name for all series.

But actually, we'd run into the same problem as above, where the resulting labels depend on the underlying data and are not predictable from the input expression alone. For instance, in sum by (__name__) (metric_total{env="3"} or rate(metric_total{env="2"}[5m])), we wouldn't know without running the query whether we'd get series from the LHS or RHS or both or neither. So it looks like we can't solve this problem without the additional complexity of 3, but at least choosing 1 or 2 over the current approach in this PR saves us from having to use extra memory and makes the result independent of the order.

beorn7 · 2025-10-15T15:27:18Z

I broadly agree with your conclusion that we should pick the least complicated solution, as there will be hardly any real-world use cases that would depend on a particular behavior being picked over another.

I'm still not so sure how problematic approach (3) would be in this regard, but let's assume it would be the most complicated one, with possibly some surprising implications further down the road.

If the choice is between (1) and (2), I'd currently prefer (1), i.e. "If at least 1 series drops the name, we drop the name for all series.", following the rationale that with at least one thing in the aggregation that doesn't want to be called by its old name anymore, the aggregate of everything probably doesn't like that name either. (Or: If I add one apple to a lot of oranges, I cannot claim anymore that the aggregate is still oranges.)

So let's go for (1).
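A minimal sketch of rule (1), assuming a simplified aggregation over a single group. The `sample` and `group` types and the `aggregate` function are hypothetical and do not reflect the actual change in promql/engine.go.

```go
package main

import "fmt"

// sample is a hypothetical input series with its delayed-removal marker.
type sample struct {
	value    float64
	dropName bool
}

// group is the aggregated result for one output group.
type group struct {
	sum      float64
	dropName bool
}

// aggregate sums the inputs and marks the group for name removal as soon as
// at least one input is marked. A logical OR is commutative, so the result
// does not depend on the order of the inputs.
func aggregate(in []sample) group {
	var g group
	for _, s := range in {
		g.sum += s.value
		g.dropName = g.dropName || s.dropName
	}
	return g
}

func main() {
	a := []sample{{value: 1, dropName: false}, {value: 2, dropName: true}}
	b := []sample{{value: 2, dropName: true}, {value: 1, dropName: false}}
	fmt.Println(aggregate(a), aggregate(b)) // same result for both orders
}
```

Unlike the "first series decides" behaviour, this rule needs no extra bookkeeping about input order.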

beorn7 · 2025-10-15T15:29:25Z

> But actually, we'd run into the same problem as above, where the resulting labels depend on the underlying data and are not predictable from the input expression alone. For instance, in sum by (__name__) (metric_total{env="3"} or rate(metric_total{env="2"}[5m])), we wouldn't know without running the query whether we'd get series from the LHS or RHS or both or neither.

I don't quite understand this concern. Don't we decide about the name dropping dynamically anyway, i.e. while evaluating the expression, rather than ahead of the evaluation just based on syntax?

beorn7 · 2025-10-15T15:30:27Z

About properly reviewing this: I might only get to it after PromCon. But if I understand correctly, you would first implement approach (1) before it makes sense to do a detailed review anyway.

beorn7 · 2025-10-15T15:31:58Z

BTW: This behavior should be included in the feature flag documentation.

beorn7

Thank you very much, and apologies for the delay in reviewing this.

I just have suggestions for how to improve the comments. The code looks great.

docs/feature_flags.md

promql/engine.go

zenador · 2025-11-14T13:24:20Z

Thank you and no worries about the delay! Applied your suggestions.

Realised I forgot to reply to this:

> I don't quite understand this concern. Don't we decide about the name dropping dynamically anyway, i.e. while evaluating the expression, rather than ahead of the evaluation just based on syntax?

Yes, that's true for this edge case. But in normal cases we would typically be able to tell just by looking at the expression, e.g. metric_total{env="3"} does not drop the name but rate(metric_total{env="2"}[5m]) does. I don't have a better suggestion though.

zenador · 2025-11-14T14:25:34Z

I believe the Windows test failure is transient (or at least it should have nothing to do with the changes in this PR).

beorn7 · 2025-11-15T16:59:28Z

Thank you very much. Indeed, the MS Windows tests are often flaky. I'll re-run, but if it doesn't work, I'll just override.

##### [\`v3.9.0\`](https://github.com/prometheus/prometheus/releases/tag/v3.9.0)

#### Note for users of Native Histograms

In version 3.9, Native Histograms is no longer experimental, and the feature flag `native-histogram` has no effect. You must now turn on the config setting `scrape_native_histograms` to collect Native Histogram samples from exporters.

#### Changelog

- \[CHANGE] Native Histograms are no longer experimental! Make the `native-histogram` feature flag a no-op. Use `scrape_native_histograms` config option instead. [#17528](prometheus/prometheus#17528)
- \[CHANGE] API: Add maximum limit of 10,000 sets of statistics to TSDB status endpoint. [#17647](prometheus/prometheus#17647)
- \[FEATURE] API: Add /api/v1/features for clients to understand which features are supported. [#17427](prometheus/prometheus#17427)
- \[FEATURE] Promtool: Add `start_timestamp` field for unit tests. [#17636](prometheus/prometheus#17636)
- \[FEATURE] Promtool: Add `--format seriesjson` option to `tsdb dump` to output just series labels in JSON format. [#13409](prometheus/prometheus#13409)
- \[FEATURE] Add `--storage.tsdb.delay-compact-file.path` flag for better interoperability with Thanos. [#17435](prometheus/prometheus#17435)
- \[FEATURE] UI: Add an option on the query drop-down menu to duplicate that query panel. [#17714](prometheus/prometheus#17714)
- \[ENHANCEMENT]: TSDB: add flag `--storage.tsdb.block-reload-interval` to configure TSDB Block Reload Interval. [#16728](prometheus/prometheus#16728)
- \[ENHANCEMENT] UI: Add graph option to start the chart's Y axis at zero. [#17565](prometheus/prometheus#17565)
- \[ENHANCEMENT] Scraping: Classic protobuf format no longer requires the unit in the metric name. [#16834](prometheus/prometheus#16834)
- \[ENHANCEMENT] PromQL, Rules, SD, Scraping: Add native histograms to complement existing summaries. [#17374](prometheus/prometheus#17374)
- \[ENHANCEMENT] Notifications: Add a histogram `prometheus_notifications_latency_histogram_seconds` to complement the existing summary. [#16637](prometheus/prometheus#16637)
- \[ENHANCEMENT] Remote-write: Add custom scope support for AzureAD authentication. [#17483](prometheus/prometheus#17483)
- \[ENHANCEMENT] SD: add a `config` label with job name for most `prometheus_sd_refresh` metrics. [#17138](prometheus/prometheus#17138)
- \[ENHANCEMENT] TSDB: New histogram `prometheus_tsdb_sample_ooo_delta`, the distribution of out-of-order samples in seconds. Collected for all samples, accepted or not. [#17477](prometheus/prometheus#17477)
- \[ENHANCEMENT] Remote-read: Validate histograms received via remote-read. [#17561](prometheus/prometheus#17561)
- \[PERF] TSDB: Small optimizations to postings index. [#17439](prometheus/prometheus#17439)
- \[PERF] Scraping: Speed up relabelling of series. [#17530](prometheus/prometheus#17530)
- \[PERF] PromQL: Small optimisations in binary operators. [#17524](prometheus/prometheus#17524), [#17519](prometheus/prometheus#17519).
- \[BUGFIX] UI: PromQL autocomplete now shows the correct type and HELP text for OpenMetrics counters whose samples end in `_total`. [#17682](prometheus/prometheus#17682)
- \[BUGFIX] UI: Fixed codemirror-promql incorrectly showing label completion suggestions after the closing curly brace of a vector selector. [#17602](prometheus/prometheus#17602)
- \[BUGFIX] UI: Query editor no longer suggests a duration unit if one is already present after a number. [#17605](prometheus/prometheus#17605)
- \[BUGFIX] PromQL: Fix some "vector cannot contain metrics with the same labelset" errors when experimental delayed name removal is enabled. [#17678](prometheus/prometheus#17678)
- \[BUGFIX] PromQL: Fix possible corruption of PromQL text if the query had an empty `ignoring()` and non-empty grouping. [#17643](prometheus/prometheus#17643)
- \[BUGFIX] PromQL: Fix resets/changes to return empty results for anchored selectors when all samples are outside the range. [#17479](prometheus/prometheus#17479)
- \[BUGFIX] PromQL: Check more consistently for many-to-one matching in filter binary operators. [#17668](prometheus/prometheus#17668)
- \[BUGFIX] PromQL: Fix collision in unary negation with non-overlapping series. [#17708](prometheus/prometheus#17708)
- \[BUGFIX] PromQL: Fix collision in label\_join and label\_replace with non-overlapping series. [#17703](prometheus/prometheus#17703)
- \[BUGFIX] PromQL: Fix bug with inconsistent results for queries with OR expression when experimental delayed name removal is enabled. [#17161](prometheus/prometheus#17161)
- \[BUGFIX] PromQL: Ensure that `rate`/`increase`/`delta` of histograms results in a gauge histogram. [#17608](prometheus/prometheus#17608)
- \[BUGFIX] PromQL: Do not panic while iterating over invalid histograms. [#17559](prometheus/prometheus#17559)
- \[BUGFIX] TSDB: Reject chunk files whose encoded chunk length overflows int. [#17533](prometheus/prometheus#17533)
- \[BUGFIX] TSDB: Do not panic during resolution reduction of invalid histograms. [#17561](prometheus/prometheus#17561)
- \[BUGFIX] Remote-write Receive: Avoid duplicate labels when experimental type-and-unit-label feature is enabled. [#17546](prometheus/prometheus#17546)
- \[BUGFIX] OTLP Receiver: Only write metadata to disk when experimental metadata-wal-records feature is enabled. [#17472](prometheus/prometheus#17472)

##### [\`v3.9.1\`](https://github.com/prometheus/prometheus/releases/tag/v3.9.1)

- \[BUGFIX] Agent: fix crash shortly after startup from invalid type of object. [#17802](prometheus/prometheus#17802)
- \[BUGFIX] Scraping: fix relabel keep/drop not working. [#17807](prometheus/prometheus#17807)

zenador mentioned this pull request Sep 8, 2025

MQE: implement delayed name removal grafana/mimir#12509

Merged

4 tasks

charleskorn reviewed Sep 9, 2025

promql/promqltest/testdata/name_label_dropping_2.test

zenador force-pushed the flaky-test-delayed-name-removal branch from 75063ad to e7b238c Compare September 9, 2025 08:38

zenador changed the title ~~Bug: inconsistent results for PromQL query with EnableDelayedNameRemoval~~ PromQL: Fix bug with inconsistent results for queries with OR expression and EnableDelayedNameRemoval Sep 9, 2025

zenador marked this pull request as ready for review September 9, 2025 09:17

zenador requested a review from roidelapluie as a code owner September 9, 2025 09:17

roidelapluie reviewed Sep 9, 2025

promql/promqltest/testdata/name_label_dropping_2.test

beorn7 mentioned this pull request Oct 6, 2025

promql: Make promql-delayed-name-removal the default #15855

Open

zenador added 9 commits October 24, 2025 07:58

Add flaky tests for demonstration

77587ea

Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>

Expand unit tests

0b3fada

Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>

Make them not flaky anymore

b8a305a

Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>

Remove wrong license info

316593d

Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>

Move new tests into existing file

7e26db9

Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>

Add more test cases

d8c90e7

Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>

Update unit tests for new desired behaviour

58960f1

Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>

Revert "Make them not flaky anymore"

2a46048

This reverts commit e7b238c. Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>

Update logic for new desired behaviour

908c4ce

Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>

zenador force-pushed the flaky-test-delayed-name-removal branch from d99252a to 908c4ce Compare October 23, 2025 23:59

beorn7 reviewed Nov 13, 2025

docs/feature_flags.md

promql/engine.go

Apply suggestions from code review

873bff4

Co-authored-by: Björn Rabenstein <github@rabenste.in> Signed-off-by: zenador <zenador@users.noreply.github.com>

beorn7 approved these changes Nov 15, 2025

beorn7 merged commit c64dd61 into prometheus:main Nov 15, 2025
45 of 46 checks passed
