Skip to content

promql: fix histogram_fraction issue when lower falls within the first bucket#17424

Merged
beorn7 merged 3 commits intoprometheus:release-3.8from
MohammadAlavi1986:fix-histogram-fraction
Nov 13, 2025
Merged

promql: fix histogram_fraction issue when lower falls within the first bucket#17424
beorn7 merged 3 commits intoprometheus:release-3.8from
MohammadAlavi1986:fix-histogram-fraction

Conversation

@MohammadAlavi1986
Copy link
Contributor

@MohammadAlavi1986 MohammadAlavi1986 commented Oct 29, 2025

Fixes histogram_fraction returning NaN when lower falls within the first bucket for classic histograms and NHCB. The issue is described here.

Which issue(s) does the PR fix:

Fixes #17367

Does this PR introduce a user-facing change?

[BUGFIX] Fix `histogram_fraction` for classic histograms and NHCB if lower bound is in the first bucket.

@beorn7
Copy link
Member

beorn7 commented Oct 29, 2025

@MichaHoffmann could you have a look?

@MichaHoffmann
Copy link
Contributor

I think the change makes sense, but can we convert the tests into acceptance tests in promqltest? That way we define the behavior for all prometheus compatible engines too.

@MohammadAlavi1986
Copy link
Contributor Author

I think the change makes sense, but can we convert the tests into acceptance tests in promqltest? That way we define the behavior for all prometheus compatible engines too.

I converted the unit tests into acceptance tests in promqltest.

Copy link
Contributor

@MichaHoffmann MichaHoffmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, good catch that this is an issue for NHCB too

@MichaHoffmann
Copy link
Contributor

Since I cannot merge and dont know what should happen next; cc @beorn7 - if that ping was not needed im sorry 🙈

@beorn7
Copy link
Member

beorn7 commented Nov 3, 2025

Thanks, both of you. I'll give it a final look ASAP.

Copy link
Member

@beorn7 beorn7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, apologies for the review delay. I was incredibly busy with other things last week.

Thank you very much, in particular for the nice explanations and the comprehensive testing.

I have only two things:

  1. My usual capitalization and punctuation nits.
  2. More importantly, I think we should get this into the upcoming v3.8 release. An RC is already out, and the release-3.8 branch already exists. Thus, could you rebase this PR on top of the current state of the release-3.8 branch?

Thanks again.


// If the upper bound of the first bucket is greater than 0, we assume
// we are dealing with positive buckets only and lowerBound for the
// first bucket is set to 0; otherwise it is set to -Inf
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: End sentence with a period.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed punctuation in all comments.

{start="positive"} 0.6363636363636364
{start="negative"} 0

# positive buckets, lower falls in the first bucket
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We generally try to use normal capitalization and punctuation in comments for easier readability, so this should be

Suggested change
# positive buckets, lower falls in the first bucket
# Positive buckets, lower falls in the first bucket.

Could you go through the other comments as well and adjust?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed punctuation in all comments.

@beorn7
Copy link
Member

beorn7 commented Nov 12, 2025

/cc @jan--f as the release shepherd of v3.8.

@jan--f
Copy link
Contributor

jan--f commented Nov 12, 2025

I branched release-3.8 already. So if this is meant to be included it needs to target the release branch.

@beorn7
Copy link
Member

beorn7 commented Nov 12, 2025

My words exactly. 😁

Signed-off-by: Mohammad Alavi <m.alavi1986@gmail.com>
Signed-off-by: Mohammad Alavi <m.alavi1986@gmail.com>
Signed-off-by: Mohammad Alavi <m.alavi1986@gmail.com>
@MohammadAlavi1986 MohammadAlavi1986 changed the base branch from main to release-3.8 November 13, 2025 06:40
@MohammadAlavi1986
Copy link
Contributor Author

I fixed the punctuation in the comments and rebased my branch onto release-3.8. There’s a failing test that doesn’t seem related to my changes.

Copy link
Member

@beorn7 beorn7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much.

The test is indeed known-flaky (#17489).

@beorn7
Copy link
Member

beorn7 commented Nov 13, 2025

Note: A adjusted the release-note section in the PR description.

@beorn7 beorn7 merged commit 0f5f195 into prometheus:release-3.8 Nov 13, 2025
61 of 64 checks passed
charleskorn pushed a commit to charleskorn/prometheus that referenced this pull request Jan 15, 2026
* promql: fix histogram_fraction issue when lower falls within the first bucket (prometheus#17424)

Signed-off-by: Mohammad Alavi <m.alavi1986@gmail.com>

* prepare release 3.8.0-rc.0

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>

* test: skip TestRemoteWrite_ReshardingWithoutDeadlock temporarily as flaky (prometheus#17534) (prometheus#17543)

(cherry picked from commit 35c3232)

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>

* chore(deps): bump prometheus/promci from 0.4.7 to 0.5.0

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>

* chore(deps): bump prometheus/promci from 0.5.0 to 0.5.1

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>

* chore(deps): bump prometheus/promci from 0.5.1 to 0.5.2

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>

* chore(deps): bump prometheus/promci from 0.5.2 to 0.5.3

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>

* prw2: Move Remote Write 2.0 CT to be per Sample; Rename to ST (start timestamp) (prometheus#17411)

Relates to
prometheus#16944 (comment)

Signed-off-by: bwplotka <bwplotka@gmail.com>
(cherry picked from commit cefefc6)

* chore: prepare 3.8.0-rc.1 entry

Signed-off-by: bwplotka <bwplotka@gmail.com>

* [chore]: bump common dep to support RFC7523 3.1

Signed-off-by: Jorge Turrado <jorge.turrado@mail.schwarz>

* Update Prometheus Agent doc (prometheus#17591)

* Add a nav title to fix docs website generator.
* Make it more clear that "Prometheus Agent" is a mode, not a seaparate
  service.
* Add to index.
* Cleanup some wording.
* Add a downsides section.

Signed-off-by: SuperQ <superq@gmail.com>
(cherry picked from commit d0d2699)

* chore(deps): bump github.com/prometheus/common from 0.67.3 to 0.67.4 (prometheus#17594)

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>

* prepare release v3.8.0-rc.1

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>

* prepare release v3.8.0

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>

* chore: Fix function name typo in createBatchSpan comment

Signed-off-by: zjumathcode <pai314159@2980.com>

* feat: Add flag that blocks lvl 1 compactions until upload is confirmed in an external JSON file (prometheus#17435)

* Delay compactions until Thanos uploads all blocks

Using Thanos sidecar with Prometheus requires us to disable TSDB compactions on Prometheus side by setting --storage.tsdb.min-block-duration and --storage.tsdb.max-block-duration to the same value. See https://thanos.io/tip/components/sidecar.md. The main problem this avoids is that Prometheus might compact given block before Thanos uploads it, creating a gap in Thanos metrics. Thanos does not upload compacted blocks because that would upload the same sample multiple times. You can tell Thanos to upload compacted blocks but that is aimed at one time migrations. This patch creates a bridge between Thanos and Prometheus by allowing Prometheus to read the shipper file Thanos creates, where it tracks which blocks were already uploaded, and using that data delays compaction of blocks until they are marked as uploaded by Thanos. Thanks to this both services can coordinate with each other (in a way) and we can stop disabling compaction on Prometheus side when Thanos uploads are enabled.

The reason to have this is that disabling compactions have very dramatic performance cost. Since most time series exist for longer than a single block duration (2h by default) large chunks of block index will reference the same series, so 10 * 2h blocks will each have an index that is usually fairly big and is almost the same for all 10 blocks. Compaction de-duplicates the index so merging 10 blocks together would leave us with a single index that is around the same size as each of these 10 2h blocks would have (plus some extra for series that only exists in some blocks, but not all). Every range query that iterates over all 10 blocks would then have to read each index and so we're doing 10x more work then if we had a single compacted block.

Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>

* Rename structs and functions to make this more generic

Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>

* Address review comments

Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>

* Cache UploadMeta for 1 minute

Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>

---------

Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>

* RW2: Allow custom scope in azuread (prometheus#17483)

Signed-off-by: Ben Edmunds <sammybenblue2@gmail.com>

* docs: Describe how time() is set to start at 0 in unit tests

The return value of functions relating to the current time, e.g. time(),
is set by promtool to start at timestamp 0 at the start of a test's
evaluation.

This has the very nice consequence that tests can run reliably without
depending on when they are run.

It does, however, mean that tests will give out results that can be
unexpected by users.

If this behaviour is documented, then users will be empowered to write
tests for their rules that use time-dependent functions.

(Closes: prometheus/docs#1464)

Signed-off-by: Gabriel Filion <lelutin@torproject.org>

* refactor(tsdb): use one test newTestDB constructor (prometheus#17638)

For tests only, we had various ways of opening DB. Reduced to one
instead of:

* Open
* newTestDB
* newTestDBOpts
* openTestDB

This so prometheus#17629 is smaller
and bit easier. Also for test maintainability and consistency.

Signed-off-by: bwplotka <bwplotka@gmail.com>

* Add start_timestamp field for unit tests.

This commit adds support for configuring a custom start timestamp
for Prometheus unit tests, allowing tests to use realistic timestamps
instead of starting at Unix epoch 0.

Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>

* Fix serialization for empty `ignoring()` in combination with `group_x()`

Currently both the backend and frontend printers/formatters/serializers
incorrectly transform the following expression:

```
up * ignoring() group_left(__name__) node_boot_time_seconds
```

...into:

```
up * node_boot_time_seconds
```

...which yields a different result (including the metric name in the result
vs. no metric name).

We need to keep empty `ignoring()` modifiers if there is a grouping modifier
present.

Signed-off-by: Julius Volz <julius.volz@gmail.com>

* Simplify StartTime assignment in unit test setup.

Remove redundant IsZero check since promqltest.LazyLoader already
handles zero StartTime by defaulting to Unix epoch.

Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>

* Update golangci-lint and add modernize check (prometheus#17640)

* add modernize check

Signed-off-by: dongjiang1989 <dongjiang1989@126.com>

* fix golangci lint

Signed-off-by: dongjiang1989 <dongjiang1989@126.com>

---------

Signed-off-by: dongjiang1989 <dongjiang1989@126.com>

* fix lint

---------

Signed-off-by: Mohammad Alavi <m.alavi1986@gmail.com>
Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
Signed-off-by: bwplotka <bwplotka@gmail.com>
Signed-off-by: Jorge Turrado <jorge.turrado@mail.schwarz>
Signed-off-by: SuperQ <superq@gmail.com>
Signed-off-by: zjumathcode <pai314159@2980.com>
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
Signed-off-by: Ben Edmunds <sammybenblue2@gmail.com>
Signed-off-by: Gabriel Filion <lelutin@torproject.org>
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
Signed-off-by: Julius Volz <julius.volz@gmail.com>
Signed-off-by: dongjiang1989 <dongjiang1989@126.com>
Co-authored-by: Mohammad Alavi <m.alavi1986@gmail.com>
Co-authored-by: Jan Fajerski <jfajersk@redhat.com>
Co-authored-by: Jan Fajerski <jan--f@users.noreply.github.com>
Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Co-authored-by: Jorge Turrado <jorge.turrado@mail.schwarz>
Co-authored-by: Ben Kochie <superq@gmail.com>
Co-authored-by: zjumathcode <pai314159@2980.com>
Co-authored-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
Co-authored-by: Ben Edmunds <Tigger2014@users.noreply.github.com>
Co-authored-by: Julien <291750+roidelapluie@users.noreply.github.com>
Co-authored-by: Gabriel Filion <lelutin@torproject.org>
Co-authored-by: Julius Volz <julius.volz@gmail.com>
Co-authored-by: dongjiang <dongjiang1989@126.com>
Co-authored-by: Jeanette Tan <jeanette.tan@grafana.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

histogram_fraction returns NaN for classic histograms when lower falls within the first bucket

4 participants