feat: Add flag that blocks lvl 1 compactions until upload is confirmed in an external JSON file #17435

Merged
bwplotka merged 4 commits into prometheus:main from prymitive:thanos-compactions
Dec 2, 2025

Conversation

@prymitive
Contributor

@prymitive prymitive commented Oct 31, 2025

Using Thanos sidecar with Prometheus requires us to disable TSDB compactions on the Prometheus side by setting --storage.tsdb.min-block-duration and --storage.tsdb.max-block-duration to the same value. See https://thanos.io/tip/components/sidecar.md. The main problem this avoids is that Prometheus might compact a given block before Thanos uploads it, creating a gap in Thanos metrics. Thanos does not upload compacted blocks because that would upload the same samples multiple times. You can tell Thanos to upload compacted blocks, but that is aimed at one-time migrations. This patch creates a bridge between Thanos and Prometheus by allowing Prometheus to read the shipper file Thanos creates, where it tracks which blocks were already uploaded, and using that data to delay compaction of blocks until they are marked as uploaded by Thanos. Thanks to this, both services can coordinate with each other (in a way) and we can stop disabling compaction on the Prometheus side when Thanos uploads are enabled.

The reason to have this is that disabling compactions has a very dramatic performance cost. Since most time series exist for longer than a single block duration (2h by default), large chunks of each block index will reference the same series, so 10 * 2h blocks will each have an index that is usually fairly big and is almost the same for all 10 blocks. Compaction de-duplicates the index, so merging 10 blocks together would leave us with a single index that is around the same size as the index of each of these 10 2h blocks (plus some extra for series that only exist in some blocks, but not all). Without compaction, every range query that iterates over all 10 blocks has to read each index, so we're doing 10x more work than if we had a single compacted block.

We have been running with this patch for over a month now and it dramatically reduced CPU usage on instances with Thanos uploads enabled, plus it increased the effective retention because we do>

On the Thanos side this requires --shipper.ignore-unequal-block-size so Thanos stops complaining about the fact that compactions are enabled on Prometheus.
If this patch is accepted we could follow up with Thanos to make it aware of this Prometheus flag so that, if it is set, it stops complaining about compactions.

cc @bwplotka

Which issue(s) does the PR fix:

Does this PR introduce a user-facing change?

[FEATURE] Add --storage.tsdb.delay-compact-file.path flag for better interoperability with block upload side processes (e.g. Thanos sidecar). When this flag is set, Prometheus will not compact a level 1 block until its ULID is listed in the JSON file at this path.
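The rule in the release note can be sketched as a small exclude filter. This is only an illustration under stated assumptions: `blockMeta` and `excludeUntilUploaded` are hypothetical names, the block type is a simplified stand-in rather than the real TSDB metadata type, and ULIDs are plain strings.

```go
package main

import "fmt"

// blockMeta is a simplified, hypothetical stand-in for block metadata:
// only the ID and the compaction level matter for this sketch.
type blockMeta struct {
	ULID  string
	Level int
}

// excludeUntilUploaded returns a filter that reports true (exclude from
// compaction) for level 1 blocks whose ID is not in the uploaded set.
// Blocks at any other level are never excluded by this filter.
func excludeUntilUploaded(uploaded map[string]struct{}) func(blockMeta) bool {
	return func(m blockMeta) bool {
		if m.Level != 1 {
			return false
		}
		_, ok := uploaded[m.ULID]
		return !ok
	}
}

func main() {
	// Made-up IDs; "A" is marked as uploaded, "B" is not.
	filter := excludeUntilUploaded(map[string]struct{}{"A": {}})
	fmt.Println(filter(blockMeta{ULID: "A", Level: 1})) // uploaded level 1 block
	fmt.Println(filter(blockMeta{ULID: "B", Level: 1})) // level 1 block still waiting
	fmt.Println(filter(blockMeta{ULID: "B", Level: 2})) // already compacted block
}
```

Returning a closure keeps the uploaded set out of package-level state, so each caller can build the filter from whatever file content it last read.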

@prymitive prymitive force-pushed the thanos-compactions branch 2 times, most recently from 79046a7 to c046572 Compare October 31, 2025 11:56
@gregwork

This would be extremely useful for environments I manage.

Member

@bwplotka bwplotka left a comment

Thanks for this!

So if I understand correctly, it's a special mode where Prometheus compaction will depend on a special file using Thanos Shipper meta file format that is a simple JSON with a list of local blocks that were already uploaded to object storage:

type Meta struct {
	Version  int         `json:"version"`
	// Existing
	Uploaded []ulid.ULID `json:"uploaded"`
}

Commonly the file is generated by Thanos sidecar as thanos.shipper.json
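As a rough sketch of how a reader of this file might use it, the snippet below decodes the JSON layout quoted above and answers "is this block uploaded?". The names `uploadMeta`, `parseUploaded` and `isUploaded` are hypothetical, ULIDs are kept as plain strings to avoid a third-party dependency, and the sample content is made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// uploadMeta mirrors the JSON layout quoted above. ULIDs are kept as plain
// strings here so the sketch needs no third-party ULID package.
type uploadMeta struct {
	Version  int      `json:"version"`
	Uploaded []string `json:"uploaded"`
}

// parseUploaded decodes the file content and returns the set of block IDs
// that the uploader has marked as uploaded.
func parseUploaded(data []byte) (map[string]struct{}, error) {
	var m uploadMeta
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, fmt.Errorf("decode upload meta: %w", err)
	}
	set := make(map[string]struct{}, len(m.Uploaded))
	for _, id := range m.Uploaded {
		set[id] = struct{}{}
	}
	return set, nil
}

// isUploaded reports whether the given block ID is listed in the set.
func isUploaded(set map[string]struct{}, id string) bool {
	_, ok := set[id]
	return ok
}

func main() {
	// Hypothetical file content; the ULIDs are made up for illustration.
	data := []byte(`{"version":1,"uploaded":["01HZX3M5T0AAAAAAAAAAAAAAAA"]}`)
	set, err := parseUploaded(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(isUploaded(set, "01HZX3M5T0AAAAAAAAAAAAAAAA")) // listed
	fmt.Println(isUploaded(set, "01HZX3M5T1BBBBBBBBBBBBBBBB")) // not listed
}
```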

With this change, Prometheus starts level1+ compaction only when blocks appear in this external JSON file, massively improving hybrid setups (using Thanos with a sidecar which still queries local Prometheus).

I assume the benefit is only for users with more than ~4-5h retention. Do we know how common it is to pay for Thanos and yet have longer local retention these days?

Should it be in Prometheus?

So the main argument why we never did this integration with Prometheus is that Thanos was new and it adds special provider handling to Prometheus code, literally only for Thanos. However, it's super simple, makes sense, and the shipper file format has not changed in the last 8 years. We could adopt and control this format in Prometheus too, to fully make this feature our own. In that case I would avoid Thanos wording in the flag and code entirely, to make sure any other integration can use this.

Is there an alternative that abstracts this logic to be useful for a wider range of projects (e.g. an API for compactions, or for delaying compactions)? I guess we could have a Prometheus mode that doesn't compact further unless an HTTP API /compact/uulid trigger happens. Would it be useful? It would mean some more code on the Thanos side, but it's doable, and we would add functionality that is perhaps easier for others to use.

I do like the solution from this PR (modulo Thanos wording in the flag and code), mostly because we have use cases in Prometheus for a native block upload to objstore. cc @jesusvazquez @bboreham @SuperQ so maybe it's a good first step towards that support? For native objstore support we would need to start a bigger discussion at the DevSummit and in proposals.

However, I am a co-creator of the Thanos project (and a Prometheus maintainer), which makes me potentially biased. I would like to add this AND eventually follow up with the discussion about native upload. But let's get some opinions from other maintainers (:

@xiu

xiu commented Nov 23, 2025

I assume the benefit is only for users with more than ~4-5h retention. Do we know how common it is to pay for Thanos and yet having longer local retention these days?

At least in my deployments, we have alerting local to Prometheus and keep 30+ days of local retention to record SLOs.

Member

@bwplotka bwplotka left a comment

Amazing, thanks!

I reviewed in detail and I'd be keen to merge it, but we need another LGTM from someone else too. Will ping around once comments are addressed.

Some readability suggestions and perhaps one quick optimization, otherwise LGTM from my side!

Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
Member

@bwplotka bwplotka left a comment

This is great for me. I think this is important for the ecosystem and it's relevant for any block backup side-channel, not only Thanos.

LGTM, thanks!

I will hold off on merging until a non-Thanos-related maintainer approves.

Perhaps @roidelapluie @beorn7 @bboreham @ArthurSens @krajorama?

Member

@beorn7 beorn7 left a comment

I'm not opposed. :)


// Cache the last read UploadMeta.
var (
tsdbDelayCompactLastMeta *UploadMeta // The content of uploadMetaPath from the last time we've opened it.
Member

Just noticed: I wish it were not a global, both to be future-proof and as good practice.

Not a big deal as it's in main (not importable), let's merge anyway.

@bwplotka bwplotka changed the title from "Delay compactions until Thanos uploads all blocks" to "feat: Add flag that blocks lvl 1 compactions until upload is confirmed in an external JSON file" Dec 2, 2025
@bwplotka bwplotka merged commit 8a1086a into prometheus:main Dec 2, 2025
30 checks passed
@bwplotka
Member

bwplotka commented Dec 2, 2025

I allowed myself to update PR title and Release notes, hope that makes sense.

prymitive added a commit to prymitive/thanos that referenced this pull request Dec 2, 2025
… flag

Prometheus has a new flag --storage.tsdb.delay-compact-file.path - prometheus/prometheus#17435.
When this flag is passed Prometheus will check which blocks are marked as uploaded in an external file and only compact these.
Thanos should look for this flag and if it's set then it can stop forcing people to disable compactions.

Signed-off-by: Lukasz Mierzwa <lukasz@cloudflare.com>
@prymitive prymitive deleted the thanos-compactions branch December 2, 2025 11:33
@prymitive
Contributor Author

Raised thanos-io/thanos#8582 for Thanos to support this flag

Member

@jesusvazquez jesusvazquez left a comment

I'm late but this also LGTM. I was concerned at first with all the thanos mentioning inside Prometheus but the latest modifications have made it agnostic enough. Good work.

@xiu

xiu commented Dec 2, 2025

Thanks a lot, all!

return nil, err
}
if c.blockExcludeFunc != nil && c.blockExcludeFunc(meta) {
break
Contributor

shouldn't this continue instead of break?

Contributor Author

No, I don't think it should.
Compactions work from oldest to newest, and uploads do the same (usually).

If you continue here you'll skip compaction of this one block, but then either:

  • all further blocks are NOT yet uploaded, or
  • some or all further blocks are uploaded

If we continue and there are newer blocks to pick from, then you will compact in a non-contiguous way, leaving gaps of individual un-compacted blocks.
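The difference between the two loop behaviours discussed here can be shown with a toy model. This is a simplified sketch, not the planner code: `block` and `candidates` are hypothetical names, and the blocks are assumed to be ordered oldest first.

```go
package main

import "fmt"

// block is a simplified stand-in for a TSDB block: an ID and whether an
// external uploader has marked it as uploaded.
type block struct {
	ID       string
	Uploaded bool
}

// candidates returns the IDs that would be offered to the compaction planner.
// Blocks must be ordered oldest first. With stopAtFirst set, iteration stops
// at the first block that is not uploaded (the `break` behaviour); otherwise
// that block is skipped and later ones are still considered (`continue`).
func candidates(blocks []block, stopAtFirst bool) []string {
	var out []string
	for _, b := range blocks {
		if !b.Uploaded {
			if stopAtFirst {
				break
			}
			continue
		}
		out = append(out, b.ID)
	}
	return out
}

func main() {
	// b3 has not been uploaded yet, but the newer b4 has.
	blocks := []block{
		{"b1", true}, {"b2", true}, {"b3", false}, {"b4", true},
	}
	fmt.Println(candidates(blocks, true))  // [b1 b2]
	fmt.Println(candidates(blocks, false)) // [b1 b2 b4]
}
```

With `break`, nothing newer than the first non-uploaded block is considered; with `continue`, `b4` becomes a candidate while `b3` is left behind, which is the non-contiguous case described above.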

Contributor

Oh, I see. This wasn't obvious. So this relies on the sorting that blockDirs and, transitively, os.ReadDir return, which is by filename; the filename is a block ULID, whose lexicographical order is also its timestamp order.

This requires a lot of gymnastics to understand. A comment or two on BlockExcludeFilterFunc and here in the loop would help.
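The ordering argument can be checked with a small sketch. It assumes the standard ULID layout (a 48-bit millisecond timestamp encoded as 10 Crockford base32 characters, followed by 16 characters of randomness); `timePrefix` and `fakeULID` are hypothetical helpers written for this illustration, and the timestamp is arbitrary.

```go
package main

import (
	"fmt"
	"sort"
)

// crockford is the base32 alphabet used by ULIDs. Its characters are in
// ascending ASCII order, which is what makes string order match numeric order.
const crockford = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

// timePrefix encodes a millisecond timestamp as the 10-character,
// fixed-width prefix of a ULID (most significant digit first).
func timePrefix(ms uint64) string {
	buf := make([]byte, 10)
	for i := 9; i >= 0; i-- {
		buf[i] = crockford[ms&31]
		ms >>= 5
	}
	return string(buf)
}

// fakeULID appends a fixed 16-character suffix in place of the random part.
func fakeULID(ms uint64, suffix string) string {
	return timePrefix(ms) + suffix
}

func main() {
	// Three block names created two hours apart, listed out of order.
	const twoHours = 2 * 60 * 60 * 1000
	base := uint64(1764633600000) // an arbitrary millisecond timestamp
	names := []string{
		fakeULID(base+2*twoHours, "ZZZZZZZZZZZZZZZZ"),
		fakeULID(base, "0000000000000000"),
		fakeULID(base+twoHours, "7777777777777777"),
	}
	// os.ReadDir returns entries sorted by filename; sort.Strings applies
	// the same byte-wise ordering to these names.
	sort.Strings(names)
	for _, n := range names {
		fmt.Println(n) // oldest block first, regardless of the suffix
	}
}
```

Because the timestamp prefix is fixed-width and comes first, the random suffix only affects ordering between blocks created in the same millisecond.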

prymitive added a commit to prymitive/thanos that referenced this pull request Dec 9, 2025
aknuds1 added a commit to aknuds1/prometheus that referenced this pull request Dec 24, 2025
The LeveledCompactor.Plan() function incorrectly used `break` instead
of `continue` when a block matched the BlockExcludeFilter. This caused
all blocks after the first excluded block to be silently ignored,
preventing them from being considered for compaction.

This bug affects the `--storage.tsdb.delay-compact-file.path` feature
used for Prometheus/Thanos coordination, introduced in PR prometheus#17435.

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
renovate bot added a commit to sdwilsh/ansible-playbooks that referenced this pull request Jan 10, 2026
##### [\`v3.9.0\`](https://github.com/prometheus/prometheus/releases/tag/v3.9.0)

#### Note for users of Native Histograms

In version 3.9, Native Histograms is no longer experimental, and the feature flag `native-histogram` has no effect.  You must now turn on
the config setting `scrape_native_histograms` to collect Native Histogram samples from exporters.

#### Changelog

- \[CHANGE] Native Histograms are no longer experimental! Make the `native-histogram` feature flag a no-op. Use `scrape_native_histograms` config option instead. [#17528](prometheus/prometheus#17528)
- \[CHANGE] API: Add maximum limit of 10,000 sets of statistics to TSDB status endpoint. [#17647](prometheus/prometheus#17647)
- \[FEATURE] API: Add /api/v1/features for clients to understand which features are supported. [#17427](prometheus/prometheus#17427)
- \[FEATURE] Promtool: Add `start_timestamp` field for unit tests. [#17636](prometheus/prometheus#17636)
- \[FEATURE] Promtool: Add `--format seriesjson` option to `tsdb dump` to output just series labels in JSON format. [#13409](prometheus/prometheus#13409)
- \[FEATURE] Add `--storage.tsdb.delay-compact-file.path` flag for better interoperability with Thanos. [#17435](prometheus/prometheus#17435)
- \[FEATURE] UI: Add an option on the query drop-down menu to duplicate that query panel. [#17714](prometheus/prometheus#17714)
- \[ENHANCEMENT]: TSDB: add flag `--storage.tsdb.block-reload-interval` to configure TSDB Block Reload Interval. [#16728](prometheus/prometheus#16728)
- \[ENHANCEMENT] UI: Add graph option to start the chart's Y axis at zero. [#17565](prometheus/prometheus#17565)
- \[ENHANCEMENT] Scraping: Classic protobuf format no longer requires the unit in the metric name. [#16834](prometheus/prometheus#16834)
- \[ENHANCEMENT] PromQL, Rules, SD, Scraping: Add native histograms to complement existing summaries. [#17374](prometheus/prometheus#17374)
- \[ENHANCEMENT] Notifications: Add a histogram `prometheus_notifications_latency_histogram_seconds` to complement the existing summary. [#16637](prometheus/prometheus#16637)
- \[ENHANCEMENT] Remote-write: Add custom scope support for AzureAD authentication. [#17483](prometheus/prometheus#17483)
- \[ENHANCEMENT] SD: add a `config` label with job name for most `prometheus_sd_refresh` metrics. [#17138](prometheus/prometheus#17138)
- \[ENHANCEMENT] TSDB: New histogram `prometheus_tsdb_sample_ooo_delta`, the distribution of out-of-order samples in seconds. Collected for all samples, accepted or not. [#17477](prometheus/prometheus#17477)
- \[ENHANCEMENT] Remote-read: Validate histograms received via remote-read. [#17561](prometheus/prometheus#17561)
- \[PERF] TSDB: Small optimizations to postings index. [#17439](prometheus/prometheus#17439)
- \[PERF] Scraping: Speed up relabelling of series. [#17530](prometheus/prometheus#17530)
- \[PERF] PromQL: Small optimisations in binary operators. [#17524](prometheus/prometheus#17524), [#17519](prometheus/prometheus#17519).
- \[BUGFIX] UI: PromQL autocomplete now shows the correct type and HELP text for OpenMetrics counters whose samples end in `_total`. [#17682](prometheus/prometheus#17682)
- \[BUGFIX] UI: Fixed codemirror-promql incorrectly showing label completion suggestions after the closing curly brace of a vector selector. [#17602](prometheus/prometheus#17602)
- \[BUGFIX] UI: Query editor no longer suggests a duration unit if one is already present after a number. [#17605](prometheus/prometheus#17605)
- \[BUGFIX] PromQL: Fix some "vector cannot contain metrics with the same labelset" errors when experimental delayed name removal is enabled. [#17678](prometheus/prometheus#17678)
- \[BUGFIX] PromQL: Fix possible corruption of PromQL text if the query had an empty `ignoring()` and non-empty grouping. [#17643](prometheus/prometheus#17643)
- \[BUGFIX] PromQL: Fix resets/changes to return empty results for anchored selectors when all samples are outside the range. [#17479](prometheus/prometheus#17479)
- \[BUGFIX] PromQL: Check more consistently for many-to-one matching in filter binary operators. [#17668](prometheus/prometheus#17668)
- \[BUGFIX] PromQL: Fix collision in unary negation with non-overlapping series. [#17708](prometheus/prometheus#17708)
- \[BUGFIX] PromQL: Fix collision in label\_join and label\_replace with non-overlapping series. [#17703](prometheus/prometheus#17703)
- \[BUGFIX] PromQL: Fix bug with inconsistent results for queries with OR expression when experimental delayed name removal is enabled. [#17161](prometheus/prometheus#17161)
- \[BUGFIX] PromQL: Ensure that `rate`/`increase`/`delta` of histograms results in a gauge histogram. [#17608](prometheus/prometheus#17608)
- \[BUGFIX] PromQL: Do not panic while iterating over invalid histograms. [#17559](prometheus/prometheus#17559)
- \[BUGFIX] TSDB: Reject chunk files whose encoded chunk length overflows int. [#17533](prometheus/prometheus#17533)
- \[BUGFIX] TSDB: Do not panic during resolution reduction of invalid histograms. [#17561](prometheus/prometheus#17561)
- \[BUGFIX] Remote-write Receive: Avoid duplicate labels when experimental type-and-unit-label feature is enabled. [#17546](prometheus/prometheus#17546)
- \[BUGFIX] OTLP Receiver: Only write metadata to disk when experimental metadata-wal-records feature is enabled. [#17472](prometheus/prometheus#17472)
renovate bot added a commit to sdwilsh/ansible-playbooks that referenced this pull request Jan 10, 2026
##### [\`v3.9.1\`](https://github.com/prometheus/prometheus/releases/tag/v3.9.1)

- \[BUGFIX] Agent: fix crash shortly after startup from invalid type of object. [#17802](prometheus/prometheus#17802)
- \[BUGFIX] Scraping: fix relabel keep/drop not working. [#17807](prometheus/prometheus#17807)

coleenquadros pushed a commit to coleenquadros/thanos that referenced this pull request Jan 12, 2026
charleskorn pushed a commit to charleskorn/prometheus that referenced this pull request Jan 15, 2026
* promql: fix histogram_fraction issue when lower falls within the first bucket (prometheus#17424)

Signed-off-by: Mohammad Alavi <m.alavi1986@gmail.com>

* prepare release 3.8.0-rc.0

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>

* test: skip TestRemoteWrite_ReshardingWithoutDeadlock temporarily as flaky (prometheus#17534) (prometheus#17543)

(cherry picked from commit 35c3232)

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>

* chore(deps): bump prometheus/promci from 0.4.7 to 0.5.0

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>

* chore(deps): bump prometheus/promci from 0.5.0 to 0.5.1

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>

* chore(deps): bump prometheus/promci from 0.5.1 to 0.5.2

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>

* chore(deps): bump prometheus/promci from 0.5.2 to 0.5.3

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>

* prw2: Move Remote Write 2.0 CT to be per Sample; Rename to ST (start timestamp) (prometheus#17411)

Relates to
prometheus#16944 (comment)

Signed-off-by: bwplotka <bwplotka@gmail.com>
(cherry picked from commit cefefc6)

* chore: prepare 3.8.0-rc.1 entry

Signed-off-by: bwplotka <bwplotka@gmail.com>

* [chore]: bump common dep to support RFC7523 3.1

Signed-off-by: Jorge Turrado <jorge.turrado@mail.schwarz>

* Update Prometheus Agent doc (prometheus#17591)

* Add a nav title to fix docs website generator.
* Make it more clear that "Prometheus Agent" is a mode, not a separate
  service.
* Add to index.
* Cleanup some wording.
* Add a downsides section.

Signed-off-by: SuperQ <superq@gmail.com>
(cherry picked from commit d0d2699)

* chore(deps): bump github.com/prometheus/common from 0.67.3 to 0.67.4 (prometheus#17594)

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>

* prepare release v3.8.0-rc.1

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>

* prepare release v3.8.0

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>

* chore: Fix function name typo in createBatchSpan comment

Signed-off-by: zjumathcode <pai314159@2980.com>

* feat: Add flag that blocks lvl 1 compactions until upload is confirmed in an external JSON file (prometheus#17435)

* Delay compactions until Thanos uploads all blocks

Using the Thanos sidecar with Prometheus requires us to disable TSDB compactions on the Prometheus side by setting --storage.tsdb.min-block-duration and --storage.tsdb.max-block-duration to the same value. See https://thanos.io/tip/components/sidecar.md. The main problem this avoids is that Prometheus might compact a given block before Thanos uploads it, creating a gap in Thanos metrics. Thanos does not upload compacted blocks because that would upload the same sample multiple times. You can tell Thanos to upload compacted blocks, but that is aimed at one-time migrations. This patch creates a bridge between Thanos and Prometheus by allowing Prometheus to read the shipper file Thanos creates, where it tracks which blocks were already uploaded, and to use that data to delay compaction of blocks until they are marked as uploaded by Thanos. Thanks to this, both services can coordinate with each other (in a way) and we can stop disabling compaction on the Prometheus side when Thanos uploads are enabled.

The reason to have this is that disabling compactions has a very dramatic performance cost. Since most time series exist for longer than a single block duration (2h by default), large chunks of each block index will reference the same series, so 10 * 2h blocks will each have an index that is usually fairly big and is almost the same for all 10 blocks. Compaction de-duplicates the index, so merging 10 blocks together would leave us with a single index that is around the same size as each of these 10 2h blocks would have (plus some extra for series that only exist in some blocks, but not all). Without compaction, every range query that iterates over all 10 blocks has to read each index, so we're doing 10x more work than if we had a single compacted block.

Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>

* Rename structs and functions to make this more generic

Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>

* Address review comments

Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>

* Cache UploadMeta for 1 minute

Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>

---------

Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
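
As a rough worked example of the index cost described in the commit message above, the following Go sketch compares how many index series entries a range query touches across 10 uncompacted 2h blocks versus one compacted block. The series counts are made-up assumptions chosen only to show the arithmetic, and `indexEntriesRead` is a hypothetical helper, not part of the TSDB.

```go
package main

import "fmt"

// indexEntriesRead estimates how many index series entries a range query
// touches when it spans the given blocks. Each element is the number of
// series in that block's index.
func indexEntriesRead(seriesPerBlock []int) int {
	total := 0
	for _, n := range seriesPerBlock {
		total += n
	}
	return total
}

func main() {
	const blocks = 10
	const longLived = 1_000_000 // ASSUMED: series present in every 2h block
	const shortLived = 10_000   // ASSUMED: extra series unique to each block

	// Without compaction every block repeats the long-lived series.
	uncompacted := make([]int, blocks)
	for i := range uncompacted {
		uncompacted[i] = longLived + shortLived
	}

	// After compaction the shared series are stored once, while the
	// per-block extras are all kept.
	compacted := []int{longLived + blocks*shortLived}

	u := indexEntriesRead(uncompacted)
	c := indexEntriesRead(compacted)
	fmt.Printf("uncompacted=%d compacted=%d ratio=%.1fx\n", u, c, float64(u)/float64(c))
}
```

With these assumed numbers the query reads about 9x more index entries without compaction. The ratio approaches 10x as the share of short-lived series shrinks.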

* RW2: Allow custom scope in azuread (prometheus#17483)

Signed-off-by: Ben Edmunds <sammybenblue2@gmail.com>

* docs: Describe how time() is set to start at 0 in unit tests

The return value of functions relating to the current time, e.g. time(),
is set by promtool to start at timestamp 0 at the start of a test's
evaluation.

This has the very nice consequence that tests can run reliably without
depending on when they are run.

It does, however, mean that tests can produce results that users do not
expect.

If this behaviour is documented, then users will be empowered to write
tests for their rules that use time-dependent functions.

(Closes: prometheus/docs#1464)

Signed-off-by: Gabriel Filion <lelutin@torproject.org>

* refactor(tsdb): use one test newTestDB constructor (prometheus#17638)

For tests only, we had various ways of opening a DB. This reduces them to one
instead of:

* Open
* newTestDB
* newTestDBOpts
* openTestDB

This is so that prometheus#17629 is smaller
and a bit easier. It also helps test maintainability and consistency.

Signed-off-by: bwplotka <bwplotka@gmail.com>

* Add start_timestamp field for unit tests.

This commit adds support for configuring a custom start timestamp
for Prometheus unit tests, allowing tests to use realistic timestamps
instead of starting at Unix epoch 0.

Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>

* Fix serialization for empty `ignoring()` in combination with `group_x()`

Currently both the backend and frontend printers/formatters/serializers
incorrectly transform the following expression:

```
up * ignoring() group_left(__name__) node_boot_time_seconds
```

...into:

```
up * node_boot_time_seconds
```

...which yields a different result (including the metric name in the result
vs. no metric name).

We need to keep empty `ignoring()` modifiers if there is a grouping modifier
present.

Signed-off-by: Julius Volz <julius.volz@gmail.com>

* Simplify StartTime assignment in unit test setup.

Remove redundant IsZero check since promqltest.LazyLoader already
handles zero StartTime by defaulting to Unix epoch.

Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>

* Update golangci-lint and add modernize check (prometheus#17640)

* add modernize check

Signed-off-by: dongjiang1989 <dongjiang1989@126.com>

* fix golangci lint

Signed-off-by: dongjiang1989 <dongjiang1989@126.com>

---------

Signed-off-by: dongjiang1989 <dongjiang1989@126.com>

* fix lint

---------

Signed-off-by: Mohammad Alavi <m.alavi1986@gmail.com>
Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
Signed-off-by: bwplotka <bwplotka@gmail.com>
Signed-off-by: Jorge Turrado <jorge.turrado@mail.schwarz>
Signed-off-by: SuperQ <superq@gmail.com>
Signed-off-by: zjumathcode <pai314159@2980.com>
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
Signed-off-by: Ben Edmunds <sammybenblue2@gmail.com>
Signed-off-by: Gabriel Filion <lelutin@torproject.org>
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
Signed-off-by: Julius Volz <julius.volz@gmail.com>
Signed-off-by: dongjiang1989 <dongjiang1989@126.com>
Co-authored-by: Mohammad Alavi <m.alavi1986@gmail.com>
Co-authored-by: Jan Fajerski <jfajersk@redhat.com>
Co-authored-by: Jan Fajerski <jan--f@users.noreply.github.com>
Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Co-authored-by: Jorge Turrado <jorge.turrado@mail.schwarz>
Co-authored-by: Ben Kochie <superq@gmail.com>
Co-authored-by: zjumathcode <pai314159@2980.com>
Co-authored-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
Co-authored-by: Ben Edmunds <Tigger2014@users.noreply.github.com>
Co-authored-by: Julien <291750+roidelapluie@users.noreply.github.com>
Co-authored-by: Gabriel Filion <lelutin@torproject.org>
Co-authored-by: Julius Volz <julius.volz@gmail.com>
Co-authored-by: dongjiang <dongjiang1989@126.com>
Co-authored-by: Jeanette Tan <jeanette.tan@grafana.com>