TSDB: Option to configure TSDB Block Reload Interval#16728
TSDB: Option to configure TSDB Block Reload Interval#16728aknuds1 merged 18 commits intoprometheus:mainfrom
Conversation
70e7311 to
aaee3bd
Compare
|
Hey @jesusvazquez could you take a look at this, and suggest any changes if required? |
|
Is it possible to add a feature where block reloads by invoking some functions. thinking in terms of event driven architecture. |
|
@HawkingRadiation42 Yep i was working on it will push it this weekend |
This is a work around for optimizing the reload rather than 1min |
bboreham
left a comment
There was a problem hiding this comment.
Hello from the bug-scrub! Thanks for this, and sorry for taking a while to review.
|
Hey @bboreham, I will plan and refactor this PR in next few days |
Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
e593fbe to
1c34396
Compare
Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
aknuds1
left a comment
There was a problem hiding this comment.
The changelog item overlaps with the PR description's release notes.
Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
|
Hey @aknuds1 sry for delays had my exams going on, I have refactored to use require.Eventually() and drop sleep Also changed the PR description to latest. PTAL! |
|
Taking another look. |
aknuds1
left a comment
There was a problem hiding this comment.
Please see comments, there's one misleadingly named test case and one test case that doesn't really make sense.
Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
9f367bc to
07aa8db
Compare
aknuds1
left a comment
There was a problem hiding this comment.
openTestDB is gone in main.
Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
Signed-off-by: NamanParlecha <namanparlecha@gmail.com>
aknuds1
left a comment
There was a problem hiding this comment.
I think it looks good now, except a verification of block reload interval clamping could be added to the test logic.
Co-authored-by: Arve Knudsen <arve.knudsen@gmail.com> Signed-off-by: NamanParlecha <namanparlecha@gmail.com>
aknuds1
left a comment
There was a problem hiding this comment.
LGTM, thanks for applying my feedback.
|
@aknuds1 done, Thanks for helping out!! |
|
I've asked @roidelapluie about fuzzing being broken. I see he has a PR open to move to native fuzzing, I'm wondering whether that's all it takes to unbreak fuzzing. |
|
I think it's safe to merge this even though fuzzing doesn't go through, because it's broken for other reasons than changes in this PR. |
|
Hi, I tweaked the release note: After: (I don't think a flag to tweak an existing feature counts as a FEATURE) |
|
Thanks @bboreham - good point. |
##### [\`v3.9.0\`](https://github.com/prometheus/prometheus/releases/tag/v3.9.0) #### Note for users of Native Histograms In version 3.9, Native Histograms is no longer experimental, and the feature flag `native-histogram` has no effect. You must now turn on the config setting `scrape_native_histograms` to collect Native Histogram samples from exporters. #### Changelog - \[CHANGE] Native Histograms are no longer experimental! Make the `native-histogram` feature flag a no-op. Use `scrape_native_histograms` config option instead. [#17528](prometheus/prometheus#17528) - \[CHANGE] API: Add maximum limit of 10,000 sets of statistics to TSDB status endpoint. [#17647](prometheus/prometheus#17647) - \[FEATURE] API: Add /api/v1/features for clients to understand which features are supported. [#17427](prometheus/prometheus#17427) - \[FEATURE] Promtool: Add `start_timestamp` field for unit tests. [#17636](prometheus/prometheus#17636) - \[FEATURE] Promtool: Add `--format seriesjson` option to `tsdb dump` to output just series labels in JSON format. [#13409](prometheus/prometheus#13409) - \[FEATURE] Add `--storage.tsdb.delay-compact-file.path` flag for better interoperability with Thanos. [#17435](prometheus/prometheus#17435) - \[FEATURE] UI: Add an option on the query drop-down menu to duplicate that query panel. [#17714](prometheus/prometheus#17714) - \[ENHANCEMENT]: TSDB: add flag `--storage.tsdb.block-reload-interval` to configure TSDB Block Reload Interval. [#16728](prometheus/prometheus#16728) - \[ENHANCEMENT] UI: Add graph option to start the chart's Y axis at zero. [#17565](prometheus/prometheus#17565) - \[ENHANCEMENT] Scraping: Classic protobuf format no longer requires the unit in the metric name. [#16834](prometheus/prometheus#16834) - \[ENHANCEMENT] PromQL, Rules, SD, Scraping: Add native histograms to complement existing summaries. [#17374](prometheus/prometheus#17374) - \[ENHANCEMENT] Notifications: Add a histogram `prometheus_notifications_latency_histogram_seconds` to complement the existing summary. [#16637](prometheus/prometheus#16637) - \[ENHANCEMENT] Remote-write: Add custom scope support for AzureAD authentication. [#17483](prometheus/prometheus#17483) - \[ENHANCEMENT] SD: add a `config` label with job name for most `prometheus_sd_refresh` metrics. [#17138](prometheus/prometheus#17138) - \[ENHANCEMENT] TSDB: New histogram `prometheus_tsdb_sample_ooo_delta`, the distribution of out-of-order samples in seconds. Collected for all samples, accepted or not. [#17477](prometheus/prometheus#17477) - \[ENHANCEMENT] Remote-read: Validate histograms received via remote-read. [#17561](prometheus/prometheus#17561) - \[PERF] TSDB: Small optimizations to postings index. [#17439](prometheus/prometheus#17439) - \[PERF] Scraping: Speed up relabelling of series. [#17530](prometheus/prometheus#17530) - \[PERF] PromQL: Small optimisations in binary operators. [#17524](prometheus/prometheus#17524), [#17519](prometheus/prometheus#17519). - \[BUGFIX] UI: PromQL autocomplete now shows the correct type and HELP text for OpenMetrics counters whose samples end in `_total`. [#17682](prometheus/prometheus#17682) - \[BUGFIX] UI: Fixed codemirror-promql incorrectly showing label completion suggestions after the closing curly brace of a vector selector. [#17602](prometheus/prometheus#17602) - \[BUGFIX] UI: Query editor no longer suggests a duration unit if one is already present after a number. [#17605](prometheus/prometheus#17605) - \[BUGFIX] PromQL: Fix some "vector cannot contain metrics with the same labelset" errors when experimental delayed name removal is enabled. [#17678](prometheus/prometheus#17678) - \[BUGFIX] PromQL: Fix possible corruption of PromQL text if the query had an empty `ignoring()` and non-empty grouping. [#17643](prometheus/prometheus#17643) - \[BUGFIX] PromQL: Fix resets/changes to return empty results for anchored selectors when all samples are outside the range. [#17479](prometheus/prometheus#17479) - \[BUGFIX] PromQL: Check more consistently for many-to-one matching in filter binary operators. [#17668](prometheus/prometheus#17668) - \[BUGFIX] PromQL: Fix collision in unary negation with non-overlapping series. [#17708](prometheus/prometheus#17708) - \[BUGFIX] PromQL: Fix collision in label\_join and label\_replace with non-overlapping series. [#17703](prometheus/prometheus#17703) - \[BUGFIX] PromQL: Fix bug with inconsistent results for queries with OR expression when experimental delayed name removal is enabled. [#17161](prometheus/prometheus#17161) - \[BUGFIX] PromQL: Ensure that `rate`/`increase`/`delta` of histograms results in a gauge histogram. [#17608](prometheus/prometheus#17608) - \[BUGFIX] PromQL: Do not panic while iterating over invalid histograms. [#17559](prometheus/prometheus#17559) - \[BUGFIX] TSDB: Reject chunk files whose encoded chunk length overflows int. [#17533](prometheus/prometheus#17533) - \[BUGFIX] TSDB: Do not panic during resolution reduction of invalid histograms. [#17561](prometheus/prometheus#17561) - \[BUGFIX] Remote-write Receive: Avoid duplicate labels when experimental type-and-unit-label feature is enabled. [#17546](prometheus/prometheus#17546) - \[BUGFIX] OTLP Receiver: Only write metadata to disk when experimental metadata-wal-records feature is enabled. [#17472](prometheus/prometheus#17472)
##### [\`v3.9.1\`](https://github.com/prometheus/prometheus/releases/tag/v3.9.1) - \[BUGFIX] Agent: fix crash shortly after startup from invalid type of object. [#17802](prometheus/prometheus#17802) - \[BUGFIX] Scraping: fix relabel keep/drop not working. [#17807](prometheus/prometheus#17807) --- ##### [\`v3.9.0\`](https://github.com/prometheus/prometheus/releases/tag/v3.9.0) #### Note for users of Native Histograms In version 3.9, Native Histograms is no longer experimental, and the feature flag `native-histogram` has no effect. You must now turn on the config setting `scrape_native_histograms` to collect Native Histogram samples from exporters. #### Changelog - \[CHANGE] Native Histograms are no longer experimental! Make the `native-histogram` feature flag a no-op. Use `scrape_native_histograms` config option instead. [#17528](prometheus/prometheus#17528) - \[CHANGE] API: Add maximum limit of 10,000 sets of statistics to TSDB status endpoint. [#17647](prometheus/prometheus#17647) - \[FEATURE] API: Add /api/v1/features for clients to understand which features are supported. [#17427](prometheus/prometheus#17427) - \[FEATURE] Promtool: Add `start_timestamp` field for unit tests. [#17636](prometheus/prometheus#17636) - \[FEATURE] Promtool: Add `--format seriesjson` option to `tsdb dump` to output just series labels in JSON format. [#13409](prometheus/prometheus#13409) - \[FEATURE] Add `--storage.tsdb.delay-compact-file.path` flag for better interoperability with Thanos. [#17435](prometheus/prometheus#17435) - \[FEATURE] UI: Add an option on the query drop-down menu to duplicate that query panel. [#17714](prometheus/prometheus#17714) - \[ENHANCEMENT]: TSDB: add flag `--storage.tsdb.block-reload-interval` to configure TSDB Block Reload Interval. [#16728](prometheus/prometheus#16728) - \[ENHANCEMENT] UI: Add graph option to start the chart's Y axis at zero. [#17565](prometheus/prometheus#17565) - \[ENHANCEMENT] Scraping: Classic protobuf format no longer requires the unit in the metric name. [#16834](prometheus/prometheus#16834) - \[ENHANCEMENT] PromQL, Rules, SD, Scraping: Add native histograms to complement existing summaries. [#17374](prometheus/prometheus#17374) - \[ENHANCEMENT] Notifications: Add a histogram `prometheus_notifications_latency_histogram_seconds` to complement the existing summary. [#16637](prometheus/prometheus#16637) - \[ENHANCEMENT] Remote-write: Add custom scope support for AzureAD authentication. [#17483](prometheus/prometheus#17483) - \[ENHANCEMENT] SD: add a `config` label with job name for most `prometheus_sd_refresh` metrics. [#17138](prometheus/prometheus#17138) - \[ENHANCEMENT] TSDB: New histogram `prometheus_tsdb_sample_ooo_delta`, the distribution of out-of-order samples in seconds. Collected for all samples, accepted or not. [#17477](prometheus/prometheus#17477) - \[ENHANCEMENT] Remote-read: Validate histograms received via remote-read. [#17561](prometheus/prometheus#17561) - \[PERF] TSDB: Small optimizations to postings index. [#17439](prometheus/prometheus#17439) - \[PERF] Scraping: Speed up relabelling of series. [#17530](prometheus/prometheus#17530) - \[PERF] PromQL: Small optimisations in binary operators. [#17524](prometheus/prometheus#17524), [#17519](prometheus/prometheus#17519). - \[BUGFIX] UI: PromQL autocomplete now shows the correct type and HELP text for OpenMetrics counters whose samples end in `_total`. [#17682](prometheus/prometheus#17682) - \[BUGFIX] UI: Fixed codemirror-promql incorrectly showing label completion suggestions after the closing curly brace of a vector selector. [#17602](prometheus/prometheus#17602) - \[BUGFIX] UI: Query editor no longer suggests a duration unit if one is already present after a number. [#17605](prometheus/prometheus#17605) - \[BUGFIX] PromQL: Fix some "vector cannot contain metrics with the same labelset" errors when experimental delayed name removal is enabled. [#17678](prometheus/prometheus#17678) - \[BUGFIX] PromQL: Fix possible corruption of PromQL text if the query had an empty `ignoring()` and non-empty grouping. [#17643](prometheus/prometheus#17643) - \[BUGFIX] PromQL: Fix resets/changes to return empty results for anchored selectors when all samples are outside the range. [#17479](prometheus/prometheus#17479) - \[BUGFIX] PromQL: Check more consistently for many-to-one matching in filter binary operators. [#17668](prometheus/prometheus#17668) - \[BUGFIX] PromQL: Fix collision in unary negation with non-overlapping series. [#17708](prometheus/prometheus#17708) - \[BUGFIX] PromQL: Fix collision in label\_join and label\_replace with non-overlapping series. [#17703](prometheus/prometheus#17703) - \[BUGFIX] PromQL: Fix bug with inconsistent results for queries with OR expression when experimental delayed name removal is enabled. [#17161](prometheus/prometheus#17161) - \[BUGFIX] PromQL: Ensure that `rate`/`increase`/`delta` of histograms results in a gauge histogram. [#17608](prometheus/prometheus#17608) - \[BUGFIX] PromQL: Do not panic while iterating over invalid histograms. [#17559](prometheus/prometheus#17559) - \[BUGFIX] TSDB: Reject chunk files whose encoded chunk length overflows int. [#17533](prometheus/prometheus#17533) - \[BUGFIX] TSDB: Do not panic during resolution reduction of invalid histograms. [#17561](prometheus/prometheus#17561) - \[BUGFIX] Remote-write Receive: Avoid duplicate labels when experimental type-and-unit-label feature is enabled. [#17546](prometheus/prometheus#17546) - \[BUGFIX] OTLP Receiver: Only write metadata to disk when experimental metadata-wal-records feature is enabled. [#17472](prometheus/prometheus#17472)
##### [\`v3.9.1\`](https://github.com/prometheus/prometheus/releases/tag/v3.9.1) - \[BUGFIX] Agent: fix crash shortly after startup from invalid type of object. [#17802](prometheus/prometheus#17802) - \[BUGFIX] Scraping: fix relabel keep/drop not working. [#17807](prometheus/prometheus#17807) --- ##### [\`v3.9.0\`](https://github.com/prometheus/prometheus/releases/tag/v3.9.0) #### Note for users of Native Histograms In version 3.9, Native Histograms is no longer experimental, and the feature flag `native-histogram` has no effect. You must now turn on the config setting `scrape_native_histograms` to collect Native Histogram samples from exporters. #### Changelog - \[CHANGE] Native Histograms are no longer experimental! Make the `native-histogram` feature flag a no-op. Use `scrape_native_histograms` config option instead. [#17528](prometheus/prometheus#17528) - \[CHANGE] API: Add maximum limit of 10,000 sets of statistics to TSDB status endpoint. [#17647](prometheus/prometheus#17647) - \[FEATURE] API: Add /api/v1/features for clients to understand which features are supported. [#17427](prometheus/prometheus#17427) - \[FEATURE] Promtool: Add `start_timestamp` field for unit tests. [#17636](prometheus/prometheus#17636) - \[FEATURE] Promtool: Add `--format seriesjson` option to `tsdb dump` to output just series labels in JSON format. [#13409](prometheus/prometheus#13409) - \[FEATURE] Add `--storage.tsdb.delay-compact-file.path` flag for better interoperability with Thanos. [#17435](prometheus/prometheus#17435) - \[FEATURE] UI: Add an option on the query drop-down menu to duplicate that query panel. [#17714](prometheus/prometheus#17714) - \[ENHANCEMENT]: TSDB: add flag `--storage.tsdb.block-reload-interval` to configure TSDB Block Reload Interval. [#16728](prometheus/prometheus#16728) - \[ENHANCEMENT] UI: Add graph option to start the chart's Y axis at zero. [#17565](prometheus/prometheus#17565) - \[ENHANCEMENT] Scraping: Classic protobuf format no longer requires the unit in the metric name. [#16834](prometheus/prometheus#16834) - \[ENHANCEMENT] PromQL, Rules, SD, Scraping: Add native histograms to complement existing summaries. [#17374](prometheus/prometheus#17374) - \[ENHANCEMENT] Notifications: Add a histogram `prometheus_notifications_latency_histogram_seconds` to complement the existing summary. [#16637](prometheus/prometheus#16637) - \[ENHANCEMENT] Remote-write: Add custom scope support for AzureAD authentication. [#17483](prometheus/prometheus#17483) - \[ENHANCEMENT] SD: add a `config` label with job name for most `prometheus_sd_refresh` metrics. [#17138](prometheus/prometheus#17138) - \[ENHANCEMENT] TSDB: New histogram `prometheus_tsdb_sample_ooo_delta`, the distribution of out-of-order samples in seconds. Collected for all samples, accepted or not. [#17477](prometheus/prometheus#17477) - \[ENHANCEMENT] Remote-read: Validate histograms received via remote-read. [#17561](prometheus/prometheus#17561) - \[PERF] TSDB: Small optimizations to postings index. [#17439](prometheus/prometheus#17439) - \[PERF] Scraping: Speed up relabelling of series. [#17530](prometheus/prometheus#17530) - \[PERF] PromQL: Small optimisations in binary operators. [#17524](prometheus/prometheus#17524), [#17519](prometheus/prometheus#17519). - \[BUGFIX] UI: PromQL autocomplete now shows the correct type and HELP text for OpenMetrics counters whose samples end in `_total`. [#17682](prometheus/prometheus#17682) - \[BUGFIX] UI: Fixed codemirror-promql incorrectly showing label completion suggestions after the closing curly brace of a vector selector. [#17602](prometheus/prometheus#17602) - \[BUGFIX] UI: Query editor no longer suggests a duration unit if one is already present after a number. [#17605](prometheus/prometheus#17605) - \[BUGFIX] PromQL: Fix some "vector cannot contain metrics with the same labelset" errors when experimental delayed name removal is enabled. [#17678](prometheus/prometheus#17678) - \[BUGFIX] PromQL: Fix possible corruption of PromQL text if the query had an empty `ignoring()` and non-empty grouping. [#17643](prometheus/prometheus#17643) - \[BUGFIX] PromQL: Fix resets/changes to return empty results for anchored selectors when all samples are outside the range. [#17479](prometheus/prometheus#17479) - \[BUGFIX] PromQL: Check more consistently for many-to-one matching in filter binary operators. [#17668](prometheus/prometheus#17668) - \[BUGFIX] PromQL: Fix collision in unary negation with non-overlapping series. [#17708](prometheus/prometheus#17708) - \[BUGFIX] PromQL: Fix collision in label\_join and label\_replace with non-overlapping series. [#17703](prometheus/prometheus#17703) - \[BUGFIX] PromQL: Fix bug with inconsistent results for queries with OR expression when experimental delayed name removal is enabled. [#17161](prometheus/prometheus#17161) - \[BUGFIX] PromQL: Ensure that `rate`/`increase`/`delta` of histograms results in a gauge histogram. [#17608](prometheus/prometheus#17608) - \[BUGFIX] PromQL: Do not panic while iterating over invalid histograms. [#17559](prometheus/prometheus#17559) - \[BUGFIX] TSDB: Reject chunk files whose encoded chunk length overflows int. [#17533](prometheus/prometheus#17533) - \[BUGFIX] TSDB: Do not panic during resolution reduction of invalid histograms. [#17561](prometheus/prometheus#17561) - \[BUGFIX] Remote-write Receive: Avoid duplicate labels when experimental type-and-unit-label feature is enabled. [#17546](prometheus/prometheus#17546) - \[BUGFIX] OTLP Receiver: Only write metadata to disk when experimental metadata-wal-records feature is enabled. [#17472](prometheus/prometheus#17472)
Work around for #16649
This PR adds a new TSDB flag
--storage.tsdb.block-reload-intervalthat allows users to customize how frequently TSDB blocks are reloaded, providing more control over block discovery and reload behavior.This is to Reduce wait time from up to 1 minute to configurable interval
Maintains default 1 minute behavior when not specified Allowing intervals from seconds to hours based on use case requirements and to reduce I/O overhead
The minimum value for this flag is 1s and default is 1m
Usage:
test:
Does this PR introduce a user-facing change?