fix: aggregate Available condition across Prometheus shards#8434

Merged
simonpasquier merged 2 commits into prometheus-operator:main from simonpasquier:refactor-sts-reporter
Mar 11, 2026

Conversation

Contributor

@simonpasquier simonpasquier commented Mar 9, 2026

Description

This commit modifies the status reporter for Prometheus and PrometheusAgent resources to aggregate the status of individual shards into the resource's Available condition.

The logic is as follows (first match wins):

  • if any shard is Available=False => the resource's Available condition is False.
  • if any shard is Available=Degraded => the resource's Available condition is Degraded.
  • if any shard is Available=Unknown => the resource's Available condition is Unknown.
  • Otherwise the resource's Available condition is True.

Before this change, the resource's Available condition was non-deterministic when shards reported different availability conditions.
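
The precedence rules above can be sketched as a small Go function. This is a minimal illustration rather than the operator's actual code: the `ConditionStatus` type, its constants and the `aggregateAvailable` helper are hypothetical names introduced here, and treating an empty shard list as True is an assumption taken from the "otherwise" rule.

```go
package main

import "fmt"

// ConditionStatus models the four values named in the description.
// The type and constant names are illustrative assumptions.
type ConditionStatus string

const (
	ConditionTrue     ConditionStatus = "True"
	ConditionDegraded ConditionStatus = "Degraded"
	ConditionFalse    ConditionStatus = "False"
	ConditionUnknown  ConditionStatus = "Unknown"
)

// aggregateAvailable (hypothetical helper) folds per-shard statuses into one
// resource-level status using "first match wins": False, then Degraded,
// then Unknown, otherwise True.
func aggregateAvailable(shards []ConditionStatus) ConditionStatus {
	// Record which statuses occur; the order of the shards is irrelevant.
	seen := make(map[ConditionStatus]bool, len(shards))
	for _, s := range shards {
		seen[s] = true
	}

	// Walk the precedence list and return the first status that was observed.
	for _, s := range []ConditionStatus{ConditionFalse, ConditionDegraded, ConditionUnknown} {
		if seen[s] {
			return s
		}
	}

	// No shard reported False, Degraded or Unknown.
	return ConditionTrue
}

func main() {
	fmt.Println(aggregateAvailable([]ConditionStatus{ConditionTrue, ConditionDegraded, ConditionFalse})) // False
	fmt.Println(aggregateAvailable([]ConditionStatus{ConditionUnknown, ConditionDegraded}))              // Degraded
	fmt.Println(aggregateAvailable([]ConditionStatus{ConditionTrue, ConditionUnknown}))                  // Unknown
	fmt.Println(aggregateAvailable([]ConditionStatus{ConditionTrue, ConditionTrue}))                     // True
}
```

Because the result depends only on which statuses are present, not on the order in which shards are visited, the same set of shard conditions always yields the same resource-level condition.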

Type of change

What type of changes does your code introduce to the Prometheus operator? Put an x in the boxes that apply.

  • CHANGE (fix or feature that would cause existing functionality to not work as expected)
  • FEATURE (non-breaking change which adds functionality)
  • BUGFIX (non-breaking change which fixes an issue)
  • ENHANCEMENT (non-breaking change which improves existing functionality)
  • NONE (if none of the other choices apply. Example, tooling, build system, CI, docs, etc.)

Verification

Please check the Prometheus-Operator testing guidelines for recommendations about automated tests.

Changelog entry

Please put a one-line changelog entry below. This will be copied to the changelog file during the release process.


@simonpasquier simonpasquier requested a review from a team as a code owner March 9, 2026 12:52
@simonpasquier simonpasquier requested a review from slashpai March 9, 2026 12:53
@simonpasquier simonpasquier force-pushed the refactor-sts-reporter branch from 18f1583 to 80ea71c Compare March 9, 2026 13:39
@simonpasquier simonpasquier enabled auto-merge March 9, 2026 14:16
@simonpasquier simonpasquier force-pushed the refactor-sts-reporter branch from 80ea71c to ac29534 Compare March 10, 2026 09:33
This commit modifies the status reporter for Prometheus and
PrometheusAgent resources to aggregate the status of individual shards
into the resource's Available condition.

The logic is as follows (first match wins):
* if any shard is Available=False => the resource's Available condition
  is False.
* if any shard is Available=Degraded => the resource's Available condition
  is Degraded.
* if any shard is Available=Unknown => the resource's Available condition
  is Unknown.
* Otherwise the resource's Available condition is True.

Before this change, the resource's Available condition was
non-deterministic when shards reported different availability
conditions.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
@simonpasquier simonpasquier force-pushed the refactor-sts-reporter branch from ac29534 to a3802fa Compare March 10, 2026 10:54
AvailableReplicas: 3,
UnavailableReplicas: 1,
},
},
Contributor

Should we also add a case with one shard at replicas=2 with 1 ready pod and another shard with no ready pod, to simulate Degraded and False together?

Contributor Author

It's covered by TestCombinedStatus(), I believe, but I can add a test here if you want.

Contributor

ok then not needed :)
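
The scenario raised in this thread (one shard with 2 replicas and 1 ready pod, another shard with no ready pod) could be sketched roughly as below. This is not the operator's test code and says nothing about what TestCombinedStatus() contains. The `shard` struct, `shardStatus` and `combined` are hypothetical names, and the mapping from ready-pod counts to a per-shard status (no ready pod gives False, some but not all ready gives Degraded) is an assumption inferred from the comment above.

```go
package main

import "fmt"

type status string

const (
	statusTrue     status = "True"
	statusDegraded status = "Degraded"
	statusFalse    status = "False"
	statusUnknown  status = "Unknown"
)

// shard is a hypothetical stand-in for one shard's StatefulSet.
type shard struct {
	replicas int // desired number of pods
	ready    int // pods currently ready
}

// shardStatus derives a per-shard status from pod counts (assumed mapping).
func shardStatus(s shard) status {
	switch {
	case s.replicas > 0 && s.ready == 0:
		return statusFalse // no pod is ready
	case s.ready < s.replicas:
		return statusDegraded // some, but not all, pods are ready
	default:
		return statusTrue
	}
}

// combined keeps the most severe status seen, using the precedence
// False > Degraded > Unknown > True from the PR description.
func combined(shards []shard) status {
	rank := map[status]int{statusTrue: 0, statusUnknown: 1, statusDegraded: 2, statusFalse: 3}
	worst := statusTrue
	for _, s := range shards {
		if st := shardStatus(s); rank[st] > rank[worst] {
			worst = st
		}
	}
	return worst
}

func main() {
	// One Degraded shard and one False shard together.
	shards := []shard{{replicas: 2, ready: 1}, {replicas: 2, ready: 0}}
	fmt.Println(combined(shards)) // False
}
```

Under these assumptions the Degraded shard and the False shard combine to False, whichever shard is listed first.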

@simonpasquier simonpasquier merged commit ef54c56 into prometheus-operator:main Mar 11, 2026
25 of 29 checks passed
@simonpasquier simonpasquier deleted the refactor-sts-reporter branch March 11, 2026 13:10
alexlebens pushed a commit to alexlebens/infrastructure that referenced this pull request Mar 20, 2026
…r to v0.90.0 (#4885)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [prometheus-operator/prometheus-operator](https://github.com/prometheus-operator/prometheus-operator) | minor | `v0.89.0` → `v0.90.0` |

---

> ⚠️ **Warning**
>
> Some dependencies could not be looked up. Check the [Dependency Dashboard](issues/2) for more information.

---

### Release Notes

<details>
<summary>prometheus-operator/prometheus-operator (prometheus-operator/prometheus-operator)</summary>

### [`v0.90.0`](https://github.com/prometheus-operator/prometheus-operator/releases/tag/v0.90.0): 0.90.0 / 2026-03-19

[Compare Source](prometheus-operator/prometheus-operator@v0.89.0...v0.90.0)

- \[CHANGE/BUGFIX] Validate that the remote-write URL scheme is either `http` or `https`. [#8455](prometheus-operator/prometheus-operator#8455)
- \[FEATURE] Add `--repair-policy-for-statefulsets` CLI argument to the operator. It defines how the operator manages StatefulSet's pods stuck at an incorrect revision. Users running Kubernetes v1.35+ are encouraged to enable this feature (see [troubleshooting guide](https://prometheus-operator.dev/docs/platform/troubleshooting/#statefulset-rollout-stuck-after-a-bad-update)). [#8443](prometheus-operator/prometheus-operator#8443)
- \[FEATURE] Add `schedulerName` support to the `Prometheus`, `PrometheusAgent`, `Alertmanager` and `ThanosRuler` CRDs. [#8451](prometheus-operator/prometheus-operator#8451)
- \[ENHANCEMENT] Add `--web.tls-curves` CLI argument to the operator and admission-webhook binaries. [#8385](prometheus-operator/prometheus-operator#8385)
- \[ENHANCEMENT] Support minimum TLS version for Thanos gRPC servers. [#8438](prometheus-operator/prometheus-operator#8438)
- \[ENHANCEMENT] Add version label to `ThanosRuler` pods. [#8441](prometheus-operator/prometheus-operator#8441)
- \[ENHANCEMENT] Add `messageText` support for Slack receiver in `AlertmanagerConfig` CRD. [#8374](prometheus-operator/prometheus-operator#8374)
- \[ENHANCEMENT] Add `messageText` support for Slack receiver in Alertmanager secret config. [#8375](prometheus-operator/prometheus-operator#8375)
- \[ENHANCEMENT] Add `forceImplicitTLS` support for SMTP email config in Alertmanager secret config. [#8384](prometheus-operator/prometheus-operator#8384) [#8404](prometheus-operator/prometheus-operator#8404)
- \[ENHANCEMENT] Add `forceImplicitTLS` support for SMTP email config in `AlertmanagerConfig` CRD. [#8386](prometheus-operator/prometheus-operator#8386)
- \[ENHANCEMENT] Add `forceImplicitTLS` support for SMTP global config in Alertmanager secret config. [#8405](prometheus-operator/prometheus-operator#8405)
- \[ENHANCEMENT] Add `forceImplicitTLS` support for SMTP global config in `Alertmanager` CRD. [#8406](prometheus-operator/prometheus-operator#8406)
- \[ENHANCEMENT] Add support for global Telegram bot token in `Alertmanager` CRD. [#8372](prometheus-operator/prometheus-operator#8372)
- \[ENHANCEMENT] Add `chatIDFile` support for Telegram receiver in Alertmanager secret config. [#8376](prometheus-operator/prometheus-operator#8376)
- \[ENHANCEMENT] Add `wechatAPISecretFile` support in Alertmanager global config. [#8377](prometheus-operator/prometheus-operator#8377)
- \[ENHANCEMENT] Add `authSecretFile` support for email config in Alertmanager secret config. [#8396](prometheus-operator/prometheus-operator#8396)
- \[ENHANCEMENT] Add nested field support for PagerDuty description in Alertmanager secret config. [#8402](prometheus-operator/prometheus-operator#8402)
- \[ENHANCEMENT] Add email threading support in Alertmanager secret config. [#8388](prometheus-operator/prometheus-operator#8388)
- \[ENHANCEMENT] Add field and label selectors for ConfigMap watches. [#8368](prometheus-operator/prometheus-operator#8368)
- \[ENHANCEMENT] Improve ScrapeConfig API consistency and validation. [#8422](prometheus-operator/prometheus-operator#8422)
- \[BUGFIX] Fix `ThanosRuler` config resource status not being updated on initial StatefulSet creation. [#8358](prometheus-operator/prometheus-operator#8358)
- \[BUGFIX] Preserve `LastTransitionTime` in Prometheus status conditions. [#8346](prometheus-operator/prometheus-operator#8346)
- \[BUGFIX] Make Mattermost `text` field optional in `AlertmanagerConfig` CRD. [#8363](prometheus-operator/prometheus-operator#8363)
- \[BUGFIX] Remove nil error wrapping in v1alpha1 duplicate receiver validation. [#8379](prometheus-operator/prometheus-operator#8379)
- \[BUGFIX] Aggregate `Available` condition across Prometheus shards. [#8434](prometheus-operator/prometheus-operator#8434)
- \[BUGFIX] Reconcile resources with inconsistent status. [#8397](prometheus-operator/prometheus-operator#8397)
- \[BUGFIX] Fix namespace lister/watcher compatibility with Kubernetes v1.35 client-go. [#8431](prometheus-operator/prometheus-operator#8431)
- \[BUGFIX] Fix missing OAuth2 field in IonosSDConfig generation. [#8433](prometheus-operator/prometheus-operator#8433)
- \[BUGFIX] Fix missing fields in AzureSDConfig. [#8444](prometheus-operator/prometheus-operator#8444)
- \[BUGFIX] Validate Microsoft Teams V2 URL in `AlertmanagerConfig` CRD. [#8227](prometheus-operator/prometheus-operator#8227)
- \[BUGFIX] Fix `labelmap` relabel action rejecting valid replacement values with template variables for Prometheus 2.x. [#8337](prometheus-operator/prometheus-operator#8337)

</details>

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Reviewed-on: https://gitea.alexlebens.dev/alexlebens/infrastructure/pulls/4885
Co-authored-by: Renovate Bot <renovate-bot@alexlebens.net>
Co-committed-by: Renovate Bot <renovate-bot@alexlebens.net>