Skip to content

### Fix: Clear Stale Reconciliation Status in Alertmanager, Thanos, and PrometheusAgent Controllers#8304

Merged
simonpasquier merged 1 commit intoprometheus-operator:mainfrom
Yashika0724:fix/reset-reconciliation-status
Jan 21, 2026
Merged

### Fix: Clear Stale Reconciliation Status in Alertmanager, Thanos, and PrometheusAgent Controllers#8304
simonpasquier merged 1 commit intoprometheus-operator:mainfrom
Yashika0724:fix/reset-reconciliation-status

Conversation

@Yashika0724
Copy link
Contributor

Summary

This PR fixes an issue where the Reconciled condition’s Reason and Message fields could remain
stale in Alertmanager, Thanos, and PrometheusAgent controllers even after reconciliation
succeeds.

These controllers were missing a call to reset reconciliation status at the beginning of each
sync cycle, causing old reason/message values to persist across reconciliations. This led to
misleading status conditions that did not reflect the current state of the workload.

This change aligns all controllers with the correct behavior already used by the Prometheus
Server controller.


Root Cause

Reconciliation status is tracked using ReconciliationTracker, which maintains:

  • an error field for reconciliation result
  • reason and message fields for contextual status

In Alertmanager, Thanos, and PrometheusAgent controllers, only the error field was updated after
each reconciliation. Since the previous reason and message were never cleared, they remained
even when the underlying issue was resolved.

The Prometheus Server controller correctly resets reconciliation status at the start of each
sync cycle, but the other three controllers were missing this reset step.


Steps to Reproduce

  1. Create a resource that initially has no selected configuration or uses deprecated fields
  2. Wait for reconciliation and observe that a specific status reason is set
  3. Update the configuration so reconciliation should now succeed
  4. Wait for the next reconciliation cycle
  5. Observe that the error is cleared but the previous reason/message remains unchanged

This behavior occurs in Alertmanager, ThanosRuler, and PrometheusAgent resources.


Fix Applied

A call to reset reconciliation status is added at the start of Sync() in:

  • Alertmanager controller
  • Thanos controller
  • PrometheusAgent controller

All controllers now follow the same lifecycle as the Prometheus Server controller:

  1. Reset previous reconciliation status
  2. Perform reconciliation
  3. Set final reconciliation result

This ensures that reason and message fields always reflect the current reconciliation cycle.


Impact

  • Prevents stale and misleading status conditions
  • Ensures accurate reporting of reconciliation state
  • Makes controller behavior consistent across all components
  • Improves reliability of monitoring and alerting
  • No change to reconciliation logic or workload behavior

Testing

  • Pattern verified against Prometheus Server controller behavior
  • No functional changes introduced to reconciliation flow
  • CI will validate integration behavior

…nd PrometheusAgent

Signed-off-by: Yashika0724 <ssyashika1311@gmail.com>
@Yashika0724 Yashika0724 requested a review from a team as a code owner January 21, 2026 11:07
@Yashika0724
Copy link
Contributor Author

Hello @simonpasquier , this PR aligns Alertmanager, Thanos, and PrometheusAgent controllers with
the Prometheus Server controller by resetting reconciliation status at the start of each sync
cycle to prevent stale reason/message fields. I’d really appreciate a review when convenient.
Thank you!

Copy link
Contributor

@simonpasquier simonpasquier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! There was no bug in the Alertmanager controller because it doesn't call SetReasonAndMessage() for now but it's better to align with the rest.

@simonpasquier simonpasquier merged commit f15a484 into prometheus-operator:main Jan 21, 2026
22 checks passed
alexlebens pushed a commit to alexlebens/infrastructure that referenced this pull request Feb 6, 2026
…r to v0.89.0 (#3775)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [prometheus-operator/prometheus-operator](https://github.com/prometheus-operator/prometheus-operator) | minor | `v0.88.1` → `v0.89.0` |

---

### Release Notes

<details>
<summary>prometheus-operator/prometheus-operator (prometheus-operator/prometheus-operator)</summary>

### [`v0.89.0`](https://github.com/prometheus-operator/prometheus-operator/releases/tag/v0.89.0): 0.89.0 / 2026-02-05

[Compare Source](prometheus-operator/prometheus-operator@v0.88.1...v0.89.0)

- \[ENHANCEMENT] Add `hostNetwork` field to the `Alertmanager` CRD. [#&#8203;8281](prometheus-operator/prometheus-operator#8281)
- \[ENHANCEMENT] Add the `crds` and `full-crds` commands to the operator's binary. [#&#8203;8251](prometheus-operator/prometheus-operator#8251)
- \[ENHANCEMENT] Report deprecated field usage in the `Reconciled` condition type. [#&#8203;8236](prometheus-operator/prometheus-operator#8236)
- \[ENHANCEMENT] Avoid unnecessary reconciliation upon creation of the `ThanosRuler` StatefulSet. [#&#8203;8347](prometheus-operator/prometheus-operator#8347)
- \[ENHANCEMENT] Add `bodySizeLimit` to the ScrapeConfig CRD. [#&#8203;8348](prometheus-operator/prometheus-operator#8348)
- \[ENHANCEMENT] Support `http_headers` field in the Alertmanager Secret. [#&#8203;8357](prometheus-operator/prometheus-operator#8357)
- \[ENHANCEMENT] Add the `-kubelet-http-metrics` flag to enable/disable the HTTP metrics port in the Kubelet endpoint (default=enabled). [#&#8203;8350](prometheus-operator/prometheus-operator#8350)
- \[ENHANCEMENT] Include `operator.prometheus.io/version` annotation in the full version of CRDs. [#&#8203;8279](prometheus-operator/prometheus-operator#8279)
- \[BUGFIX] Validate VictorOps global configuration in the `Alertmanager` CRD. [#&#8203;8020](prometheus-operator/prometheus-operator#8020)
- \[BUGFIX] Validate Jira global configuration in the `Alertmanager` CRD. [#&#8203;8265](prometheus-operator/prometheus-operator#8265)
- \[BUGFIX] Validate VictorOps receiver's URL in the `AlertmanagerConfig` CRD. [#&#8203;8258](prometheus-operator/prometheus-operator#8258)
- \[BUGFIX] Validate Webex receiver's URL in the `AlertmanagerConfig` CRD. [#&#8203;8255](prometheus-operator/prometheus-operator#8255)
- \[BUGFIX] Validate Jira receiver's URL configuration in the `AlertmanagerConfig` CRD. [#&#8203;8230](prometheus-operator/prometheus-operator#8230)
- \[BUGFIX] Validate OpsGenie receiver configuration in the `AlertmanagerConfig` CRD. [#&#8203;8267](prometheus-operator/prometheus-operator#8267)
- \[BUGFIX] Validate WeChat receiver configuration in the `AlertmanagerConfig` CRD. [#&#8203;8271](prometheus-operator/prometheus-operator#8271)
- \[BUGFIX] Validate SNS receiver configuration in the `AlertmanagerConfig` CRD. [#&#8203;8217](prometheus-operator/prometheus-operator#8217)
- \[BUGFIX] Validate Webex global configuration in the `Alertmanager` CRD. [#&#8203;7979](prometheus-operator/prometheus-operator#7979)
- \[BUGFIX] Validate Telegram global configuration in the `Alertmanager` CRD. [#&#8203;8268](prometheus-operator/prometheus-operator#8268)
- \[BUGFIX] Restore statefulset's labels if the creation fails with AlreadyExists. [#&#8203;8343](prometheus-operator/prometheus-operator#8343)
- \[BUGFIX] Fix potential panic due to informer cache races. [#&#8203;8310](prometheus-operator/prometheus-operator#8310)
- \[BUGFIX] Support probers defined with IPv6 addresses in the `Probe` CRD. [#&#8203;8354](prometheus-operator/prometheus-operator#8354)
- \[BUGFIX] Prevent group and repeat intervals with zero duration from breaking Alertmanager. [#&#8203;8126](prometheus-operator/prometheus-operator#8126)
- \[BUGFIX] Propagate all supported RocketChat attributes for `AlertmanagerConfig` CRD. [#&#8203;8016](prometheus-operator/prometheus-operator#8016)
- \[BUGFIX] Add URL validation for WeChat receiver. [#&#8203;8256](prometheus-operator/prometheus-operator#8256)
- \[BUGFIX] Add URL validation for SNS receiver. [#&#8203;8259](prometheus-operator/prometheus-operator#8259)
- \[BUGFIX] Fix GCE service discovery for the `ScrapeConfig` CRD. [#&#8203;8284](prometheus-operator/prometheus-operator#8284)
- \[BUGFIX] Avoid stale conditions in `Alertmanager`, `ThanosRuler`, `Prometheus` and `PrometheusAgent` resources. [#&#8203;8304](prometheus-operator/prometheus-operator#8304)
- \[BUGFIX] Fix race condition when updating rule ConfigMaps. [#&#8203;8290](prometheus-operator/prometheus-operator#8290)
- \[BUGFIX] Fix race condition when patching finalizers. [#&#8203;8323](prometheus-operator/prometheus-operator#8323)
- \[BUGFIX] Reconcile `ScrapeConfig` resources when namespace selection changes. [#&#8203;8334](prometheus-operator/prometheus-operator#8334)

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0My4zLjYiLCJ1cGRhdGVkSW5WZXIiOiI0My4zLjYiLCJ0YXJnZXRCcmFuY2giOiJtYWluIiwibGFiZWxzIjpbImltYWdlIl19-->

Reviewed-on: https://gitea.alexlebens.dev/alexlebens/infrastructure/pulls/3775
Co-authored-by: Renovate Bot <renovate-bot@alexlebens.net>
Co-committed-by: Renovate Bot <renovate-bot@alexlebens.net>
renovate bot added a commit to sdwilsh/ansible-playbooks that referenced this pull request Feb 21, 2026
…r to v0.89.0

##### [\`v0.89.0\`](https://github.com/prometheus-operator/prometheus-operator/releases/tag/v0.89.0)

- \[ENHANCEMENT] Add `hostNetwork` field to the `Alertmanager` CRD. [#8281](prometheus-operator/prometheus-operator#8281)
- \[ENHANCEMENT] Add the `crds` and `full-crds` commands to the operator's binary. [#8251](prometheus-operator/prometheus-operator#8251)
- \[ENHANCEMENT] Report deprecated field usage in the `Reconciled` condition type. [#8236](prometheus-operator/prometheus-operator#8236)
- \[ENHANCEMENT] Avoid unnecessary reconciliation upon creation of the `ThanosRuler` StatefulSet. [#8347](prometheus-operator/prometheus-operator#8347)
- \[ENHANCEMENT] Add `bodySizeLimit` to the ScrapeConfig CRD. [#8348](prometheus-operator/prometheus-operator#8348)
- \[ENHANCEMENT] Support `http_headers` field in the Alertmanager Secret. [#8357](prometheus-operator/prometheus-operator#8357)
- \[ENHANCEMENT] Add the `-kubelet-http-metrics` flag to enable/disable the HTTP metrics port in the Kubelet endpoint (default=enabled). [#8350](prometheus-operator/prometheus-operator#8350)
- \[ENHANCEMENT] Include `operator.prometheus.io/version` annotation in the full version of CRDs. [#8279](prometheus-operator/prometheus-operator#8279)
- \[BUGFIX] Validate VictorOps global configuration in the `Alertmanager` CRD. [#8020](prometheus-operator/prometheus-operator#8020)
- \[BUGFIX] Validate Jira global configuration in the `Alertmanager` CRD. [#8265](prometheus-operator/prometheus-operator#8265)
- \[BUGFIX] Validate VictorOps receiver's URL in the `AlertmanagerConfig` CRD. [#8258](prometheus-operator/prometheus-operator#8258)
- \[BUGFIX] Validate Webex receiver's URL in the `AlertmanagerConfig` CRD. [#8255](prometheus-operator/prometheus-operator#8255)
- \[BUGFIX] Validate Jira receiver's URL configuration in the `AlertmanagerConfig` CRD. [#8230](prometheus-operator/prometheus-operator#8230)
- \[BUGFIX] Validate OpsGenie receiver configuration in the `AlertmanagerConfig` CRD. [#8267](prometheus-operator/prometheus-operator#8267)
- \[BUGFIX] Validate WeChat receiver configuration in the `AlertmanagerConfig` CRD. [#8271](prometheus-operator/prometheus-operator#8271)
- \[BUGFIX] Validate SNS receiver configuration in the `AlertmanagerConfig` CRD. [#8217](prometheus-operator/prometheus-operator#8217)
- \[BUGFIX] Validate Webex global configuration in the `Alertmanager` CRD. [#7979](prometheus-operator/prometheus-operator#7979)
- \[BUGFIX] Validate Telegram global configuration in the `Alertmanager` CRD. [#8268](prometheus-operator/prometheus-operator#8268)
- \[BUGFIX] Restore statefulset's labels if the creation fails with AlreadyExists. [#8343](prometheus-operator/prometheus-operator#8343)
- \[BUGFIX] Fix potential panic due to informer cache races. [#8310](prometheus-operator/prometheus-operator#8310)
- \[BUGFIX] Support probers defined with IPv6 addresses in the `Probe` CRD. [#8354](prometheus-operator/prometheus-operator#8354)
- \[BUGFIX] Prevent group and repeat intervals with zero duration from breaking Alertmanager. [#8126](prometheus-operator/prometheus-operator#8126)
- \[BUGFIX] Propagate all supported RocketChat attributes for `AlertmanagerConfig` CRD. [#8016](prometheus-operator/prometheus-operator#8016)
- \[BUGFIX] Add URL validation for WeChat receiver. [#8256](prometheus-operator/prometheus-operator#8256)
- \[BUGFIX] Add URL validation for SNS receiver. [#8259](prometheus-operator/prometheus-operator#8259)
- \[BUGFIX] Fix GCE service discovery for the `ScrapeConfig` CRD. [#8284](prometheus-operator/prometheus-operator#8284)
- \[BUGFIX] Avoid stale conditions in `Alertmanager`, `ThanosRuler`, `Prometheus` and `PrometheusAgent` resources. [#8304](prometheus-operator/prometheus-operator#8304)
- \[BUGFIX] Fix race condition when updating rule ConfigMaps. [#8290](prometheus-operator/prometheus-operator#8290)
- \[BUGFIX] Fix race condition when patching finalizers. [#8323](prometheus-operator/prometheus-operator#8323)
- \[BUGFIX] Reconcile `ScrapeConfig` resources when namespace selection changes. [#8334](prometheus-operator/prometheus-operator#8334)

---
##### [\`v0.88.1\`](https://github.com/prometheus-operator/prometheus-operator/releases/tag/v0.88.1)

- \[BUGFIX] Validate `webhookURL` secret for `MSTeams` receiver in `AlertmanagerConfig` CRD. [#8294](prometheus-operator/prometheus-operator#8294)
- \[BUGFIX] Revert maximum version check for `EC2/Lightsail` SD in `ScrapeConfig` CRD. [#8308](prometheus-operator/prometheus-operator#8308)
- \[BUGFIX] Relax URL validation in `Slack` receiver in AlertmanagerConfig CRD to support Go templates. [#8299](prometheus-operator/prometheus-operator#8299) [#8331](prometheus-operator/prometheus-operator#8331)
- \[BUGFIX] Relax URL validation in `PagerDuty` in AlertmanagerConfig CRD to support Go templates. [#8319](prometheus-operator/prometheus-operator#8319)
- \[BUGFIX] Relax URL validation in `WebhookConfig` in AlertmanagerConfig CRD to support Go templates. [#8307](prometheus-operator/prometheus-operator#8307) [#8317](prometheus-operator/prometheus-operator#8317)
- \[BUGFIX] Relax URL validation in `RocketChat` receiver in AlertmanagerConfig CRD to support Go templates. [#8318](prometheus-operator/prometheus-operator#8318)
- \[BUGFIX] Relax URL validation in `Pushover` receiver in AlertmanagerConfig CRD to support Go templates. [#8307](prometheus-operator/prometheus-operator#8307) [#8316](prometheus-operator/prometheus-operator#8316)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants