Fix nil pointer panic in StatefulSet status reporting during informer cache races#8310
Conversation
Signed-off-by: WHOIM1205 <rathourprateek8@gmail.com>
|
This PR fixes a real nil pointer panic caused by informer cache races.
The fix restores a safe contract by returning a proper This preserves existing behavior while eliminating a production crash scenario. |
Signed-off-by: WHOIM1205 <rathourprateek8@gmail.com>
pkg/alertmanager/operator.go
Outdated
|
|
||
| // Defensive nil check: if informer returns nil without error | ||
| // (e.g., during cache inconsistency), treat as not found. | ||
| if obj == nil { |
There was a problem hiding this comment.
With the change to informers.go, is it now possible for Get() to return a nil value?
There was a problem hiding this comment.
With the proposed change to panic on empty informers, Get() should no longer return (nil, nil) in normal operation.
The defensive nil checks were added to guard against unexpected informer behavior and future regressions, but I’m happy to remove or scope them if you’d prefer relying solely on the stricter Get() contract.
There was a problem hiding this comment.
The new contract of *ForResource.Get() is that it never returns a nil pointer. A regression would be easier to detect if we don't do defensive programming.
There was a problem hiding this comment.
Thanks for the clarification.
I’ve removed the defensive nil checks and updated the code to fully rely on the ForResource.Get() contract that it never returns a nil object. This should make any regression fail fast and be easier to detect.
Please let me know if there’s anything else you’d like adjusted.
Signed-off-by: WHOIM1205 <WHOIM1205@users.noreply.github.com>
Signed-off-by: WHOIM1205 <WHOIM1205@users.noreply.github.com>
pkg/alertmanager/operator.go
Outdated
|
|
||
| // Defensive nil check: if informer returns nil without error | ||
| // (e.g., during cache inconsistency), treat as not found. | ||
| if obj == nil { |
There was a problem hiding this comment.
The new contract of *ForResource.Get() is that it never returns a nil pointer. A regression would be easier to detect if we don't do defensive programming.
Signed-off-by: WHOIM1205 <WHOIM1205@users.noreply.github.com>
8028ade to
eefe614
Compare
Signed-off-by: WHOIM1205 <WHOIM1205@users.noreply.github.com>
0bc552e to
be0ad8b
Compare
pkg/informers/informers.go
Outdated
| return nil, apierrors.NewNotFound(schema.GroupResource{}, name) | ||
| } | ||
|
|
||
| var err error |
There was a problem hiding this comment.
reading the code again, we don't need to declare err here. If we loop over all the informers and couldn't find the key, we should just return a NotFound error.
There was a problem hiding this comment.
Good catch, thanks.
I’ve removed the unused err variable and simplified the logic so that after iterating over all informers, we directly return a NotFound error when the key isn’t found.
pkg/informers/informers.go
Outdated
| // Ensure we always return a NotFound error if the object wasn't found, | ||
| // even if err happens to be nil (which shouldn't happen, but be defensive). | ||
| if err == nil { | ||
| return nil, apierrors.NewNotFound(schema.GroupResource{}, name) |
There was a problem hiding this comment.
for consistency we should return the proper GroupResource.
There was a problem hiding this comment.
Thanks — I’ve updated the NotFound return to use the proper GroupResource for consistency.
Signed-off-by: WHOIM1205 <rathourprateek8@gmail.com>
Signed-off-by: WHOIM1205 <rathourprateek8@gmail.com>
| if apierrors.IsNotFound(err) { | ||
| continue | ||
| } | ||
| if err != nil { |
There was a problem hiding this comment.
we can directly return ret, err here
There was a problem hiding this comment.
Good point — I’ve simplified the code to directly return ret, err there.
ec1cb00 to
b0a0987
Compare
…r to v0.89.0 (#3775) This PR contains the following updates: | Package | Update | Change | |---|---|---| | [prometheus-operator/prometheus-operator](https://github.com/prometheus-operator/prometheus-operator) | minor | `v0.88.1` → `v0.89.0` | --- ### Release Notes <details> <summary>prometheus-operator/prometheus-operator (prometheus-operator/prometheus-operator)</summary> ### [`v0.89.0`](https://github.com/prometheus-operator/prometheus-operator/releases/tag/v0.89.0): 0.89.0 / 2026-02-05 [Compare Source](prometheus-operator/prometheus-operator@v0.88.1...v0.89.0) - \[ENHANCEMENT] Add `hostNetwork` field to the `Alertmanager` CRD. [#​8281](prometheus-operator/prometheus-operator#8281) - \[ENHANCEMENT] Add the `crds` and `full-crds` commands to the operator's binary. [#​8251](prometheus-operator/prometheus-operator#8251) - \[ENHANCEMENT] Report deprecated field usage in the `Reconciled` condition type. [#​8236](prometheus-operator/prometheus-operator#8236) - \[ENHANCEMENT] Avoid unnecessary reconciliation upon creation of the `ThanosRuler` StatefulSet. [#​8347](prometheus-operator/prometheus-operator#8347) - \[ENHANCEMENT] Add `bodySizeLimit` to the ScrapeConfig CRD. [#​8348](prometheus-operator/prometheus-operator#8348) - \[ENHANCEMENT] Support `http_headers` field in the Alertmanager Secret. [#​8357](prometheus-operator/prometheus-operator#8357) - \[ENHANCEMENT] Add the `-kubelet-http-metrics` flag to enable/disable the HTTP metrics port in the Kubelet endpoint (default=enabled). [#​8350](prometheus-operator/prometheus-operator#8350) - \[ENHANCEMENT] Include `operator.prometheus.io/version` annotation in the full version of CRDs. [#​8279](prometheus-operator/prometheus-operator#8279) - \[BUGFIX] Validate VictorOps global configuration in the `Alertmanager` CRD. [#​8020](prometheus-operator/prometheus-operator#8020) - \[BUGFIX] Validate Jira global configuration in the `Alertmanager` CRD. [#​8265](prometheus-operator/prometheus-operator#8265) - \[BUGFIX] Validate VictorOps receiver's URL in the `AlertmanagerConfig` CRD. [#​8258](prometheus-operator/prometheus-operator#8258) - \[BUGFIX] Validate Webex receiver's URL in the `AlertmanagerConfig` CRD. [#​8255](prometheus-operator/prometheus-operator#8255) - \[BUGFIX] Validate Jira receiver's URL configuration in the `AlertmanagerConfig` CRD. [#​8230](prometheus-operator/prometheus-operator#8230) - \[BUGFIX] Validate OpsGenie receiver configuration in the `AlertmanagerConfig` CRD. [#​8267](prometheus-operator/prometheus-operator#8267) - \[BUGFIX] Validate WeChat receiver configuration in the `AlertmanagerConfig` CRD. [#​8271](prometheus-operator/prometheus-operator#8271) - \[BUGFIX] Validate SNS receiver configuration in the `AlertmanagerConfig` CRD. [#​8217](prometheus-operator/prometheus-operator#8217) - \[BUGFIX] Validate Webex global configuration in the `Alertmanager` CRD. [#​7979](prometheus-operator/prometheus-operator#7979) - \[BUGFIX] Validate Telegram global configuration in the `Alertmanager` CRD. [#​8268](prometheus-operator/prometheus-operator#8268) - \[BUGFIX] Restore statefulset's labels if the creation fails with AlreadyExists. [#​8343](prometheus-operator/prometheus-operator#8343) - \[BUGFIX] Fix potential panic due to informer cache races. [#​8310](prometheus-operator/prometheus-operator#8310) - \[BUGFIX] Support probers defined with IPv6 addresses in the `Probe` CRD. [#​8354](prometheus-operator/prometheus-operator#8354) - \[BUGFIX] Prevent group and repeat intervals with zero duration from breaking Alertmanager. [#​8126](prometheus-operator/prometheus-operator#8126) - \[BUGFIX] Propagate all supported RocketChat attributes for `AlertmanagerConfig` CRD. [#​8016](prometheus-operator/prometheus-operator#8016) - \[BUGFIX] Add URL validation for WeChat receiver. [#​8256](prometheus-operator/prometheus-operator#8256) - \[BUGFIX] Add URL validation for SNS receiver. [#​8259](prometheus-operator/prometheus-operator#8259) - \[BUGFIX] Fix GCE service discovery for the `ScrapeConfig` CRD. [#​8284](prometheus-operator/prometheus-operator#8284) - \[BUGFIX] Avoid stale conditions in `Alertmanager`, `ThanosRuler`, `Prometheus` and `PrometheusAgent` resources. [#​8304](prometheus-operator/prometheus-operator#8304) - \[BUGFIX] Fix race condition when updating rule ConfigMaps. [#​8290](prometheus-operator/prometheus-operator#8290) - \[BUGFIX] Fix race condition when patching finalizers. [#​8323](prometheus-operator/prometheus-operator#8323) - \[BUGFIX] Reconcile `ScrapeConfig` resources when namespace selection changes. [#​8334](prometheus-operator/prometheus-operator#8334) </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR is behind base branch, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0My4zLjYiLCJ1cGRhdGVkSW5WZXIiOiI0My4zLjYiLCJ0YXJnZXRCcmFuY2giOiJtYWluIiwibGFiZWxzIjpbImltYWdlIl19--> Reviewed-on: https://gitea.alexlebens.dev/alexlebens/infrastructure/pulls/3775 Co-authored-by: Renovate Bot <renovate-bot@alexlebens.net> Co-committed-by: Renovate Bot <renovate-bot@alexlebens.net>
…er-nil-pointer-panic Fix nil pointer panic in StatefulSet status reporting during informer cache races
…r to v0.89.0 ##### [\`v0.89.0\`](https://github.com/prometheus-operator/prometheus-operator/releases/tag/v0.89.0) - \[ENHANCEMENT] Add `hostNetwork` field to the `Alertmanager` CRD. [#8281](prometheus-operator/prometheus-operator#8281) - \[ENHANCEMENT] Add the `crds` and `full-crds` commands to the operator's binary. [#8251](prometheus-operator/prometheus-operator#8251) - \[ENHANCEMENT] Report deprecated field usage in the `Reconciled` condition type. [#8236](prometheus-operator/prometheus-operator#8236) - \[ENHANCEMENT] Avoid unnecessary reconciliation upon creation of the `ThanosRuler` StatefulSet. [#8347](prometheus-operator/prometheus-operator#8347) - \[ENHANCEMENT] Add `bodySizeLimit` to the ScrapeConfig CRD. [#8348](prometheus-operator/prometheus-operator#8348) - \[ENHANCEMENT] Support `http_headers` field in the Alertmanager Secret. [#8357](prometheus-operator/prometheus-operator#8357) - \[ENHANCEMENT] Add the `-kubelet-http-metrics` flag to enable/disable the HTTP metrics port in the Kubelet endpoint (default=enabled). [#8350](prometheus-operator/prometheus-operator#8350) - \[ENHANCEMENT] Include `operator.prometheus.io/version` annotation in the full version of CRDs. [#8279](prometheus-operator/prometheus-operator#8279) - \[BUGFIX] Validate VictorOps global configuration in the `Alertmanager` CRD. [#8020](prometheus-operator/prometheus-operator#8020) - \[BUGFIX] Validate Jira global configuration in the `Alertmanager` CRD. [#8265](prometheus-operator/prometheus-operator#8265) - \[BUGFIX] Validate VictorOps receiver's URL in the `AlertmanagerConfig` CRD. [#8258](prometheus-operator/prometheus-operator#8258) - \[BUGFIX] Validate Webex receiver's URL in the `AlertmanagerConfig` CRD. [#8255](prometheus-operator/prometheus-operator#8255) - \[BUGFIX] Validate Jira receiver's URL configuration in the `AlertmanagerConfig` CRD. [#8230](prometheus-operator/prometheus-operator#8230) - \[BUGFIX] Validate OpsGenie receiver configuration in the `AlertmanagerConfig` CRD. [#8267](prometheus-operator/prometheus-operator#8267) - \[BUGFIX] Validate WeChat receiver configuration in the `AlertmanagerConfig` CRD. [#8271](prometheus-operator/prometheus-operator#8271) - \[BUGFIX] Validate SNS receiver configuration in the `AlertmanagerConfig` CRD. [#8217](prometheus-operator/prometheus-operator#8217) - \[BUGFIX] Validate Webex global configuration in the `Alertmanager` CRD. [#7979](prometheus-operator/prometheus-operator#7979) - \[BUGFIX] Validate Telegram global configuration in the `Alertmanager` CRD. [#8268](prometheus-operator/prometheus-operator#8268) - \[BUGFIX] Restore statefulset's labels if the creation fails with AlreadyExists. [#8343](prometheus-operator/prometheus-operator#8343) - \[BUGFIX] Fix potential panic due to informer cache races. [#8310](prometheus-operator/prometheus-operator#8310) - \[BUGFIX] Support probers defined with IPv6 addresses in the `Probe` CRD. [#8354](prometheus-operator/prometheus-operator#8354) - \[BUGFIX] Prevent group and repeat intervals with zero duration from breaking Alertmanager. [#8126](prometheus-operator/prometheus-operator#8126) - \[BUGFIX] Propagate all supported RocketChat attributes for `AlertmanagerConfig` CRD. [#8016](prometheus-operator/prometheus-operator#8016) - \[BUGFIX] Add URL validation for WeChat receiver. [#8256](prometheus-operator/prometheus-operator#8256) - \[BUGFIX] Add URL validation for SNS receiver. [#8259](prometheus-operator/prometheus-operator#8259) - \[BUGFIX] Fix GCE service discovery for the `ScrapeConfig` CRD. [#8284](prometheus-operator/prometheus-operator#8284) - \[BUGFIX] Avoid stale conditions in `Alertmanager`, `ThanosRuler`, `Prometheus` and `PrometheusAgent` resources. [#8304](prometheus-operator/prometheus-operator#8304) - \[BUGFIX] Fix race condition when updating rule ConfigMaps. [#8290](prometheus-operator/prometheus-operator#8290) - \[BUGFIX] Fix race condition when patching finalizers. [#8323](prometheus-operator/prometheus-operator#8323) - \[BUGFIX] Reconcile `ScrapeConfig` resources when namespace selection changes. [#8334](prometheus-operator/prometheus-operator#8334) --- ##### [\`v0.88.1\`](https://github.com/prometheus-operator/prometheus-operator/releases/tag/v0.88.1) - \[BUGFIX] Validate `webhookURL` secret for `MSTeams` receiver in `AlertmanagerConfig` CRD. [#8294](prometheus-operator/prometheus-operator#8294) - \[BUGFIX] Revert maximum version check for `EC2/Lightsail` SD in `ScrapeConfig` CRD. [#8308](prometheus-operator/prometheus-operator#8308) - \[BUGFIX] Relax URL validation in `Slack` receiver in AlertmanagerConfig CRD to support Go templates. [#8299](prometheus-operator/prometheus-operator#8299) [#8331](prometheus-operator/prometheus-operator#8331) - \[BUGFIX] Relax URL validation in `PagerDuty` in AlertmanagerConfig CRD to support Go templates. [#8319](prometheus-operator/prometheus-operator#8319) - \[BUGFIX] Relax URL validation in `WebhookConfig` in AlertmanagerConfig CRD to support Go templates. [#8307](prometheus-operator/prometheus-operator#8307) [#8317](prometheus-operator/prometheus-operator#8317) - \[BUGFIX] Relax URL validation in `RocketChat` receiver in AlertmanagerConfig CRD to support Go templates. [#8318](prometheus-operator/prometheus-operator#8318) - \[BUGFIX] Relax URL validation in `Pushover` receiver in AlertmanagerConfig CRD to support Go templates. [#8307](prometheus-operator/prometheus-operator#8307) [#8316](prometheus-operator/prometheus-operator#8316)
…r to v0.89.0 (#3775) This PR contains the following updates: | Package | Update | Change | |---|---|---| | [prometheus-operator/prometheus-operator](https://github.com/prometheus-operator/prometheus-operator) | minor | `v0.88.1` → `v0.89.0` | --- ### Release Notes <details> <summary>prometheus-operator/prometheus-operator (prometheus-operator/prometheus-operator)</summary> ### [`v0.89.0`](https://github.com/prometheus-operator/prometheus-operator/releases/tag/v0.89.0): 0.89.0 / 2026-02-05 [Compare Source](prometheus-operator/prometheus-operator@v0.88.1...v0.89.0) - \[ENHANCEMENT] Add `hostNetwork` field to the `Alertmanager` CRD. [#​8281](prometheus-operator/prometheus-operator#8281) - \[ENHANCEMENT] Add the `crds` and `full-crds` commands to the operator's binary. [#​8251](prometheus-operator/prometheus-operator#8251) - \[ENHANCEMENT] Report deprecated field usage in the `Reconciled` condition type. [#​8236](prometheus-operator/prometheus-operator#8236) - \[ENHANCEMENT] Avoid unnecessary reconciliation upon creation of the `ThanosRuler` StatefulSet. [#​8347](prometheus-operator/prometheus-operator#8347) - \[ENHANCEMENT] Add `bodySizeLimit` to the ScrapeConfig CRD. [#​8348](prometheus-operator/prometheus-operator#8348) - \[ENHANCEMENT] Support `http_headers` field in the Alertmanager Secret. [#​8357](prometheus-operator/prometheus-operator#8357) - \[ENHANCEMENT] Add the `-kubelet-http-metrics` flag to enable/disable the HTTP metrics port in the Kubelet endpoint (default=enabled). [#​8350](prometheus-operator/prometheus-operator#8350) - \[ENHANCEMENT] Include `operator.prometheus.io/version` annotation in the full version of CRDs. [#​8279](prometheus-operator/prometheus-operator#8279) - \[BUGFIX] Validate VictorOps global configuration in the `Alertmanager` CRD. [#​8020](prometheus-operator/prometheus-operator#8020) - \[BUGFIX] Validate Jira global configuration in the `Alertmanager` CRD. [#​8265](prometheus-operator/prometheus-operator#8265) - \[BUGFIX] Validate VictorOps receiver's URL in the `AlertmanagerConfig` CRD. [#​8258](prometheus-operator/prometheus-operator#8258) - \[BUGFIX] Validate Webex receiver's URL in the `AlertmanagerConfig` CRD. [#​8255](prometheus-operator/prometheus-operator#8255) - \[BUGFIX] Validate Jira receiver's URL configuration in the `AlertmanagerConfig` CRD. [#​8230](prometheus-operator/prometheus-operator#8230) - \[BUGFIX] Validate OpsGenie receiver configuration in the `AlertmanagerConfig` CRD. [#​8267](prometheus-operator/prometheus-operator#8267) - \[BUGFIX] Validate WeChat receiver configuration in the `AlertmanagerConfig` CRD. [#​8271](prometheus-operator/prometheus-operator#8271) - \[BUGFIX] Validate SNS receiver configuration in the `AlertmanagerConfig` CRD. [#​8217](prometheus-operator/prometheus-operator#8217) - \[BUGFIX] Validate Webex global configuration in the `Alertmanager` CRD. [#​7979](prometheus-operator/prometheus-operator#7979) - \[BUGFIX] Validate Telegram global configuration in the `Alertmanager` CRD. [#​8268](prometheus-operator/prometheus-operator#8268) - \[BUGFIX] Restore statefulset's labels if the creation fails with AlreadyExists. [#​8343](prometheus-operator/prometheus-operator#8343) - \[BUGFIX] Fix potential panic due to informer cache races. [#​8310](prometheus-operator/prometheus-operator#8310) - \[BUGFIX] Support probers defined with IPv6 addresses in the `Probe` CRD. [#​8354](prometheus-operator/prometheus-operator#8354) - \[BUGFIX] Prevent group and repeat intervals with zero duration from breaking Alertmanager. [#​8126](prometheus-operator/prometheus-operator#8126) - \[BUGFIX] Propagate all supported RocketChat attributes for `AlertmanagerConfig` CRD. [#​8016](prometheus-operator/prometheus-operator#8016) - \[BUGFIX] Add URL validation for WeChat receiver. [#​8256](prometheus-operator/prometheus-operator#8256) - \[BUGFIX] Add URL validation for SNS receiver. [#​8259](prometheus-operator/prometheus-operator#8259) - \[BUGFIX] Fix GCE service discovery for the `ScrapeConfig` CRD. [#​8284](prometheus-operator/prometheus-operator#8284) - \[BUGFIX] Avoid stale conditions in `Alertmanager`, `ThanosRuler`, `Prometheus` and `PrometheusAgent` resources. [#​8304](prometheus-operator/prometheus-operator#8304) - \[BUGFIX] Fix race condition when updating rule ConfigMaps. [#​8290](prometheus-operator/prometheus-operator#8290) - \[BUGFIX] Fix race condition when patching finalizers. [#​8323](prometheus-operator/prometheus-operator#8323) - \[BUGFIX] Reconcile `ScrapeConfig` resources when namespace selection changes. [#​8334](prometheus-operator/prometheus-operator#8334) </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR is behind base branch, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0My4zLjYiLCJ1cGRhdGVkSW5WZXIiOiI0My4zLjYiLCJ0YXJnZXRCcmFuY2giOiJtYWluIiwibGFiZWxzIjpbImltYWdlIl19--> Reviewed-on: https://gitea.alexlebens.dev/alexlebens/infrastructure/pulls/3775 Co-authored-by: Renovate Bot <renovate-bot@alexlebens.net> Co-committed-by: Renovate Bot <renovate-bot@alexlebens.net>
…er-nil-pointer-panic Fix nil pointer panic in StatefulSet status reporting during informer cache races
…r to v0.89.0 ##### [\`v0.89.0\`](https://github.com/prometheus-operator/prometheus-operator/releases/tag/v0.89.0) - \[ENHANCEMENT] Add `hostNetwork` field to the `Alertmanager` CRD. [#8281](prometheus-operator/prometheus-operator#8281) - \[ENHANCEMENT] Add the `crds` and `full-crds` commands to the operator's binary. [#8251](prometheus-operator/prometheus-operator#8251) - \[ENHANCEMENT] Report deprecated field usage in the `Reconciled` condition type. [#8236](prometheus-operator/prometheus-operator#8236) - \[ENHANCEMENT] Avoid unnecessary reconciliation upon creation of the `ThanosRuler` StatefulSet. [#8347](prometheus-operator/prometheus-operator#8347) - \[ENHANCEMENT] Add `bodySizeLimit` to the ScrapeConfig CRD. [#8348](prometheus-operator/prometheus-operator#8348) - \[ENHANCEMENT] Support `http_headers` field in the Alertmanager Secret. [#8357](prometheus-operator/prometheus-operator#8357) - \[ENHANCEMENT] Add the `-kubelet-http-metrics` flag to enable/disable the HTTP metrics port in the Kubelet endpoint (default=enabled). [#8350](prometheus-operator/prometheus-operator#8350) - \[ENHANCEMENT] Include `operator.prometheus.io/version` annotation in the full version of CRDs. [#8279](prometheus-operator/prometheus-operator#8279) - \[BUGFIX] Validate VictorOps global configuration in the `Alertmanager` CRD. [#8020](prometheus-operator/prometheus-operator#8020) - \[BUGFIX] Validate Jira global configuration in the `Alertmanager` CRD. [#8265](prometheus-operator/prometheus-operator#8265) - \[BUGFIX] Validate VictorOps receiver's URL in the `AlertmanagerConfig` CRD. [#8258](prometheus-operator/prometheus-operator#8258) - \[BUGFIX] Validate Webex receiver's URL in the `AlertmanagerConfig` CRD. [#8255](prometheus-operator/prometheus-operator#8255) - \[BUGFIX] Validate Jira receiver's URL configuration in the `AlertmanagerConfig` CRD. [#8230](prometheus-operator/prometheus-operator#8230) - \[BUGFIX] Validate OpsGenie receiver configuration in the `AlertmanagerConfig` CRD. [#8267](prometheus-operator/prometheus-operator#8267) - \[BUGFIX] Validate WeChat receiver configuration in the `AlertmanagerConfig` CRD. [#8271](prometheus-operator/prometheus-operator#8271) - \[BUGFIX] Validate SNS receiver configuration in the `AlertmanagerConfig` CRD. [#8217](prometheus-operator/prometheus-operator#8217) - \[BUGFIX] Validate Webex global configuration in the `Alertmanager` CRD. [#7979](prometheus-operator/prometheus-operator#7979) - \[BUGFIX] Validate Telegram global configuration in the `Alertmanager` CRD. [#8268](prometheus-operator/prometheus-operator#8268) - \[BUGFIX] Restore statefulset's labels if the creation fails with AlreadyExists. [#8343](prometheus-operator/prometheus-operator#8343) - \[BUGFIX] Fix potential panic due to informer cache races. [#8310](prometheus-operator/prometheus-operator#8310) - \[BUGFIX] Support probers defined with IPv6 addresses in the `Probe` CRD. [#8354](prometheus-operator/prometheus-operator#8354) - \[BUGFIX] Prevent group and repeat intervals with zero duration from breaking Alertmanager. [#8126](prometheus-operator/prometheus-operator#8126) - \[BUGFIX] Propagate all supported RocketChat attributes for `AlertmanagerConfig` CRD. [#8016](prometheus-operator/prometheus-operator#8016) - \[BUGFIX] Add URL validation for WeChat receiver. [#8256](prometheus-operator/prometheus-operator#8256) - \[BUGFIX] Add URL validation for SNS receiver. [#8259](prometheus-operator/prometheus-operator#8259) - \[BUGFIX] Fix GCE service discovery for the `ScrapeConfig` CRD. [#8284](prometheus-operator/prometheus-operator#8284) - \[BUGFIX] Avoid stale conditions in `Alertmanager`, `ThanosRuler`, `Prometheus` and `PrometheusAgent` resources. [#8304](prometheus-operator/prometheus-operator#8304) - \[BUGFIX] Fix race condition when updating rule ConfigMaps. [#8290](prometheus-operator/prometheus-operator#8290) - \[BUGFIX] Fix race condition when patching finalizers. [#8323](prometheus-operator/prometheus-operator#8323) - \[BUGFIX] Reconcile `ScrapeConfig` resources when namespace selection changes. [#8334](prometheus-operator/prometheus-operator#8334) --- ##### [\`v0.88.1\`](https://github.com/prometheus-operator/prometheus-operator/releases/tag/v0.88.1) - \[BUGFIX] Validate `webhookURL` secret for `MSTeams` receiver in `AlertmanagerConfig` CRD. [#8294](prometheus-operator/prometheus-operator#8294) - \[BUGFIX] Revert maximum version check for `EC2/Lightsail` SD in `ScrapeConfig` CRD. [#8308](prometheus-operator/prometheus-operator#8308) - \[BUGFIX] Relax URL validation in `Slack` receiver in AlertmanagerConfig CRD to support Go templates. [#8299](prometheus-operator/prometheus-operator#8299) [#8331](prometheus-operator/prometheus-operator#8331) - \[BUGFIX] Relax URL validation in `PagerDuty` in AlertmanagerConfig CRD to support Go templates. [#8319](prometheus-operator/prometheus-operator#8319) - \[BUGFIX] Relax URL validation in `WebhookConfig` in AlertmanagerConfig CRD to support Go templates. [#8307](prometheus-operator/prometheus-operator#8307) [#8317](prometheus-operator/prometheus-operator#8317) - \[BUGFIX] Relax URL validation in `RocketChat` receiver in AlertmanagerConfig CRD to support Go templates. [#8318](prometheus-operator/prometheus-operator#8318) - \[BUGFIX] Relax URL validation in `Pushover` receiver in AlertmanagerConfig CRD to support Go templates. [#8307](prometheus-operator/prometheus-operator#8307) [#8316](prometheus-operator/prometheus-operator#8316)
Summary
This PR fixes a nil pointer dereference panic in the Prometheus Operator that can occur during informer cache inconsistencies, especially during operator startup, restarts, or namespace deletion.
The root cause was an invalid contract in
informers.ForResource.Get()which could return(nil, nil). Callers assumed a non-nil object whenerr == nil, leading to a panic when dereferencing the returned object.This PR fixes the root cause and adds defensive checks to prevent similar crashes in the future.
Problem
While reporting Prometheus status, the operator retrieves StatefulSets from the informer cache. Under real Kubernetes race conditions (startup before cache sync, informer reinitialization, namespace deletion), the informer
Get()method may return(nil, nil).This leads to a nil pointer dereference when the returned object is accessed, crashing the operator.
Root Cause
informers.ForResource.Get()could return(nil, nil)when:err == nilimplies a valid objectFix
Root Cause Fix
ForResource.Get()now returns a properNotFounderror when no informers are available or no object is found.Defense in Depth
nilchecks when retrieving StatefulSets in:NotFoundinstead of causing a panic.How to Reproduce
Install Prometheus Operator with namespace restrictions:
helm install prometheus-operator prometheus-community/kube-prometheus-stack \ --set prometheusOperator.namespaces.allowList="{monitoring,app-team-a}"Create a Prometheus resource:
During operator startup (before informer cache sync completes), delete the namespace:
kubectl delete namespace monitoring &Observe operator behavior:
Impact
Before
After
Why This Is Safe
i have also added some test cases :
Expected output:
=== RUN TestInformers
=== RUN TestInformers/TestGet
=== RUN TestInformers/TestGetWithEmptyInformers
=== RUN TestInformers/TestGetNeverReturnsNilNil
=== RUN TestInformers/TestGetNeverReturnsNilNil/empty_informers
=== RUN TestInformers/TestGetNeverReturnsNilNil/object_not_found_in_single_informer
=== RUN TestInformers/TestGetNeverReturnsNilNil/object_not_found_in_multiple_informers
=== RUN TestInformers/TestGetReturnsObjectWhenFound
=== RUN TestInformers/TestGetSearchesAllInformers
--- PASS: TestInformers (0.00s)
--- PASS: TestInformers/TestGet (0.00s)
--- PASS: TestInformers/TestGetWithEmptyInformers (0.00s)
--- PASS: TestInformers/TestGetNeverReturnsNilNil (0.00s)
--- PASS: TestInformers/TestGetReturnsObjectWhenFound (0.00s)
--- PASS: TestInformers/TestGetSearchesAllInformers (0.00s)
PASS