Skip to content

fix: validate nonzero duration in AlertmanagerConfig#8126

Merged
simonpasquier merged 20 commits intoprometheus-operator:mainfrom
kartikangiras:alertmanagerconfig
Feb 5, 2026
Merged

fix: validate nonzero duration in AlertmanagerConfig#8126
simonpasquier merged 20 commits intoprometheus-operator:mainfrom
kartikangiras:alertmanagerconfig

Conversation

@kartikangiras
Copy link
Contributor

Description

added a nonzero duration function for group_interval and repeat_interval in alertmanagerconfig for incorrect validation

Closes: #6594

Type of change

What type of changes does your code introduce to the Prometheus operator? Put an x in the box that apply.

  • CHANGE (fix or feature that would cause existing functionality to not work as expected)
  • FEATURE (non-breaking change which adds functionality)
  • BUGFIX (non-breaking change which fixes an issue)
  • ENHANCEMENT (non-breaking change which improves existing functionality)
  • NONE (if none of the other choices apply. Example, tooling, build system, CI, docs, etc.)

Verification

Please check the Prometheus-Operator testing guidelines for recommendations about automated tests.

Changelog entry

Please put a one-line changelog entry below. This will be copied to the changelog file during the release process.


@kartikangiras kartikangiras requested a review from a team as a code owner November 30, 2025 18:02
@slashpai
Copy link
Contributor

slashpai commented Dec 1, 2025

@slashpai
Copy link
Contributor

slashpai commented Dec 3, 2025

@ikartiksh Can you address #8126 (comment)?

@kartikangiras
Copy link
Contributor Author

kartikangiras commented Dec 3, 2025

Thank you for the PR

I think this would be better to handle at API level http://github.com/prometheus-operator/prometheus-operator/blob/a198a5ff2c4b99ec614d5f28463cc0b19d272430/pkg/apis/monitoring/v1alpha1/alertmanager_config_types.go#L110-L120

@slashpai i think it has been already addressed to handle at the api level.

@slashpai
Copy link
Contributor

slashpai commented Dec 3, 2025

You would need to add validation // +kubebuilder:validation:Pattern in

GroupWait string `json:"groupWait,omitempty"`
// groupInterval defines how long to wait before sending an updated notification.
// Must match the regular expression`^(([0-9]+)y)?(([0-9]+)w)?(([0-9]+)d)?(([0-9]+)h)?(([0-9]+)m)?(([0-9]+)s)?(([0-9]+)ms)?$`
// Example: "5m"
// +optional
GroupInterval string `json:"groupInterval,omitempty"`
// repeatInterval defines how long to wait before repeating the last notification.
// Must match the regular expression`^(([0-9]+)y)?(([0-9]+)w)?(([0-9]+)d)?(([0-9]+)h)?(([0-9]+)m)?(([0-9]+)s)?(([0-9]+)ms)?$`
// Example: "4h"
// +optional
RepeatInterval string `json:"repeatInterval,omitempty"`

ref: https://book.kubebuilder.io/reference/markers/crd-validation

@slashpai
Copy link
Contributor

slashpai commented Dec 3, 2025

You would need to run make --always-make format generate after updating API

Also lets create a new test in e2e test after

func testAlertmanagerConfigCRD(t *testing.T) {

You can create something similar to

func testPrometheusCRDValidation(t *testing.T) {

@kartikangiras
Copy link
Contributor Author

You would need to run make --always-make format generate after updating API

Also lets create a new test in e2e test after

func testAlertmanagerConfigCRD(t *testing.T) {

You can create something similar to

func testPrometheusCRDValidation(t *testing.T) {

sure, i am onto the tests for the same.

@kartikangiras
Copy link
Contributor Author

@slashpai i have added the e2e test for the code change.

Copy link
Contributor

@simonpasquier simonpasquier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for misleading you but it looks like group and repeat intervals with zero value are always rejected (not only for the top-level route).
One option could be that the operator "hides" the problem by not setting the field in the generated configuration if it detects a zero value but I'm not sure that we want this.

@kartikangiras
Copy link
Contributor Author

Sorry for misleading you but it looks like group and repeat intervals with zero value are always rejected (not only for the top-level route). One option could be that the operator "hides" the problem by not setting the field in the generated configuration if it detects a zero value but I'm not sure that we want this.

@simonpasquier should i just then apply the same changes to v1beta1 along with the positive duration function and the the operator hides the problem by not setting the field.

Copy link
Contributor

@simonpasquier simonpasquier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I think that I've settled my mind :)
First think to note: zero durations are indeed forbidden for repeat/group intervals. So my plan would:

  • Change the GroupInterval and RepeatInterval fields to *monv1.Duration type. It will avoid maintaining another regexp only for duration > 0.
  • As a consequence when we generate the final configuration, *route.sanitize() should check the duration values and reset to the empty string if the input value is 0.

Bonus question: I suspect that the Alertmanager would also crash if we provision an 0 interval via the Kubernetes secret method?

@kartikangiras
Copy link
Contributor Author

thanks @simonpasquier for the approach to follow up and it is good to go.

for the bonus question, yes the alertmanager would crash when provisioned with 0 internal via the k8s secret due to the misconfiguration.

@kartikangiras
Copy link
Contributor Author

Ok I think that I've settled my mind :) First think to note: zero durations are indeed forbidden for repeat/group intervals. So my plan would:

  • Change the GroupInterval and RepeatInterval fields to *monv1.Duration type. It will avoid maintaining another regexp only for duration > 0.
  • As a consequence when we generate the final configuration, *route.sanitize() should check the duration values and reset to the empty string if the input value is 0.

Bonus question: I suspect that the Alertmanager would also crash if we provision an 0 interval via the Kubernetes secret method?

@simonpasquier while checking out I see that using *route.sanitize() may not be a feasible option for checking the final configuration, instead we may use CEL validation something like this // kubebuilder:validation:XValidation:rule="self == '' || duration(self) > duration('0s')",message="repeat_interval must be greater than 0"

@simonpasquier
Copy link
Contributor

while checking out I see that using *route.sanitize() may not be a feasible option for checking the final configuration, instead we may use CEL validation

I'm not sure to understand... Eventually route.sanitize should always do its best to verify that the assembled configuration is valid.

@kartikangiras
Copy link
Contributor Author

@simonpasquier made all the required changes.
thanks.

@simonpasquier
Copy link
Contributor

Conclustion from our discussion after today's office hours:

  • We want the 3 fields to be of type *monitoringv1.NonEmptyDuration which will ensure that the values are valid durations for Alertmanager at the API level.
  • When generating the final Alertmanager configuration, the operator should reset GroupInterval and RepeatInterval values to the empty string if the value parses to a zero duration. It covers the 2 options to provision the configuration: AlertmanagerConfig custom resources and Kubernetes Secret.
  • We continue validating that the values are valid durations in pkg/alertmanager/validation/v1alpha1/validation.go and pkg/alertmanager/validation/v1beta1/validation.go to account for the fact that users might update their operator without upgrading the CRDs immediatly. This extra validation can be removed starting with v0.91.

@kartikangiras
Copy link
Contributor Author

@simonpasquier I think everything mentioned is already covered up and implemented in the code.

Zero durations are converted to empty strings which makes use of the
default value defined at a higher level in the routing tree.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
@simonpasquier simonpasquier enabled auto-merge (squash) February 5, 2026 10:08
@simonpasquier simonpasquier merged commit 69b3653 into prometheus-operator:main Feb 5, 2026
22 checks passed
@kartikangiras kartikangiras deleted the alertmanagerconfig branch February 5, 2026 14:48
alexlebens pushed a commit to alexlebens/infrastructure that referenced this pull request Feb 6, 2026
…r to v0.89.0 (#3775)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [prometheus-operator/prometheus-operator](https://github.com/prometheus-operator/prometheus-operator) | minor | `v0.88.1` → `v0.89.0` |

---

### Release Notes

<details>
<summary>prometheus-operator/prometheus-operator (prometheus-operator/prometheus-operator)</summary>

### [`v0.89.0`](https://github.com/prometheus-operator/prometheus-operator/releases/tag/v0.89.0): 0.89.0 / 2026-02-05

[Compare Source](prometheus-operator/prometheus-operator@v0.88.1...v0.89.0)

- \[ENHANCEMENT] Add `hostNetwork` field to the `Alertmanager` CRD. [#&#8203;8281](prometheus-operator/prometheus-operator#8281)
- \[ENHANCEMENT] Add the `crds` and `full-crds` commands to the operator's binary. [#&#8203;8251](prometheus-operator/prometheus-operator#8251)
- \[ENHANCEMENT] Report deprecated field usage in the `Reconciled` condition type. [#&#8203;8236](prometheus-operator/prometheus-operator#8236)
- \[ENHANCEMENT] Avoid unnecessary reconciliation upon creation of the `ThanosRuler` StatefulSet. [#&#8203;8347](prometheus-operator/prometheus-operator#8347)
- \[ENHANCEMENT] Add `bodySizeLimit` to the ScrapeConfig CRD. [#&#8203;8348](prometheus-operator/prometheus-operator#8348)
- \[ENHANCEMENT] Support `http_headers` field in the Alertmanager Secret. [#&#8203;8357](prometheus-operator/prometheus-operator#8357)
- \[ENHANCEMENT] Add the `-kubelet-http-metrics` flag to enable/disable the HTTP metrics port in the Kubelet endpoint (default=enabled). [#&#8203;8350](prometheus-operator/prometheus-operator#8350)
- \[ENHANCEMENT] Include `operator.prometheus.io/version` annotation in the full version of CRDs. [#&#8203;8279](prometheus-operator/prometheus-operator#8279)
- \[BUGFIX] Validate VictorOps global configuration in the `Alertmanager` CRD. [#&#8203;8020](prometheus-operator/prometheus-operator#8020)
- \[BUGFIX] Validate Jira global configuration in the `Alertmanager` CRD. [#&#8203;8265](prometheus-operator/prometheus-operator#8265)
- \[BUGFIX] Validate VictorOps receiver's URL in the `AlertmanagerConfig` CRD. [#&#8203;8258](prometheus-operator/prometheus-operator#8258)
- \[BUGFIX] Validate Webex receiver's URL in the `AlertmanagerConfig` CRD. [#&#8203;8255](prometheus-operator/prometheus-operator#8255)
- \[BUGFIX] Validate Jira receiver's URL configuration in the `AlertmanagerConfig` CRD. [#&#8203;8230](prometheus-operator/prometheus-operator#8230)
- \[BUGFIX] Validate OpsGenie receiver configuration in the `AlertmanagerConfig` CRD. [#&#8203;8267](prometheus-operator/prometheus-operator#8267)
- \[BUGFIX] Validate WeChat receiver configuration in the `AlertmanagerConfig` CRD. [#&#8203;8271](prometheus-operator/prometheus-operator#8271)
- \[BUGFIX] Validate SNS receiver configuration in the `AlertmanagerConfig` CRD. [#&#8203;8217](prometheus-operator/prometheus-operator#8217)
- \[BUGFIX] Validate Webex global configuration in the `Alertmanager` CRD. [#&#8203;7979](prometheus-operator/prometheus-operator#7979)
- \[BUGFIX] Validate Telegram global configuration in the `Alertmanager` CRD. [#&#8203;8268](prometheus-operator/prometheus-operator#8268)
- \[BUGFIX] Restore statefulset's labels if the creation fails with AlreadyExists. [#&#8203;8343](prometheus-operator/prometheus-operator#8343)
- \[BUGFIX] Fix potential panic due to informer cache races. [#&#8203;8310](prometheus-operator/prometheus-operator#8310)
- \[BUGFIX] Support probers defined with IPv6 addresses in the `Probe` CRD. [#&#8203;8354](prometheus-operator/prometheus-operator#8354)
- \[BUGFIX] Prevent group and repeat intervals with zero duration from breaking Alertmanager. [#&#8203;8126](prometheus-operator/prometheus-operator#8126)
- \[BUGFIX] Propagate all supported RocketChat attributes for `AlertmanagerConfig` CRD. [#&#8203;8016](prometheus-operator/prometheus-operator#8016)
- \[BUGFIX] Add URL validation for WeChat receiver. [#&#8203;8256](prometheus-operator/prometheus-operator#8256)
- \[BUGFIX] Add URL validation for SNS receiver. [#&#8203;8259](prometheus-operator/prometheus-operator#8259)
- \[BUGFIX] Fix GCE service discovery for the `ScrapeConfig` CRD. [#&#8203;8284](prometheus-operator/prometheus-operator#8284)
- \[BUGFIX] Avoid stale conditions in `Alertmanager`, `ThanosRuler`, `Prometheus` and `PrometheusAgent` resources. [#&#8203;8304](prometheus-operator/prometheus-operator#8304)
- \[BUGFIX] Fix race condition when updating rule ConfigMaps. [#&#8203;8290](prometheus-operator/prometheus-operator#8290)
- \[BUGFIX] Fix race condition when patching finalizers. [#&#8203;8323](prometheus-operator/prometheus-operator#8323)
- \[BUGFIX] Reconcile `ScrapeConfig` resources when namespace selection changes. [#&#8203;8334](prometheus-operator/prometheus-operator#8334)

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0My4zLjYiLCJ1cGRhdGVkSW5WZXIiOiI0My4zLjYiLCJ0YXJnZXRCcmFuY2giOiJtYWluIiwibGFiZWxzIjpbImltYWdlIl19-->

Reviewed-on: https://gitea.alexlebens.dev/alexlebens/infrastructure/pulls/3775
Co-authored-by: Renovate Bot <renovate-bot@alexlebens.net>
Co-committed-by: Renovate Bot <renovate-bot@alexlebens.net>
nutmos pushed a commit to nutmos/prometheus-operator that referenced this pull request Feb 14, 2026
…ator#8126)

* add valid alertmanagerconfig

* update validation check for global alertmanagerconfig object

* add validation marker

* add e2e test for amcfg

* fix depedency issues

* add sanitization logic & monitorv1duration field

* update tests with monitorv1duration field logic

* validate and update fields

* update import dependencies

* update unnecessary groupwait conversion

* update generated files and formatting

* update error handling && refine methods

* nonemptyduration validation for groupwait

* retrigger CI

* update validation for groupwait

* remove toplevel route validation for intervals

* simplify handling of zero durations

Zero durations are converted to empty strings which makes use of the
default value defined at a higher level in the routing tree.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Fix generated file

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Fix test compilation

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

---------

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
renovate bot added a commit to sdwilsh/ansible-playbooks that referenced this pull request Feb 21, 2026
…r to v0.89.0

##### [\`v0.89.0\`](https://github.com/prometheus-operator/prometheus-operator/releases/tag/v0.89.0)

- \[ENHANCEMENT] Add `hostNetwork` field to the `Alertmanager` CRD. [#8281](prometheus-operator/prometheus-operator#8281)
- \[ENHANCEMENT] Add the `crds` and `full-crds` commands to the operator's binary. [#8251](prometheus-operator/prometheus-operator#8251)
- \[ENHANCEMENT] Report deprecated field usage in the `Reconciled` condition type. [#8236](prometheus-operator/prometheus-operator#8236)
- \[ENHANCEMENT] Avoid unnecessary reconciliation upon creation of the `ThanosRuler` StatefulSet. [#8347](prometheus-operator/prometheus-operator#8347)
- \[ENHANCEMENT] Add `bodySizeLimit` to the ScrapeConfig CRD. [#8348](prometheus-operator/prometheus-operator#8348)
- \[ENHANCEMENT] Support `http_headers` field in the Alertmanager Secret. [#8357](prometheus-operator/prometheus-operator#8357)
- \[ENHANCEMENT] Add the `-kubelet-http-metrics` flag to enable/disable the HTTP metrics port in the Kubelet endpoint (default=enabled). [#8350](prometheus-operator/prometheus-operator#8350)
- \[ENHANCEMENT] Include `operator.prometheus.io/version` annotation in the full version of CRDs. [#8279](prometheus-operator/prometheus-operator#8279)
- \[BUGFIX] Validate VictorOps global configuration in the `Alertmanager` CRD. [#8020](prometheus-operator/prometheus-operator#8020)
- \[BUGFIX] Validate Jira global configuration in the `Alertmanager` CRD. [#8265](prometheus-operator/prometheus-operator#8265)
- \[BUGFIX] Validate VictorOps receiver's URL in the `AlertmanagerConfig` CRD. [#8258](prometheus-operator/prometheus-operator#8258)
- \[BUGFIX] Validate Webex receiver's URL in the `AlertmanagerConfig` CRD. [#8255](prometheus-operator/prometheus-operator#8255)
- \[BUGFIX] Validate Jira receiver's URL configuration in the `AlertmanagerConfig` CRD. [#8230](prometheus-operator/prometheus-operator#8230)
- \[BUGFIX] Validate OpsGenie receiver configuration in the `AlertmanagerConfig` CRD. [#8267](prometheus-operator/prometheus-operator#8267)
- \[BUGFIX] Validate WeChat receiver configuration in the `AlertmanagerConfig` CRD. [#8271](prometheus-operator/prometheus-operator#8271)
- \[BUGFIX] Validate SNS receiver configuration in the `AlertmanagerConfig` CRD. [#8217](prometheus-operator/prometheus-operator#8217)
- \[BUGFIX] Validate Webex global configuration in the `Alertmanager` CRD. [#7979](prometheus-operator/prometheus-operator#7979)
- \[BUGFIX] Validate Telegram global configuration in the `Alertmanager` CRD. [#8268](prometheus-operator/prometheus-operator#8268)
- \[BUGFIX] Restore statefulset's labels if the creation fails with AlreadyExists. [#8343](prometheus-operator/prometheus-operator#8343)
- \[BUGFIX] Fix potential panic due to informer cache races. [#8310](prometheus-operator/prometheus-operator#8310)
- \[BUGFIX] Support probers defined with IPv6 addresses in the `Probe` CRD. [#8354](prometheus-operator/prometheus-operator#8354)
- \[BUGFIX] Prevent group and repeat intervals with zero duration from breaking Alertmanager. [#8126](prometheus-operator/prometheus-operator#8126)
- \[BUGFIX] Propagate all supported RocketChat attributes for `AlertmanagerConfig` CRD. [#8016](prometheus-operator/prometheus-operator#8016)
- \[BUGFIX] Add URL validation for WeChat receiver. [#8256](prometheus-operator/prometheus-operator#8256)
- \[BUGFIX] Add URL validation for SNS receiver. [#8259](prometheus-operator/prometheus-operator#8259)
- \[BUGFIX] Fix GCE service discovery for the `ScrapeConfig` CRD. [#8284](prometheus-operator/prometheus-operator#8284)
- \[BUGFIX] Avoid stale conditions in `Alertmanager`, `ThanosRuler`, `Prometheus` and `PrometheusAgent` resources. [#8304](prometheus-operator/prometheus-operator#8304)
- \[BUGFIX] Fix race condition when updating rule ConfigMaps. [#8290](prometheus-operator/prometheus-operator#8290)
- \[BUGFIX] Fix race condition when patching finalizers. [#8323](prometheus-operator/prometheus-operator#8323)
- \[BUGFIX] Reconcile `ScrapeConfig` resources when namespace selection changes. [#8334](prometheus-operator/prometheus-operator#8334)

---
##### [\`v0.88.1\`](https://github.com/prometheus-operator/prometheus-operator/releases/tag/v0.88.1)

- \[BUGFIX] Validate `webhookURL` secret for `MSTeams` receiver in `AlertmanagerConfig` CRD. [#8294](prometheus-operator/prometheus-operator#8294)
- \[BUGFIX] Revert maximum version check for `EC2/Lightsail` SD in `ScrapeConfig` CRD. [#8308](prometheus-operator/prometheus-operator#8308)
- \[BUGFIX] Relax URL validation in `Slack` receiver in AlertmanagerConfig CRD to support Go templates. [#8299](prometheus-operator/prometheus-operator#8299) [#8331](prometheus-operator/prometheus-operator#8331)
- \[BUGFIX] Relax URL validation in `PagerDuty` in AlertmanagerConfig CRD to support Go templates. [#8319](prometheus-operator/prometheus-operator#8319)
- \[BUGFIX] Relax URL validation in `WebhookConfig` in AlertmanagerConfig CRD to support Go templates. [#8307](prometheus-operator/prometheus-operator#8307) [#8317](prometheus-operator/prometheus-operator#8317)
- \[BUGFIX] Relax URL validation in `RocketChat` receiver in AlertmanagerConfig CRD to support Go templates. [#8318](prometheus-operator/prometheus-operator#8318)
- \[BUGFIX] Relax URL validation in `Pushover` receiver in AlertmanagerConfig CRD to support Go templates. [#8307](prometheus-operator/prometheus-operator#8307) [#8316](prometheus-operator/prometheus-operator#8316)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Invalid AlertmanagerConfig passes validation and results in 'group_interval cannot be zero' errors

3 participants