Description
Recently, the operator was updated so that both Prometheuses and Alertmanagers result in StatefulSets with podManagementPolicy: Parallel. Based on what I'm reading in #2676, this was added to work around a bug in StatefulSets that can cause pods to never be updated. However, this change creates a couple of issues:
- Alertmanagers now require PVCs for a proper HA setup. We may be a unique case, but for us, this is a pretty frustrating change. PVCs can cause scheduling issues in our clusters, so we prefer to avoid using them when possible. podManagementPolicy: OrderedReady worked great for us, and reliably allowed us to avoid using PVCs.
- Prometheus pairs now get updated at the same time, which is in direct violation of the concept of an HA pair. Prometheus upgrades will consistently result in a complete monitoring outage for (hopefully) a small time window. In the case of a bad upgrade, this downtime would be extended, and because monitoring itself is down, only the person performing the update would be in a position to notice it. When using a project like Cortex or Thanos, we should be able to perform a Prometheus update and not see any gap in the data, but this is no longer true with newer versions of the operator.
For these reasons, I'd like to see the podManagementPolicy be configurable for both Alertmanager and Prometheus. I acknowledge that in general we'd prefer to limit the number of fields exposed by the operator, but I would argue that this one is still important. Running completely blind for any length of time is not acceptable, especially when it is a part of the routine upgrade process.
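To make the request concrete, here is a rough sketch of what I have in mind. The first document shows the StatefulSet field as the operator sets it today (podManagementPolicy is a standard Kubernetes StatefulSet field, and OrderedReady is the Kubernetes default). The second document is purely hypothetical: the podManagementPolicy field on the Prometheus resource does not exist as far as I know, and the name, placement, and the resource name "example" are assumptions for illustration only.

```yaml
# Abbreviated StatefulSet as generated by the operator today.
# Only the relevant field is shown; a real manifest also needs
# metadata, selector, serviceName, and a pod template.
apiVersion: apps/v1
kind: StatefulSet
spec:
  podManagementPolicy: Parallel   # Kubernetes default would be OrderedReady
---
# HYPOTHETICAL: a possible shape for the requested option.
# spec.podManagementPolicy is an assumed field name, not an
# existing operator API at the time of writing.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example                   # placeholder name
spec:
  replicas: 2
  podManagementPolicy: OrderedReady   # assumed field, passed through to the StatefulSet
```

The same assumed field would apply to the Alertmanager resource. Defaulting it to Parallel would keep the current behavior for anyone who does not set it.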