podManagementPolicy: Parallel forces Alertmanagers to have PVCs and Prometheus pairs to lose HA #2753

@jutley


Recently, the operator was updated so that both Prometheus and Alertmanager resources result in StatefulSets with podManagementPolicy: Parallel. Based on what I'm reading in #2676, this was added due to a bug in StatefulSets that causes pods to never be updated. However, this change creates a couple of issues:

  1. Alertmanagers now require PVCs for a proper HA setup. We may be a unique case, but for us this is a pretty frustrating change. PVCs can cause scheduling issues in our clusters, so we prefer to avoid them when possible. podManagementPolicy: OrderedReady worked great for us and reliably let us avoid PVCs.

  2. Prometheus pairs now get updated at the same time, which directly violates the concept of an HA pair. Prometheus upgrades will consistently result in a complete monitoring outage for a (hopefully) short time window. In the case of a bad upgrade, this downtime would be extended, and nobody except the person performing the update would be in a position to detect it. When using a project like Cortex or Thanos, we should be able to perform a Prometheus update and not see any gap in the data, but this is no longer true with newer versions of the operator.
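
For reference, the field in question lives on the StatefulSet spec. The following is only a minimal sketch of a two-replica StatefulSet, not the manifest the operator actually generates; the resource name, labels, and service name are hypothetical and chosen purely for illustration.

```yaml
# Illustrative only: names and labels below are hypothetical,
# and this is not the manifest produced by the operator.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus-example
spec:
  serviceName: prometheus-example
  replicas: 2
  # OrderedReady is the Kubernetes default: pods are created in
  # ordinal order, and each waits for its predecessor to be Ready.
  # Parallel launches or terminates pods without waiting on the others.
  podManagementPolicy: OrderedReady
  selector:
    matchLabels:
      app: prometheus-example
  template:
    metadata:
      labels:
        app: prometheus-example
    spec:
      containers:
        - name: prometheus
          image: quay.io/prometheus/prometheus
```

Swapping the value to Parallel is the only change needed to compare the two behaviours on a test cluster.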

For these reasons, I'd like to see the podManagementPolicy be configurable for both Alertmanager and Prometheus. I acknowledge that in general we'd prefer to limit the number of fields in the operator, but I would argue that this one is still important. Running completely blind for any length of time is not acceptable, especially when it is part of the routine upgrade process.
