pkg/alertmanager: change podManagementPolicy to parallel to prevent statefulset reconciliation from hanging by paulfantom · Pull Request #2676 · prometheus-operator/prometheus-operator

paulfantom · 2019-07-17T10:23:40Z

When using the default podManagementPolicy (OrderedReady), it is possible to create a
situation where Alertmanager pod objects won't be reconciled with a
StatefulSet, thus preventing Alertmanager from being deployed.

One such case is when Alertmanager was deployed and afterwards an admin
applied taints to all nodes, causing pod eviction. Next, tolerations were
applied; however, due to the OrderedReady policy, one Alertmanager pod was still left in
the Pending state, preventing reconciliation.

While testing manually with podManagementPolicy: Parallel, Alertmanager pods were scheduled correctly.

We are already using podManagementPolicy: Parallel for the Prometheus StatefulSet (https://github.com/coreos/prometheus-operator/blob/master/pkg/prometheus/statefulset.go#L805), but this wasn't propagated to Alertmanager for reasons unknown to me.
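To make the failure mode easier to follow, here is a heavily simplified model of one reconcile pass during a rolling update. It is a sketch under stated assumptions: the `Pod`, `Policy` and `nextAction` names are hypothetical local types, not the k8s.io/api types or the actual StatefulSet controller code, and replica creation plus every other controller detail is omitted. It only captures the ordering rule that matters here: under OrderedReady the controller makes no progress while any existing pod is not Running and Ready, whereas under Parallel an outdated pod can still be deleted.

```go
package main

import "fmt"

// Policy mirrors the two podManagementPolicy values (hypothetical local
// type for this sketch, not the k8s.io/api type).
type Policy string

const (
	OrderedReady Policy = "OrderedReady"
	Parallel     Policy = "Parallel"
)

// Pod is a hypothetical, simplified view of a StatefulSet pod.
type Pod struct {
	Ordinal  int
	Revision string // revision of the pod template the pod was created from
	Ready    bool   // true once the pod is Running and Ready
}

// nextAction models one reconcile pass of a rolling update. Creation of
// missing replicas and all other controller details are omitted.
func nextAction(policy Policy, pods []Pod, updateRevision string) string {
	// OrderedReady: no progress while any existing pod is not Running and Ready.
	if policy == OrderedReady {
		for _, p := range pods {
			if !p.Ready {
				return fmt.Sprintf("wait for pod-%d to become Ready", p.Ordinal)
			}
		}
	}
	// Rolling update, highest ordinal first: an outdated pod is deleted
	// before its readiness is checked.
	for i := len(pods) - 1; i >= 0; i-- {
		p := pods[i]
		if p.Revision != updateRevision {
			return fmt.Sprintf("delete pod-%d (outdated revision %s)", p.Ordinal, p.Revision)
		}
		if !p.Ready {
			return fmt.Sprintf("wait for pod-%d to become Ready", p.Ordinal)
		}
	}
	return "nothing to do"
}

func main() {
	// pod-0 was recreated from the old template (no tolerations) and stays
	// Pending; the template has since been updated to rev-2.
	pods := []Pod{{Ordinal: 0, Revision: "rev-1", Ready: false}}
	fmt.Println(OrderedReady, "->", nextAction(OrderedReady, pods, "rev-2")) // OrderedReady -> wait for pod-0 to become Ready
	fmt.Println(Parallel, "->", nextAction(Parallel, pods, "rev-2"))         // Parallel -> delete pod-0 (outdated revision rev-1)
}
```

Under OrderedReady the Pending pod is waited on indefinitely, which matches the hang described above; under Parallel the loop reaches the outdated pod and acts on it.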

/cc @brancz @s-urbaniak

s-urbaniak · 2019-07-17T12:23:23Z

Very good catch!

We should probably reference some context, e.g. in the commit message, to make the rationale for this change clearer:

s-urbaniak · 2019-07-17T12:24:15Z

I think this could even be added as source code comments in your PR as well as in https://github.com/coreos/prometheus-operator/blob/7c448d90c1bbf8740b5f524937209d4ec56958ca/pkg/prometheus/statefulset.go#L802

s-urbaniak · 2019-07-17T12:42:44Z

could you also add the same comment in the prometheus statefulset?

s-urbaniak · 2019-07-17T12:47:45Z

but generally very lgtm, thanks for catching!
/cc @metalmatze for another set of eyes.

…statefulset reconciliation from hanging

When using the default podManagementPolicy, it is possible to create a situation where Alertmanager pod objects won't be reconciled with a StatefulSet, thus preventing Alertmanager from being deployed.

One such case is when Alertmanager was deployed and afterwards an admin applied taints to all nodes, causing pod eviction. Next, tolerations were applied; however, due to the OrderedReady policy, one Alertmanager pod was still left in the Pending state, preventing reconciliation.

This is needed to provide a workaround for a bug in Kubernetes detailed in kubernetes/kubernetes#60164. It is also one of the known limitations of StatefulSets mentioned in the docs: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#limitations

paulfantom · 2019-07-17T12:58:20Z

Added a comment to the Prometheus StatefulSet.

metalmatze · 2019-07-17T13:08:07Z

I am not totally sure we want to have this in parallel.
There are a lot of people that run Alertmanager without PVCs and therefore rely on the slower rolling update, during which the old Alertmanagers gossip their state to the newly started Alertmanager pod.

brancz · 2019-07-18T02:41:29Z

> There are a lot of people that run Alertmanager without PVCs and thus rely on the slower rolling update that is then gossiping the state from the old Alertmanagers to the newly started Alertmanager pod.

This is correct. We intentionally did not introduce the Parallel pod management policy here because we want the pods to be rolled out sequentially: members that are shutting down can sync their state to the rest of the cluster, and a new pod has enough time to sync the state from the cluster before all other pods are rolled.
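For reference, the two policies differ in how a StatefulSet launches pods when it scales up (for example, after all pods were evicted). Per the Kubernetes documentation, OrderedReady launches pods one at a time, each only after its predecessor is Running and Ready, while Parallel launches them all at once without waiting. The sketch below models only that ordering; `Policy` and `launchWaves` are hypothetical names, and it assumes every pod eventually becomes Ready.

```go
package main

import "fmt"

// Policy mirrors the two podManagementPolicy values (hypothetical local type).
type Policy string

const (
	OrderedReady Policy = "OrderedReady"
	Parallel     Policy = "Parallel"
)

// launchWaves groups pod ordinals by when they are launched during a
// scale-up from zero. Each inner slice is launched together; under
// OrderedReady the next wave starts only after the previous pod is
// Running and Ready.
func launchWaves(policy Policy, replicas int) [][]int {
	if replicas <= 0 {
		return nil
	}
	if policy == Parallel {
		all := make([]int, replicas)
		for i := range all {
			all[i] = i
		}
		return [][]int{all}
	}
	waves := make([][]int, 0, replicas)
	for i := 0; i < replicas; i++ {
		waves = append(waves, []int{i})
	}
	return waves
}

func main() {
	fmt.Println(OrderedReady, launchWaves(OrderedReady, 3)) // OrderedReady [[0] [1] [2]]
	fmt.Println(Parallel, launchWaves(Parallel, 3))         // Parallel [[0 1 2]]
}
```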

I think in general we can change our messaging around this and say "you must have a PV for alertmanager" and be done with it. 🤷‍♂️ What do you think?

paulfantom · 2019-07-18T12:38:42Z

So we basically have two options for how to proceed, and both are OK by me.

Parallel pod management policy

We would need to notify users that a PV is always required for Alertmanager. This in turn can affect how we create the cluster, as a 3rd Alertmanager won't be needed anymore for an HA setup.

OrderedReady pod management policy

We would need to notify users that, in case of a stuck Alertmanager rollout, they need to remove all Alertmanager pods, as this is just how StatefulSets are (not) working. Or maybe we could somehow add this logic into prometheus-operator? This way we could still have a setup where PVs are not needed.
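If that logic were moved into prometheus-operator, it could look roughly like the sketch below: detect a stuck rollout, then return every Alertmanager pod for deletion, which is the manual workaround described above. This is entirely hypothetical: `Pod`, `rolloutStuck`, `podsToDelete` and the pod name are illustrative, and the detection rule (some pod created from an outdated revision is not Ready) is an assumption. A real check would likely also need a grace period so that a pod that is merely starting up is not treated as stuck.

```go
package main

import "fmt"

// Pod is a hypothetical, simplified view of an Alertmanager pod.
type Pod struct {
	Name     string
	Revision string // revision of the StatefulSet template the pod was created from
	Ready    bool   // Running and Ready
}

// rolloutStuck is a hypothetical heuristic (assumption): under OrderedReady,
// a pod created from an outdated revision that is not Ready blocks the
// rollout, so its presence is treated as "stuck".
func rolloutStuck(pods []Pod, updateRevision string) bool {
	for _, p := range pods {
		if p.Revision != updateRevision && !p.Ready {
			return true
		}
	}
	return false
}

// podsToDelete returns the names of all pods when the rollout is stuck,
// following the workaround of removing every Alertmanager pod; otherwise nil.
func podsToDelete(pods []Pod, updateRevision string) []string {
	if !rolloutStuck(pods, updateRevision) {
		return nil
	}
	names := make([]string, 0, len(pods))
	for _, p := range pods {
		names = append(names, p.Name)
	}
	return names
}

func main() {
	// Hypothetical pod name; pod-0 is Pending from the old template.
	pods := []Pod{{Name: "alertmanager-main-0", Revision: "rev-1", Ready: false}}
	fmt.Println(podsToDelete(pods, "rev-2")) // [alertmanager-main-0]
}
```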

Either way, we need to acknowledge that this can happen because StatefulSets work this way (maybe in the next release notes?).

Of course, there is a 3rd option: fix StatefulSets in k8s.

brancz · 2019-07-21T02:45:38Z

I say let's go for the parallel management policy. Feel free to merge whenever you're ready.

lgtm 👍

paulfantom · 2019-07-22T13:04:47Z

OK, let's proceed with "Parallel" management policy.

paulfantom changed the title ~~pkg/alertmanager: change podManagement policy to parallel to prevent statefulset reconciliation from hanging~~ pkg/alertmanager: change podManagementPolicy to parallel to prevent statefulset reconciliation from hanging Jul 17, 2019

paulfantom force-pushed the am_podmanagement branch from 30301ad to 7e912ae July 17, 2019 12:36

paulfantom force-pushed the am_podmanagement branch from 7e912ae to ac7626c July 17, 2019 12:57

paulfantom merged commit 14d447f into prometheus-operator:master Jul 22, 2019

paulfantom deleted the am_podmanagement branch July 22, 2019 13:04

paulfantom mentioned this pull request Jul 24, 2019

Synchronize with upstream master branch openshift/prometheus-operator#35

Merged

jutley mentioned this pull request Sep 9, 2019

podManagementPolicy: Parallel forces Alertmanagers to have PVCs and Prometheus pairs to lose HA #2753

Closed

paulfantom mentioned this pull request Oct 14, 2019

Ensure prometheus and alert manager are using Parallel policy #2221

Closed

simonpasquier mentioned this pull request Dec 23, 2025

Implement custom update strategy for statefulset to avoid stuck rollouts #8205

Open

paulfantom merged 1 commit into prometheus-operator:master from paulfantom:am_podmanagement

4 participants