Skip to content

pkg/alertmanager: change podManagementPolicy to parallel to prevent statefulset reconciliation from hanging#2676

Merged
paulfantom merged 1 commit intoprometheus-operator:masterfrom
paulfantom:am_podmanagement
Jul 22, 2019
Merged

pkg/alertmanager: change podManagementPolicy to parallel to prevent statefulset reconciliation from hanging#2676
paulfantom merged 1 commit intoprometheus-operator:masterfrom
paulfantom:am_podmanagement

Conversation

@paulfantom
Copy link
Member

When using default podManagementPolicy (OrderedReady) it is possible to create a
situation where alertmanager pod objects won't be reconciled with a
statefulset and thus preventing alertmanager from being deployed.

One of such cases is when alertmanager was deployed and afterwards admin
applied taints to all nodes causing pod eviction. Next tolerations were
applied however due to OrderedReady policy one alertmanager pod was still left in
Pending state preventing reconciliation.

While testing manually with podManagementPolicy: Parallel alertmanager pods were scheduled correctly.

We are already using podManagementPolicy: Parallel for prometheus statefulset (https://github.com/coreos/prometheus-operator/blob/master/pkg/prometheus/statefulset.go#L805), but this wasn't propagated to alertmanager for reasons unknown to me.

/cc @brancz @s-urbaniak

@paulfantom paulfantom changed the title pkg/alertmanager: change podManagement policy to parallel to prevent statefulset reconciliation from hanging pkg/alertmanager: change podManagementPolicy to parallel to prevent statefulset reconciliation from hanging Jul 17, 2019
@s-urbaniak
Copy link
Contributor

Very good catch!

We should probably, i.e. in the commit message, reference some context to make the rationale of this change more clear:

@s-urbaniak
Copy link
Contributor

I think this could even be added as source code comments in your PR as well as in https://github.com/coreos/prometheus-operator/blob/7c448d90c1bbf8740b5f524937209d4ec56958ca/pkg/prometheus/statefulset.go#L802

@s-urbaniak
Copy link
Contributor

could you also add the same comment in the prometheus statefulset?

@s-urbaniak
Copy link
Contributor

but generally very lgtm, thanks for catching!
/cc @metalmatze for another set of eyes.

…statefulset reconciliation from hanging

When using default podManagementPolicy it is possible to create a
situation where alertmanager pods objects won't be reconciled with a
statefulset and thus preventing am from being deployed.

One of such cases is when am was deployed and afterwards admin
applied taints to all nodes causing pod eviction. Next tolerations were
applied however due to OrderedReady policy one am pod was still left in
Pending state preventing reconciliation.

This is needed to provide a workaround for a bug in kubernetes detailed
in kubernetes/kubernetes#60164. It is also one
of the knows limitations of StetafulSets mentioned in docs
https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#limitations
@paulfantom
Copy link
Member Author

Added comment to prometheus StatefulSet.

@metalmatze
Copy link
Member

I am not totally sure we want to have this in parallel.
There are a lot of people that run Alertmanager without PVCs and thus rely on the slower rolling update that is then gossiping the state from the old Alertmanagers to the newly started Alertmanager pod.

@brancz
Copy link
Contributor

brancz commented Jul 18, 2019

There are a lot of people that run Alertmanager without PVCs and thus rely on the slower rolling update that is then gossiping the state from the old Alertmanagers to the newly started Alertmanager pod.

This is correct. We intentionally did not introduce the parallel pod management policy here because we want the pods to be rolled out sequentially so that shutting down members in a cluster can make sure to sync the cluster with the state they have and there being enough time that a new pod can sync the state from the cluster before all other pods are rolled.

I think in general we can change our messaging around this and say "you must have a PV for alertmanager" and be done with it. 🤷‍♂️ What do you think?

@paulfantom
Copy link
Member Author

So we basically have two options how we can proceed and both are ok by me.

Parallel pod management policy

We would need to notify that you always need PV for alertmanager. This in turn can affect how we are creating cluster as 3rd alertmanager won't be needed anymore for HA setup.

OrderedReady pod management policy

We would need to notify users that in case of stuck alertmanager rollout they need to remove all am pods as this is just how StatefulSets are (not) working. Or maybe we could somehow add this logic into prometheus-operator? This way we could still have a setup where PVs are not needed.

In either way we need to acknowledge that this can happen because StatefulSets work this way (maybe in next release notes?).

Of course there is 3rd option: fix StatefulSets in k8s :trollface:

@brancz
Copy link
Contributor

brancz commented Jul 21, 2019

I say let's go for the parallel management policy. Feel free to merge whenever you're ready.

lgtm 👍

@paulfantom
Copy link
Member Author

OK, let's proceed with "Parallel" management policy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants