Moved to #478
Today, Deployments/ReplicaSets use MinReadySeconds to add a delay on top of readiness checks and make rollouts more robust. The ReplicaSet controller decides how many of a ReplicaSet's Pods are available, and the Deployment controller, when rolling out new Pods, will not proceed unless the minimum number of Pods required to be available have been ready for at least MinReadySeconds (if MinReadySeconds is not specified, Pods are considered available as soon as they are ready).
Two problems have been identified so far with the current state of things:
- A Pod is marked ready by the kubelet as soon as it passes its readiness check. The ReplicaSet controller runs as part of master and estimates when a Pod is available by comparing the time the Pod became ready (as seen by the kubelet) with MinReadySeconds. Clock-skew between master and nodes will affect the availability checks.
- PodDisruptionBudget works with ready Pods and has no notion of MinReadySeconds when used with a Deployment/ReplicaSet.
Both problems above can be solved by moving MinReadySeconds into the PodSpec. Once kubelet observes that a Pod has been ready for at least MinReadySeconds without any of its containers crashing, it will update the PodStatus with an Available condition set to Status=True. Higher-level orchestrators running on different machines, such as the ReplicaSet or the PodDisruptionBudget controller, will merely need to look at the Available condition that is set in the status of a Pod.
API changes
A new field is proposed in the PodSpec:
// Minimum number of seconds for which a newly created pod should be ready
// without any of its containers crashing, for it to be considered available.
// Defaults to 0 (pod will be considered available as soon as it is ready)
// +optional
MinReadySeconds *int32 `json:"minReadySeconds,omitempty"`
and a new PodConditionType:
// PodAvailable is added to a ready pod that has MinReadySeconds specified. The pod
// should already be added under a load balancer and serve requests; this condition
// lets higher-level orchestrators know that the pod has been running for MinReadySeconds
// without any of its containers having crashed.
PodAvailable PodConditionType = "Available"
Additionally:
- Deployments/ReplicaSets/DaemonSets already use MinReadySeconds in their spec so we should probably deprecate those fields in favor of the field in the PodSpec and remove them in a future version.
- Deployments/ReplicaSets will not propagate MinReadySeconds from their Spec down to the pod template because that would lead to differences in the pod templates between a Deployment and a ReplicaSet, resulting in new rollouts. If MinReadySeconds is specified both in the spec and in the pod template for a Deployment/ReplicaSet and the values differ, a validation error will be returned (tentative). API defaulting can set MinReadySeconds in the spec if it's specified only in the PodTemplate (tentative).
- ReplicaSets that specify MinReadySeconds only in the ReplicaSetSpec can create new Pods by specifying MinReadySeconds in their PodSpec (w/o updating the ReplicaSet pod template).
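The tentative validation and defaulting rules above could be sketched roughly as follows. This is only an illustration: `reconcileMinReadySeconds` is a hypothetical helper, not an existing Kubernetes function, and it assumes both values are modeled as `*int32` so that "not specified" can be told apart from 0.

```go
package main

import "fmt"

// reconcileMinReadySeconds is a hypothetical helper (not part of Kubernetes).
// specValue stands for MinReadySeconds in the Deployment/ReplicaSet spec and
// templateValue for MinReadySeconds in the pod template's PodSpec.
// A nil pointer means "not specified" (an assumption made for this sketch).
// It returns the value the spec should carry after defaulting.
func reconcileMinReadySeconds(specValue, templateValue *int32) (*int32, error) {
	switch {
	case specValue != nil && templateValue != nil:
		// Both set: they must agree, otherwise validation fails.
		if *specValue != *templateValue {
			return nil, fmt.Errorf("minReadySeconds mismatch: spec=%d, template=%d",
				*specValue, *templateValue)
		}
		return specValue, nil
	case templateValue != nil:
		// Only the template sets it: default the spec from the template.
		v := *templateValue
		return &v, nil
	default:
		// Only the spec sets it, or neither does: nothing to change.
		return specValue, nil
	}
}

func main() {
	specVal, tmplVal := int32(10), int32(30)
	if _, err := reconcileMinReadySeconds(&specVal, &tmplVal); err != nil {
		fmt.Println("validation error:", err)
	}
	if v, err := reconcileMinReadySeconds(nil, &tmplVal); err == nil && v != nil {
		fmt.Println("defaulted spec value:", *v)
	}
}
```

Note that the sketch never writes the spec value into the template, in line with the point above that propagating it downwards would trigger new rollouts.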
kubelet changes
For a Pod that specifies MinReadySeconds, kubelet will need to check (after MinReadySeconds) if any of the Pod containers has crashed. If not, it will switch the Available condition to Status=True in the status of the Pod. Pods that don't specify MinReadySeconds won't have the Available condition set in their status.
Controller manager changes
The ReplicaSet controller will create new Pods by setting MinReadySeconds in their PodSpec if it's specified in the ReplicaSetSpec (and not in the ReplicaSet pod template). For Pods that don't specify MinReadySeconds, it can switch to using a virtual clock and continue with the current approach of estimating availability. This will also help keep new servers backwards-compatible with old kubelets.
Future work
The PDB controller will need to be extended to recognize the Available condition in Pods. It may also need to get into the business of estimating availability for Deployments/ReplicaSets that already use MinReadySeconds (already existing Pods are not going to be updated with an Available condition - see the section above about kubelet changes).
@kubernetes/sig-apps-misc @kubernetes/sig-api-machinery-misc