What happened?
When a Node becomes NotReady, the Pods on it first transition to NotReady, which triggers a reconcile while they are not yet terminating. The node controller later deletes them, at which point they receive a deletionTimestamp.
However, the PCLQ controller does not trigger a new reconcile when the deletionTimestamp is added. This causes the Pods to remain in a terminating state without being replaced, resulting in the actual replica count dropping below the expected count and affecting service availability.
What did you expect to happen?
When Pods on a NotReady node are marked for deletion (receive a deletionTimestamp), the PCLQ controller should detect this change and trigger a reconciliation. It should then promptly create new replacement Pods to maintain the desired replica count and ensure service availability.
Environment
- Kubernetes version
- Grove version
- Scheduler details
- Cloud provider or hardware configuration
- Tools that you are using Grove together with
- Anything else that is relevant