What happened?
There is a scenario that can cause a pod leak.
A PCLQ has 3 replicas (pod1, pod2, pod3).
- T0: While pod1 is Pending (perhaps due to a resource limit or another issue), it is deleted manually with kubectl delete pod. The deletion succeeds.
- T1: The informer cache is updated and now contains only pod2 and pod3.
- T2: The PCLQ controller reconciles and computes:
diff := len(sc.existingPCLQPods) + len(createExpectations) - int(sc.pclq.Spec.Replicas) - len(deleteExpectations) // diff = 2 + 1 - 3 - 0 = 0
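In this case, pod1 will never be recreated. The create expectation for pod1 is still counted (len(createExpectations) = 1) even though the pod no longer exists, so diff evaluates to 0 and the controller takes no action. The reason is that reconcile can be slow under some conditions, so the informer cache is updated before the reconcile runs. SyncExpectations works on the assumption that some pods are still terminating, or in other words that pod deletion is slower than the reconcile. As a result, we cannot tell these two scenarios apart:
- The informer has not yet seen the created pod
- The pod has already been deleted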
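The ambiguity can be illustrated with a minimal Go sketch of the arithmetic quoted above. The helper name podDiff and its integer parameters are hypothetical and are not Grove's actual code. The sketch also assumes that a negative diff means the controller needs to create pods.

```go
package main

import "fmt"

// podDiff is a hypothetical helper that mirrors the diff expression from the
// report. It takes plain counts instead of the controller's real data structures.
func podDiff(existing, createExpectations, replicas, deleteExpectations int) int {
	return existing + createExpectations - replicas - deleteExpectations
}

func main() {
	// Scenario A: pod1 was created but the informer has not observed it yet.
	// The create expectation is legitimate, so a diff of 0 is correct.
	notYetSeen := podDiff(2, 1, 3, 0)

	// Scenario B (this report): pod1 was observed and then deleted before the
	// reconcile ran. The inputs are identical, so the result is identical.
	alreadyDeleted := podDiff(2, 1, 3, 0)

	// If the stale create expectation were dropped, the shortfall would show up.
	withoutStaleExpectation := podDiff(2, 0, 3, 0)

	fmt.Println(notYetSeen, alreadyDeleted, withoutStaleExpectation)
}
```

Both scenarios give the same inputs to the calculation, so the count alone cannot separate them. Telling them apart would need extra information, such as whether the expected pod has ever been observed in the cache.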
What did you expect to happen?
Pod1 should be recreated.
Environment
- Kubernetes version
- Grove version
- Scheduler details
- Cloud provider or hardware configuration
- Tools that you are using Grove together with
- Anything else that is relevant