Tolerate temporary errors from etcdserver #11401
There are cases when the etcdserver is temporarily unavailable and the errors that we get back from kube-apiserver reflect that. It looks like we currently bail out immediately when these errors happen. We should retry until the timeout is reached when this sort of error happens. Signed-off-by: Davanum Srinivas <davanum@gmail.com>
@hickeyma this is ready now!
@dims thanks for looking into the issues here. Is the Kubernetes API supposed to be a leaky abstraction? Are clients expected to work with etcd? I'm asking about intent and mid-term intent. I'm wondering whether this code is something we will need to maintain long term or if this is a short-term situation.
@mattfarina we'll need a KEP upstream. I've asked some folks who were pushing for this earlier to do more in the 1.27 cycle (not 1.26), so until that KEP is discussed/reviewed/approved we will need this. We'll also need this for as long as the versions of Kubernetes supported by Helm have the old-style leaky abstraction.
technosophos
left a comment
This seems to be the appropriate stop-gap for this error.
thanks @hickeyma !
Hey @technosophos @hickeyma. Unfortunately, this fix does not solve the issue. Can you take a look at my new fix? #11426
@dims Can this be backported to 3.2 please?
@sruthiwander not this one! It was reverted, so you will need #11426. Also, https://github.com/helm/helm/releases/tag/v3.2.0 is practically ancient; I doubt the Helm maintainers will go back that far: https://helm.sh/docs/topics/release_policy/
What this PR does / why we need it:
There are cases when the etcdserver is temporarily unavailable and the
errors that we get back from kube-apiserver reflect that. It looks
like we currently bail out immediately when these errors happen. We
should retry until the timeout is reached when this sort of error happens.
Fixes #9502
Fixes #7637
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
Special notes for your reviewer:
With this patch, temporary errors such as etcdserver leader changes are no longer treated as terminal. We continue to retry until the specified timeout.
Note that there are also things that can be done on the k8s side; discussion is ongoing there as well:
kubernetes/kubernetes#112152
If applicable:
isServiceUnavailable