local-up-cluster kube-proxy terminated error #82413
k8s-ci-robot merged 1 commit into kubernetes:master from
@zhlhahaha: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Hi @zhlhahaha. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

/assign @vishh

/assign @dims

/ok-to-test

@lubinszARM: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

/ok-to-test

/retest

/retest
Force-pushed from 6d86654 to b9ea7fe

/test pull-kubernetes-local-e2e

/retest
Force-pushed from b9ea7fe to 95e6703

/test pull-kubernetes-local-e2e

Thank you for the PR. I'm facing the same issue. This error did not occur previously, so there may be some other underlying problem, but this PR seems useful as a workaround :)

/retest

@MikeSpreitzer Hi, I get your idea.

@zhlhahaha: what exactly do you mean by "It needs to set sleep time for kubelet service start, otherwise kubeproxy may fail to start"? If the only change is to swap the order then there is no sleep. I wonder whether by "may" you mean there is a general worry, or an observed problem. Did you test a change that only swaps the order, does not introduce a wait, and observe that the kube-proxy still erred due to lack of finding its Node? Regarding debugging the e2e failure, you can find leads in https://github.com/kubernetes/community/blob/master/contributors/devel/sig-testing/testing.md

Hi Mike, I do understand your idea, and I understand #81880. I do not think the two solutions conflict with each other.
Thanks for your suggestion on the e2e test failure; it may take me some time to learn how to debug it.

Yes, I understand the virtue of adding a wait in the local-up-cluster script. But I am concerned by the consistent failures of pull-kubernetes-local-e2e. Could the added wait somehow be causing the failures of pull-kubernetes-local-e2e? Since kube-proxy has a wait inside itself (note that #81880 only increases the duration of an existing wait), it is not obvious to me that adding an additional wait is necessary. I understand why waiting and reporting on the outcome in the local-up-cluster script is helpful to users. My question is, if we only swap the order and rely on the 5-try wait already in kube-proxy, is that sufficient to make local-up-cluster succeed?

Hi Mike,

/retest

#83792

/uncc
Thanks Ben, I had been stuck on this for a long time.

/kind cleanup

/approve

[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: dims, zhlhahaha
The full list of commands accepted by this bot can be found here.
The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.

/retest

/retest
Review the full test history for this PR.
Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

Recent changes in k8s (kubernetes/kubernetes#82413) check for KUBELET_HOST when getting node info, which results in an error. This commit makes the corresponding update.
local-up-cluster kube-proxy terminated error
What type of PR is this?
What this PR does / why we need it:
When using hack/local-up-cluster.sh to deploy a local cluster, it
failed with the message "kube-proxy terminated unexpectedly",
and kube-proxy.log contained "Failed to retrieve node info: nodes "127.0.0.1" not found".
The root cause of this error is the wrong boot order of two Kubernetes
services in local-up-cluster.sh: kube-proxy and the kubelet daemon.
When kube-proxy starts, it looks up its node information, and
that Node object is registered by the kubelet daemon. However, in
the shell script, the kube-proxy service starts before the kubelet daemon.
This patch changes the boot order so that the kubelet daemon starts before kube-proxy,
and checks that the node is ready before kube-proxy starts.
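The ordering fix described above (start kubelet first, then wait until its Node object is visible before launching kube-proxy) can be sketched in shell roughly as follows. This is a hypothetical helper, not the merged patch: the function name wait_for_node, the retry defaults, and the assumption that kubectl is on PATH and pointed at the local cluster are all illustrative. It polls kubectl get nodes for the node name kube-proxy will look up (127.0.0.1 in the log above) with a bounded number of attempts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the merged patch): block until the Node object
# that kubelet registers is visible via the API server, so kube-proxy can
# find it on startup. Assumes kubectl is on PATH and configured for the
# local cluster; the name and defaults are illustrative.

# Usage: wait_for_node NODE_NAME [MAX_TRIES] [INTERVAL_SECONDS]
# Returns 0 as soon as `kubectl get nodes NODE_NAME` succeeds,
# or 1 after MAX_TRIES failed attempts.
wait_for_node() {
  local node_name="$1"
  local max_tries="${2:-30}"
  local interval="${3:-1}"
  local attempt=1

  while [ "${attempt}" -le "${max_tries}" ]; do
    # kubectl exits non-zero (NotFound) until kubelet has registered the node.
    if kubectl get nodes "${node_name}" >/dev/null 2>&1; then
      echo "node ${node_name} registered after ${attempt} attempt(s)"
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "${attempt}" -lt "${max_tries}" ]; then
      sleep "${interval}"
    fi
    attempt=$((attempt + 1))
  done

  echo "node ${node_name} not found after ${max_tries} attempt(s)" >&2
  return 1
}
```

In local-up-cluster.sh such a call would sit between the kubelet and kube-proxy start steps, for example start_kubelet, then wait_for_node "${HOSTNAME_OVERRIDE}" || exit 1, then start_kubeproxy; those function and variable names are assumed from the script and should be checked against the actual source. A bounded loop keeps the script from hanging forever if kubelet never registers, and it complements, rather than replaces, the retries kube-proxy already performs internally, as discussed in the thread.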
Which issue(s) this PR fixes:
Fixes #81879
Special notes for your reviewer:
no
Does this PR introduce a user-facing change?:
no
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
no