WIP - Remove special-case handling of master label nodes in e2e #76654
liggitt wants to merge 1 commit into kubernetes:master
Conversation
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: liggitt. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Details: Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing

cc @kubernetes/cncf-conformance-wg
Force-pushed: fd49cd5 to f790c76
Force-pushed: f790c76 to a7119c2
OutputPrintType string

// NodeSchedulableTimeout is the timeout for waiting for all nodes to be schedulable.
// NodeSchedulableSelector is the label selector of the nodes we should wait to be schedulable. If empty, wait for all nodes.
NodeSchedulableSelector string
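A minimal sketch of the semantics described in the snippet above: an empty NodeSchedulableSelector means every node must become schedulable, while a non-empty selector restricts the wait to matching nodes. The types and the key=value matcher here are simplified stand-ins (the real framework parses full Kubernetes label selectors):

```go
package main

import (
	"fmt"
	"strings"
)

// node is a simplified stand-in for a cluster node: its labels and
// whether the scheduler may currently place pods on it.
type node struct {
	name        string
	labels      map[string]string
	schedulable bool
}

// matchesSelector checks a node's labels against a simple "key=value"
// selector. An empty selector matches every node, mirroring the
// documented behavior of NodeSchedulableSelector.
func matchesSelector(n node, selector string) bool {
	if selector == "" {
		return true
	}
	kv := strings.SplitN(selector, "=", 2)
	if len(kv) != 2 {
		return false
	}
	return n.labels[kv[0]] == kv[1]
}

// allSelectedSchedulable is the predicate a wait loop would poll:
// only nodes matched by the selector must be schedulable.
func allSelectedSchedulable(nodes []node, selector string) bool {
	for _, n := range nodes {
		if matchesSelector(n, selector) && !n.schedulable {
			return false
		}
	}
	return true
}

func main() {
	nodes := []node{
		{name: "master-0", labels: map[string]string{"node-role.kubernetes.io/master": ""}, schedulable: false},
		{name: "worker-0", labels: map[string]string{"role": "worker"}, schedulable: true},
	}
	fmt.Println(allSelectedSchedulable(nodes, ""))            // false: the unschedulable master blocks the wait
	fmt.Println(allSelectedSchedulable(nodes, "role=worker")) // true: the master is ignored
}
```

This is why environments without special node labeling can leave the flag unset and see no behavior change.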
This is consistent with our statement that a) role labels are advisory

/retest

/test pull-kubernetes-e2e-kops-aws

/hold is there anything I need to do to plumb this option through kubetest or ginkgo.sh?

/test pull-kubernetes-e2e-kind
kubetest has this flag, so it can already pass the new flag through. cc @justinsb
Only until we adjust the kops job invocation to inform the test framework that it shouldn't wait for all nodes to be schedulable, given the way kops sets up the cluster.
The kind job SIGABRT-ed from ginkgo. The backtrace does not include framework entries, but my guess is a missing net/http timeout. The kind (kubeadm) worker nodes are unlabeled, so if NodeSchedulableSelector is now mandatory, we'd have to modify them for e2e testing.
/test pull-kubernetes-e2e-kind

/assign @timothysc

It is not mandatory. Environments with no special node labeling don't need to do anything. The existing e2e presubmits passed with no changes, for example. Only clusters that assign special labels to nodes they don't want to wait for need to inform the e2e of that.

They'd also been passing fine: https://testgrid.k8s.io/conformance-kind#kind,%20master%20(dev)&width=20 That backtrace is not terribly informative so far :|
@liggitt: The following tests failed.
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
Details: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Passing, but it seems to me the current logic is incorrect: waiting for a node that is unschedulable for the duration of the job timeout (e.g. 30+ minutes) doesn't feel right.
the selector for all nodes is pretty easy (

We discussed in sig-testing Slack; TL;DR, the tentative plan is to improve taint handling a bit in the wait-for-node-readiness logic. We could plumb through another flag for kops/kind, but e.g. sonobuoy should work without that for clusters with a less-than-surprising configuration. Ideally we can ignore nodes with the node-role taint, or, as @liggitt suggested, only wait for certain known taints to be removed. When waiting for taints was originally added, only certain taints were referenced (not-ready and network-unavailable). Another option would be removing the wait for node readiness entirely and leaving that up to the tester / cluster operator.
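The "only wait for certain known taints" idea can be sketched as follows. The types and helper are hypothetical, and the exact taint key strings are an assumption based on the not-ready and network-unavailable taints mentioned in the thread:

```go
package main

import "fmt"

// taint is a minimal stand-in for a node taint; the real framework
// would use the taint types from k8s.io/api/core/v1.
type taint struct {
	key string
}

// waitForTaints lists the taints that indicate a node is still coming
// up, so the readiness wait should block on them. All other taints
// (e.g. a node-role taint on a control-plane node) are ignored.
var waitForTaints = map[string]bool{
	"node.kubernetes.io/not-ready":           true,
	"node.kubernetes.io/network-unavailable": true,
}

// shouldWaitFor reports whether a node's taints should keep the
// readiness wait blocked.
func shouldWaitFor(taints []taint) bool {
	for _, t := range taints {
		if waitForTaints[t.key] {
			return true
		}
	}
	return false
}

func main() {
	master := []taint{{key: "node-role.kubernetes.io/master"}}
	booting := []taint{{key: "node.kubernetes.io/not-ready"}}
	fmt.Println(shouldWaitFor(master))  // false: a role taint alone is ignored
	fmt.Println(shouldWaitFor(booting)) // true: the node is still initializing
}
```

With this shape, a kops or kind master would fall out of the wait naturally, without any new flag being plumbed through.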
cc @mborsz -- any other particular taints we should be waiting on?

That would be my preference long-term. "Get your cluster ready, then run the test" makes so much more sense to me.
The two mentioned in the PR that added the label checking were NetworkUnavailable and NotReady.

That comes from the pre-existing --node-schedulable-timeout option and its default of 30*time.Minute.
Maybe a follow-up issue, but directly related to your questions about plumbing it through ginkgo.sh and such: https://github.com/kubernetes/kubernetes/tree/master/cluster/images/conformance should be updated so that the flag can be set when using the conformance image. This way anyone pursuing CNCF certification via Sonobuoy can supply the values.
So TL;DR from my reading of this issue: plumbing through a whitelist of taints is reasonable behavior, but other taints should be respected and the nodes bearing them possibly dropped from the overall node count. If that's the case, then I think it's a good move forward.

@liggitt: PR needs rebase.
Details: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Status check on this?
Closing in favor of #78500 |
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
Allows test invokers to specify a label selector indicating which nodes should be waited for, rather than hard-coding a label for one particular deployment (kops). This removes special treatment of the node-role.kubernetes.io/master label from conformance.
Which issue(s) this PR fixes:
xref #65618
Does this PR introduce a user-facing change?:
/area conformance
/sig testing
/cc @smarterclayton