Skip to content

Sets HostNetwork to False for tests which do not require it#78731

Merged
k8s-ci-robot merged 1 commit intokubernetes:masterfrom
claudiubelu:tests-hostnetwork
Dec 9, 2019
Merged

Sets HostNetwork to False for tests which do not require it#78731
k8s-ci-robot merged 1 commit intokubernetes:masterfrom
claudiubelu:tests-hostnetwork

Conversation

@claudiubelu
Copy link
Copy Markdown
Contributor

What type of PR is this?

/kind failing-test

/sig testing

What this PR does / why we need it:

Some tests are setting HostNetwork=true, even if it is not required for them to pass.

This patch will set the HostNetwork to false for those tests, allowing them to be run on Windows nodes as well.

Running the tests affected by this PR on Linux nodes on different environments, yielded the following results:

Which issue(s) this PR fixes:

Fixes #69559
Original PR: #69525

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. sig/testing Categorizes an issue or PR as relevant to SIG Testing. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jun 5, 2019
@k8s-ci-robot k8s-ci-robot requested review from MrHohn and krmayankk June 5, 2019 12:13
@k8s-ci-robot k8s-ci-robot added area/e2e-test-framework Issues or PRs related to refactoring the kubernetes e2e test framework area/test labels Jun 5, 2019
Copy link
Copy Markdown
Member

@MrHohn MrHohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't looked deep into this, qq that why not reusing #69525 as I see valid comments there?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was reading the comments from previous thread (#69525 (comment)). Isn't that still true?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BCLAU This function must be able to access kube-proxy running on the node, so running in non-hostNetwork doesn't work at all. You didn't see a problem because apparently RunHostCmd doesn't return an error when the curl command times out or gets a connection refused. So basically the test was then going on assuming it's not in user proxy mode. How is kube-proxy done in Windows, given there is no iptables? Does it run the user space proxy? If so then an "if windows, return userspace" might be the right solution for this function.

Alternately, the issue is that Windows doesn't support hostNetwork, and we need to talk to kube-proxy in the host network namespace in order to determine the proxy mode. So how do we get this info without hostNetwork? Does kube-proxy only make status available on localhost, or could we get it by passing status.hostIP via the downward API? If we can do that, then the pod can get this info using that IP instead of localhost.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will remove this from the PR for the time being. Didn't notice that the test actually skips if it doesn't have 2 Ready nodes, so I've missed this.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple nodes are required now.

No conformance tests should have Skips in them. If they do, they cannot be promoted.

@claudiubelu
Copy link
Copy Markdown
Contributor Author

Haven't looked deep into this, qq that why not reusing #69525 as I see valid comments there?

True, but I cannot reopen that PR for some issue, I would have otherwise.

As for the comments, I should reply to them, but as you can see in this PR's pastes, I've run those tests on Linux with different networking plugins (azure-vnet-cni, flannel overlay, flannel l2bridge) and they passed.

@MrHohn
Copy link
Copy Markdown
Member

MrHohn commented Jun 6, 2019

As for the comments, I should reply to them, but as you can see in this PR's pastes, I've run those tests on Linux with different networking plugins (azure-vnet-cni, flannel overlay, flannel l2bridge) and they passed.

It is important to ensure that the test not only passes, but also that it verifies what it should be. Specifically for that ProxyMode() func, it makes a HTTP call to kube-proxy's status port and parse out what proxy mode it is using. kube-proxy runs in hostnetwork, using a exec pod without hostnetwork that curls from localhost:10249 won't give us the correct info.

That ProxyMode() is currently used in test that won't work with userspace proxy mode. They explicitly check if proxy mode is userspace and skip the test. They didn't fail in your setup just because userspace proxy mode is not used.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 14, 2019
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 19, 2019
Copy link
Copy Markdown
Contributor

@timothysc timothysc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to defer to @kubernetes/sig-node-pr-reviews
/assign @johnbelamaric

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Jul 1, 2019
@yujuhong
Copy link
Copy Markdown
Contributor

yujuhong commented Jul 1, 2019

/cc @lzang

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

@yujuhong: GitHub didn't allow me to request PR reviews from the following users: lzang.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

Details

In response to this:

/cc @lzang

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link
Copy Markdown
Member

@johnbelamaric johnbelamaric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, YES to removing hostNetwork from these intra-pod and node-pod tests. With hostNetwork enabled they don't even really test what they were supposed to be testing. But we need to figure out what to do about ProxyMode which definitely doesn't work as written when run in hostNetwork: false

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BCLAU This function must be able to access kube-proxy running on the node, so running in non-hostNetwork doesn't work at all. You didn't see a problem because apparently RunHostCmd doesn't return an error when the curl command times out or gets a connection refused. So basically the test was then going on assuming it's not in user proxy mode. How is kube-proxy done in Windows, given there is no iptables? Does it run the user space proxy? If so then an "if windows, return userspace" might be the right solution for this function.

Alternately, the issue is that Windows doesn't support hostNetwork, and we need to talk to kube-proxy in the host network namespace in order to determine the proxy mode. So how do we get this info without hostNetwork? Does kube-proxy only make status available on localhost, or could we get it by passing status.hostIP via the downward API? If we can do that, then the pod can get this info using that IP instead of localhost.

@johnbelamaric
Copy link
Copy Markdown
Member

/priority important-soon

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jul 2, 2019
@johnbelamaric
Copy link
Copy Markdown
Member

/sig network

@k8s-ci-robot k8s-ci-robot added the sig/network Categorizes an issue or PR as relevant to SIG Network. label Jul 2, 2019
@claudiubelu
Copy link
Copy Markdown
Contributor Author

/test pull-kubernetes-e2e-gce

@kikisdeliveryservice
Copy link
Copy Markdown
Member

As an update Bug triage had the incorrect date. Freeze is Nov 14.

@alejandrox1
Copy link
Copy Markdown
Contributor

ping @johnbelamaric
could you take a look please?

@johnbelamaric
Copy link
Copy Markdown
Member

I will block out some time on Friday.

@johnbelamaric
Copy link
Copy Markdown
Member

Ok. I think I have figured out why this is so confusing. It comes down to that PR #71468 should not have been merged as it is written. It separates "host neworking" from the HostTestContainerPod. But that's the whole raison d'être of the HostTestContainerPod.

Instead, it should have just added a boolean that defined whether or not to create the HostTestContainerPod at all. If you look at the tests where you are passing in false, they use DialFromTestContainer or DialFromEndpointContainer, neither of which use the HostTestContainerPod.

So, I cannot approve this. In fact, we should go back and fix the others, I could approve something that doesn't create the HostTestContainerPod in all cases.

@kikisdeliveryservice
Copy link
Copy Markdown
Member

@alejandrox1 @johnbelamaric : Should this be bumped to 1.18?

-Bug Triage Team

@johnbelamaric
Copy link
Copy Markdown
Member

yes, probably that would make sense.

@kikisdeliveryservice
Copy link
Copy Markdown
Member

/milestone v1.18

@k8s-ci-robot k8s-ci-robot modified the milestones: v1.17, v1.18 Nov 12, 2019
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. sig/windows Categorizes an issue or PR as relevant to SIG Windows. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 18, 2019
@claudiubelu claudiubelu force-pushed the tests-hostnetwork branch 2 times, most recently from 236ade1 to 41760d2 Compare November 18, 2019 17:39
@claudiubelu
Copy link
Copy Markdown
Contributor Author

/test pull-kubernetes-e2e-kind
/test pull-kubernetes-e2e-gce

Some tests are setting HostNetwork=true, even if it is not required
for them to pass.

This patch will set the HostNetwork to false for those tests, allowing
them to be run on Windows nodes as well.
@johnbelamaric
Copy link
Copy Markdown
Member

Not sure why it says approved from me, going to make sure that's cancelled until I hear back on my most recent comments.

/approve cancel

@claudiubelu
Copy link
Copy Markdown
Contributor Author

Not sure why it says approved from me, going to make sure that's cancelled until I hear back on my most recent comments.

/approve cancel

Yeah, sorry. I've addressed your comments, but haven't commented on it.

Ok. I think I have figured out why this is so confusing. It comes down to that PR #71468 should not have been merged as it is written. It separates "host neworking" from the HostTestContainerPod. But that's the whole raison d'être of the HostTestContainerPod.

This PR now removes test/e2e/windows/network.go

Instead, it should have just added a boolean that defined whether or not to create the HostTestContainerPod at all. If you look at the tests where you are passing in false, they use DialFromTestContainer or DialFromEndpointContainer, neither of which use the HostTestContainerPod.

Added. HostTestContainerPod is no longer created if host networking is not required. Although, technically, even DialFromTestContainer and DialFromEndpointContainer were exec-ing into HostTestContainerPod (even though it was unnecessary). Had to address that as well.

So, I cannot approve this. In fact, we should go back and fix the others, I could approve something that doesn't create the HostTestContainerPod in all cases.

Done.

@claudiubelu
Copy link
Copy Markdown
Contributor Author

@johnbelamaric So, is it alright with you now?

@johnbelamaric
Copy link
Copy Markdown
Member

Yes!
/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 9, 2019
@spiffxp
Copy link
Copy Markdown
Contributor

spiffxp commented Dec 9, 2019

/approve

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: claudiubelu, johnbelamaric, spiffxp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 9, 2019
@k8s-ci-robot k8s-ci-robot merged commit 3c4d2a5 into kubernetes:master Dec 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/conformance Issues or PRs related to kubernetes conformance tests area/e2e-test-framework Issues or PRs related to refactoring the kubernetes e2e test framework area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. sig/windows Categorizes an issue or PR as relevant to SIG Windows. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Some tests do not require HostNetwork=true in order to pass