
test/K8sServices: re-enable IPv4 fragment tests on kernel 4.19#12159

Merged
aanm merged 1 commit into master from pr/qmonnet/fragtest_4-19 on Jun 21, 2020

Conversation


@qmonnet (Member) commented Jun 17, 2020

The test for fragment tracking support got a fix with commit 0e772e7 ("test: Fix fragment tracking test under KUBEPROXY=1"), where the pattern for searching entries in the Conntrack table accounts for DNAT not happening if kube-proxy is present.

Following recent changes in the datapath and tests for the Linux 4.19 kernel, DNAT is now used even with kube-proxy, provided bpf_sock is in use. This led to CI failures, and the test was disabled for 4.19 kernels with commit 1120aed ("test/K8sServices: disable fragment tracking test for kernel 4.19").

Now that complexity issues are fixed (see #11977 and #12045), let's enable the test on 4.19 again. Ignore DNAT only if kube-proxy is present and bpf_sock (host-reachable services) is not in use. This is also the case for net-next kernels (this didn't fail in CI before because we do not test with kube-proxy on net-next).

Note that (as far as I know) both 4.19 and net-next always use bpf_sock in CI runs, so the check on hostReachableServices is currently superfluous. Let's have it all the same, in case something changes in the future, to avoid unexpected breakage.
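
The condition described above can be sketched as a standalone predicate. This is illustrative only: the function and parameter names below are not taken from the Cilium test suite; they just encode the rule that DNAT is skipped only when kube-proxy is present and bpf_sock is not in use.

```go
package main

import "fmt"

// expectDNAT encodes the rule from the description: fragmented datagrams
// are expected to be DNAT-ed unless kube-proxy handles the service AND
// bpf_sock (host-reachable services) is not in use.
func expectDNAT(runsWithKubeProxy, hostReachableServices bool) bool {
	return !runsWithKubeProxy || hostReachableServices
}

func main() {
	fmt.Println(expectDNAT(true, false))  // kube-proxy, no bpf_sock: no DNAT
	fmt.Println(expectDNAT(true, true))   // kube-proxy with bpf_sock: DNAT
	fmt.Println(expectDNAT(false, false)) // kube-proxy replacement: DNAT
}
```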

@qmonnet added the area/CI, release-note/ci, and ci/flake labels on Jun 17, 2020

qmonnet commented Jun 17, 2020

test-me-please


coveralls commented Jun 18, 2020

Coverage Status

Coverage decreased (-0.004%) to 37.101% when pulling da70faf on pr/qmonnet/fragtest_4-19 into e601e8f on master.

@qmonnet marked this pull request as ready for review on June 18, 2020 08:59
@qmonnet requested a review from a team as a code owner on June 18, 2020 08:59
@qmonnet requested review from a team, brb, and pchaigno on June 18, 2020 09:00

@pchaigno (Member) left a comment


> Note that (as far as I know) both 4.19 and net-next always use bpf_sock in CI runs, so the check on hostReachableServices is currently superfluous. Let's have it all the same, in case something changes in the future, to avoid unexpected breakage.

Did you check locally that it works when kube-proxy-replacement=disabled and hostReachableServices is enabled?

Comment thread on test/k8sT/Services.go (outdated)
[Commit message identical to the PR description above.]
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
@qmonnet force-pushed the pr/qmonnet/fragtest_4-19 branch from 8f04915 to da70faf on June 18, 2020 16:14

qmonnet commented Jun 18, 2020

Moved the check on host-reachable services to a dedicated helper.

Yes, I tested locally with KUBEPROXY=0 and KUBEPROXY=1. Also tried setting global.hostServices.enabled to true and false for each case when deploying Cilium.

Incremental diff
diff --git a/test/helpers/utils.go b/test/helpers/utils.go
index 6884c744258d..571d465460b8 100644
--- a/test/helpers/utils.go
+++ b/test/helpers/utils.go
@@ -558,6 +558,24 @@ func DoesNotSupportMetalLB() bool {
 	return true
 }
 
+func (kub *Kubectl) HasHostReachableServices(pod string, checkTCP, checkUDP bool) bool {
+	status := kub.CiliumExecContext(context.TODO(), pod,
+		"cilium status -o jsonpath='{.kube-proxy-replacement.features.hostReachableServices}'")
+	status.ExpectSuccess("Failed to get status: %s", status.OutputPrettyPrint())
+	lines := status.ByLines()
+	Expect(len(lines)).ShouldNot(Equal(0), "Failed to get hostReachableServices status")
+
+	// One-line result is e.g. "{true [TCP UDP]}" if host-reachable
+	// services are activated for both protocols.
+	if checkUDP && !strings.Contains(lines[0], "UDP") {
+		return false
+	}
+	if checkTCP && !strings.Contains(lines[0], "TCP") {
+		return false
+	}
+	return true
+}
+
 // GetNodeWithoutCilium returns a name of a node which does not run cilium.
 func GetNodeWithoutCilium() string {
 	return os.Getenv("NO_CILIUM_ON_NODE")
diff --git a/test/k8sT/Services.go b/test/k8sT/Services.go
index 80cf5334acb5..8ac348456e8c 100644
--- a/test/k8sT/Services.go
+++ b/test/k8sT/Services.go
@@ -1061,22 +1061,12 @@ var _ = Describe("K8sServicesTest", func() {
 				srcPort = 12345
 				hasDNAT = true
 			)
-
 			// Destination address and port for fragmented datagram
 			// are not DNAT-ed with kube-proxy but without bpf_sock.
 			if helpers.RunsWithKubeProxy() {
 				ciliumPodK8s1, err := kubectl.GetCiliumPodOnNode(helpers.CiliumNamespace, helpers.K8s1)
 				ExpectWithOffset(1, err).Should(BeNil(), "Cannot get cilium pod on k8s1")
-				status := kubectl.CiliumExecContext(context.TODO(), ciliumPodK8s1,
-					"cilium status -o jsonpath='{.kube-proxy-replacement.features.hostReachableServices}'")
-				status.ExpectSuccess("Failed to get status: %s", status.OutputPrettyPrint())
-				// One-line result is "{true [TCP UDP]}" if
-				// host-reachable services are activated for
-				// both protocols. Consider we do not have DNAT
-				// only if UDP is missing.
-				if len(status.ByLines()) != 0 && !strings.Contains(status.ByLines()[0], "UDP") {
-					hasDNAT = false
-				}
+				hasDNAT = kubectl.HasHostReachableServices(ciliumPodK8s1, false, true)
 			}
 
 			// Get testDSClient and testDS pods running on k8s1.
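
The string check the new helper performs can be exercised standalone. The helper name and the "{true [TCP UDP]}" output format come from the diff above; the function below is a self-contained sketch of that containment logic, not the helper itself.

```go
package main

import (
	"fmt"
	"strings"
)

// hasProtocols mirrors the check in HasHostReachableServices: the agent's
// jsonpath output reads e.g. "{true [TCP UDP]}" when host-reachable
// services are enabled for both protocols, and a protocol is considered
// active if its name appears in that line.
func hasProtocols(statusLine string, checkTCP, checkUDP bool) bool {
	if checkUDP && !strings.Contains(statusLine, "UDP") {
		return false
	}
	if checkTCP && !strings.Contains(statusLine, "TCP") {
		return false
	}
	return true
}

func main() {
	fmt.Println(hasProtocols("{true [TCP UDP]}", false, true)) // true: UDP present
	fmt.Println(hasProtocols("{true [TCP]}", false, true))     // false: UDP missing
}
```

The fragment test only cares about UDP (fragmented datagrams), hence the `(false, true)` arguments in the diff above.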

@qmonnet qmonnet requested a review from pchaigno June 18, 2020 16:18

qmonnet commented Jun 18, 2020

test-me-please


qmonnet commented Jun 19, 2020

Comment thread on test/helpers/utils.go:
}

func (kub *Kubectl) HasHostReachableServices(pod string, checkTCP, checkUDP bool) bool {
status := kub.CiliumExecContext(context.TODO(), pod,

Not a fan of using context.TODO(), but given how nested this func is, I think it would be detrimental to readability to pass a newly created context from the test case.

@maintainer-s-little-helper (bot) added the ready-to-merge label on Jun 19, 2020
@aanm merged commit 5b9503f into master on Jun 21, 2020
@aanm deleted the pr/qmonnet/fragtest_4-19 branch on June 21, 2020 12:14
@aanm mentioned this pull request on Jun 21, 2020

Labels

area/CI: Continuous Integration testing issue or flake
ci/flake: This is a known failure that occurs in the tree. Please investigate me!
ready-to-merge: This PR has passed all tests and received consensus from code owners to merge.
release-note/ci: This PR makes changes to the CI.


Development

Successfully merging this pull request may close these issues.

CI: K8sServicesTest Checks service across nodes Supports IPv4 Fragments: Failed to account for IPv4 fragments

6 participants