daemon: Fix detection of BPF/XDP NodePort, BPF masq and host-fw devices by brb · Pull Request #11894 · cilium/cilium

brb · 2020-06-04T14:27:39Z

First, this PR extends the BPF NodePort device auto-detection mechanism by:

In addition to a device with a default route, it considers a device with k8s Node IP addr. The k8s Node IP addr is derived from the self Node object, and it's either InternalIP or ExternalIP or nil (the former is preferred).
BPF NodePort direct routing device can be automatically detected (previously, it was the first device among all BPF NodePort devices). If there is only one BPF NodePort device is set, then that device is used for the direct routing. Otherwise, the k8s Node IP device is used. Users can specify the device via --direct-routing-device (global.nodePort.directRoutingDevice).
Enables XDP on the direct routing device (see reasoning in the commit msg).

The detected devices by 1. are used for BPF masquerading and host-fw too.

Second, this PR introduces --devices flag and deprecated --device. If the latter is specified, its value is appended to the former.

Reviewable per commit.

One limitation which hasn't been resolved in this PR, is that if in the k8s dualstack mode InternalIPv4 and InternalIPv6 are assigned to two different devices, then only one will be considered. I'm going to address it in a separate PR (not a release blocker).

Once this PR is merged, I will move initKubeProxyReplacement() to daemon/cmd/kube_proxy.go (I didn't want to move in this PR to avoid complicating the reviewing).

Fix: #11789

maintainer-s-little-helper · 2020-06-04T14:27:41Z

Please set the appropriate release note label.

maintainer-s-little-helper · 2020-06-04T14:27:41Z

Please set the appropriate release note label.

maintainer-s-little-helper · 2020-06-04T14:27:41Z

Please set the appropriate release note label.

maintainer-s-little-helper · 2020-06-04T14:27:41Z

Please set the appropriate release note label.

brb · 2020-06-04T16:18:44Z

retest-net-next

brb · 2020-06-04T16:49:03Z

retest-net-next

brb · 2020-06-04T18:49:59Z

retest-net-next

brb · 2020-06-05T07:50:05Z

retest-net-next

brb · 2020-06-05T08:58:30Z

retest-net-next

brb · 2020-06-05T11:56:27Z

retest-net-next

brb · 2020-06-05T13:51:21Z

retest-net-next

This is required because the NodePort BPF device detection will need to know self k8s Node IP addrs. Also, move BPF-masq and host-fw checks too. The both depend on settings configured by initKubeProxyReplacementOptions(). Signed-off-by: Martynas Pumputis <m@lambda.lt>

We are going to extend it, which would make initKubeProxyReplacementOptions() less readable otherwise. Signed-off-by: Martynas Pumputis <m@lambda.lt>

This commit stores k8s Node IP addr (InternalIP > ExternalIP > nil) which is going to be used by the BPF NodePort device auto-detection. Signed-off-by: Martynas Pumputis <m@lambda.lt>

In additional to a device with a default route, consider devices with k8s node IP addr. This should cover a case, when a host has two interfaces - one for in-cluster communication, and one for outside. Signed-off-by: Martynas Pumputis <m@lambda.lt>

This commit adds a new agent flag "--direct-routing-device". The flag is used in BPF NodePort in the direct routing mode. If the flag is not set, then its value is being automatically detected. Signed-off-by: Martynas Pumputis <m@lambda.lt>

Signed-off-by: Martynas Pumputis <m@lambda.lt>

The param specifies to which devices bpf_host.o should be attached. Currently, it's used by BPF NodePort, host-fw and BPF masquerading. Also, mark --device as deprecated. If a user specifies both, cilium-agent will log a fatal msg. Signed-off-by: Martynas Pumputis <m@lambda.lt>

Previously, the constraint made it impossible to have BPF NodePort exposed via multiple devices, as a request to a NodePort service which was received by non-XDP device (handled by TC) was dropped due to the constraint. Signed-off-by: Martynas Pumputis <m@lambda.lt>

Attach XDP NodePort to a device which is used for direct routing among nodes. In the case of a multi-dev setup, if we attach to other than the direct routing device, a request to a NodePort svc which real destination is a remote backend won't be redirected to another device (only possible via hairpinning). So, we need to attach to a device which can forward to another node which is the direct routing device. Signed-off-by: Martynas Pumputis <m@lambda.lt>

Otherwise, we would need to make the daemon unit test suite as privileged due to bpftool depedency for the feature detection in initKubeProxyReplacementOptions(). Signed-off-by: Martynas Pumputis <m@lambda.lt>

Previously, when running with "--k8s-require-ipv{4,6}-pod-cidr=false", retrieveNodeInformation() might have failed retrieving a self (Cilium)Node object on a busy cluster. To avoid this, return an error if the BPF NodePort dev auto-detection might happen. This will make the retry mechanism not to give up on waiting for the object. Signed-off-by: Martynas Pumputis <m@lambda.lt>

brb · 2020-06-10T12:12:17Z

test-me-please

joestringer · 2020-06-10T16:30:07Z

We discussed the 4.19 failures in detail during the sig-datapath meeting today.

Core issue is that previously the CI did not enable nodeport on the devices on 4.19. This new --devices auto-detection is now enabling nodeport for up to two devices: the device with the IP that k8s considers as the node IP, and the device with the default route. That's why the CI change explicitly turns kube-proxy off. PR #11977 is expected to resolve this issue.

@brb will investigate configuring specific options to retain the current CI coverage while allowing the rest of this PR to run against CI and validate the behaviour, so we can merge and separately address the remaining complexity issues which we are already tracking elsewhere.

brb · 2020-06-10T17:03:59Z

test-me-please

brb · 2020-06-11T09:07:14Z

retest-4.19

Disable the kube-proxy replacement on the CI 4.19 job until has been merged #11915, but keep bpf_sock to avoid bpf_lxc complexity issues. Also, get rid of non-working kernel vsn setting which complicates passing of the KERNEL env var to ginkgo test runner. Finally, disable ipsec + vxlan test on 4.19, as it is broken with the contemporary 4.19 setup (TODO to investigate it). Signed-off-by: Martynas Pumputis <m@lambda.lt>

brb · 2020-06-11T12:44:05Z

test-me-please

christarazi · 2020-07-07T23:29:27Z


+// GetK8sNodeIPs returns k8s Node IP (either InternalIP or ExternalIP or nil;
+// the former is prefered).
+func (n *Node) GetK8sNodeIP() net.IP {


I was looking at the file recently and noticed that we have two very similar functions: GetK8sNodeIP and GetNodeIP (immediately below). I tracked it down to this PR. I am just curious; why did we need to add this @brb?

GetNodeIP() can return an IP which is neither k8s InternalIP nor k8s ExternalIP. For the device detection, we want to use k8s IPs, as they are used for communication between nodes. For this, I introduced GetK8sNodeIP().

brb added priority/release-blocker area/daemon Impacts operation of the Cilium daemon. labels Jun 4, 2020

maintainer-s-little-helper Bot added the dont-merge/needs-release-note label Jun 4, 2020

brb added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label Jun 4, 2020

maintainer-s-little-helper Bot removed the dont-merge/needs-release-note label Jun 4, 2020

brb force-pushed the pr/brb/auto-multi-dev-v3 branch from 2a8468a to a22a054 Compare June 4, 2020 16:18

brb force-pushed the pr/brb/auto-multi-dev-v3 branch from 160e37e to 4c2b340 Compare June 4, 2020 16:48

brb force-pushed the pr/brb/auto-multi-dev-v3 branch from 4c2b340 to aed6fda Compare June 4, 2020 18:49

brb force-pushed the pr/brb/auto-multi-dev-v3 branch from aed6fda to 1039f04 Compare June 5, 2020 07:25

brb force-pushed the pr/brb/auto-multi-dev-v3 branch from 1039f04 to c1067f1 Compare June 5, 2020 08:57

brb force-pushed the pr/brb/auto-multi-dev-v3 branch from c1067f1 to 3ae92b9 Compare June 5, 2020 11:56

brb force-pushed the pr/brb/auto-multi-dev-v3 branch from 972898a to e3251e1 Compare June 5, 2020 13:51

brb force-pushed the pr/brb/auto-multi-dev-v3 branch from ab4c346 to ad9a333 Compare June 5, 2020 14:15

brb changed the title ~~daemon: Detect devices for BPF NodePort~~ daemon: Fix detection of BPF/XDP NodePort, BPF masq and host-fw devices Jun 5, 2020

aanm reviewed Jun 5, 2020

View reviewed changes

Comment thread pkg/node/types/node.go Outdated

Comment thread pkg/option/config.go Outdated

Comment thread daemon/cmd/daemon_main.go Outdated

brb force-pushed the pr/brb/auto-multi-dev-v3 branch 3 times, most recently from 1f8bbdf to 772c678 Compare June 5, 2020 14:46

brb added 11 commits June 10, 2020 14:10

daemon: Move device detection into separate function

9cd3b64

We are going to extend it, which would make initKubeProxyReplacementOptions() less readable otherwise. Signed-off-by: Martynas Pumputis <m@lambda.lt>

k8s: Store k8s Node IP addr

5abbec5

This commit stores k8s Node IP addr (InternalIP > ExternalIP > nil) which is going to be used by the BPF NodePort device auto-detection. Signed-off-by: Martynas Pumputis <m@lambda.lt>

helm/daemon: Add --direct-routing-device

8ddf471

This commit adds a new agent flag "--direct-routing-device". The flag is used in BPF NodePort in the direct routing mode. If the flag is not set, then its value is being automatically detected. Signed-off-by: Martynas Pumputis <m@lambda.lt>

daemon: Add unit tests for device detection

aa04a8f

Signed-off-by: Martynas Pumputis <m@lambda.lt>

daemon: Disable kube-proxy replacement in unit tests

eb8f0cf

Otherwise, we would need to make the daemon unit test suite as privileged due to bpftool depedency for the feature detection in initKubeProxyReplacementOptions(). Signed-off-by: Martynas Pumputis <m@lambda.lt>

brb force-pushed the pr/brb/auto-multi-dev-v3 branch from dc57fd1 to 02d0550 Compare June 10, 2020 12:10

brb force-pushed the pr/brb/auto-multi-dev-v3 branch from 02d0550 to 70b0d62 Compare June 10, 2020 17:03

brb force-pushed the pr/brb/auto-multi-dev-v3 branch from 70b0d62 to 3afa14b Compare June 11, 2020 12:44

aanm merged commit 6beb6b3 into master Jun 11, 2020

aanm deleted the pr/brb/auto-multi-dev-v3 branch June 11, 2020 14:43

aanm mentioned this pull request Jun 11, 2020

v1.8 backports 2020-06-11 #12027

Merged

aanm added backport-pending/1.8 and removed needs-backport/1.8 labels Jun 11, 2020

pchaigno mentioned this pull request Jun 12, 2020

CI: K8sServicesTest Checks service across nodes Supports IPv4 Fragments: Failed to account for IPv4 fragments #10929

Closed

aanm added backport-done/1.8 and removed backport-pending/1.8 labels Jun 12, 2020

pchaigno mentioned this pull request Jun 29, 2020

daemon: Skip devices without hardware address during device detection #12321

Merged

christarazi reviewed Jul 7, 2020

View reviewed changes

Conversation

brb commented Jun 4, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

maintainer-s-little-helper Bot commented Jun 4, 2020

Uh oh!

maintainer-s-little-helper Bot commented Jun 4, 2020

Uh oh!

maintainer-s-little-helper Bot commented Jun 4, 2020

Uh oh!

maintainer-s-little-helper Bot commented Jun 4, 2020

Uh oh!

brb commented Jun 4, 2020

Uh oh!

brb commented Jun 4, 2020

Uh oh!

brb commented Jun 4, 2020

Uh oh!

brb commented Jun 5, 2020

Uh oh!

brb commented Jun 5, 2020

Uh oh!

brb commented Jun 5, 2020

Uh oh!

brb commented Jun 5, 2020

Uh oh!

Uh oh!

Uh oh!

Uh oh!

brb commented Jun 10, 2020

Uh oh!

joestringer commented Jun 10, 2020

Uh oh!

brb commented Jun 10, 2020

Uh oh!

brb commented Jun 11, 2020

Uh oh!

brb commented Jun 11, 2020

Uh oh!

christarazi Jul 7, 2020

Choose a reason for hiding this comment

Uh oh!

brb Jul 8, 2020

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

brb commented Jun 4, 2020 •

edited

Loading