Skip to content

daemon: Fix detection of BPF/XDP NodePort, BPF masq and host-fw devices#11894

Merged
aanm merged 13 commits intomasterfrom
pr/brb/auto-multi-dev-v3
Jun 11, 2020
Merged

daemon: Fix detection of BPF/XDP NodePort, BPF masq and host-fw devices#11894
aanm merged 13 commits intomasterfrom
pr/brb/auto-multi-dev-v3

Conversation

@brb
Copy link
Copy Markdown
Member

@brb brb commented Jun 4, 2020

First, this PR extends the BPF NodePort device auto-detection mechanism by:

  1. In addition to a device with a default route, it considers a device with k8s Node IP addr. The k8s Node IP addr is derived from the self Node object, and it's either InternalIP or ExternalIP or nil (the former is preferred).
  2. BPF NodePort direct routing device can be automatically detected (previously, it was the first device among all BPF NodePort devices). If there is only one BPF NodePort device is set, then that device is used for the direct routing. Otherwise, the k8s Node IP device is used. Users can specify the device via --direct-routing-device (global.nodePort.directRoutingDevice).
  3. Enables XDP on the direct routing device (see reasoning in the commit msg).

The detected devices by 1. are used for BPF masquerading and host-fw too.

Second, this PR introduces --devices flag and deprecated --device. If the latter is specified, its value is appended to the former.

Reviewable per commit.

One limitation which hasn't been resolved in this PR, is that if in the k8s dualstack mode InternalIPv4 and InternalIPv6 are assigned to two different devices, then only one will be considered. I'm going to address it in a separate PR (not a release blocker).

Once this PR is merged, I will move initKubeProxyReplacement() to daemon/cmd/kube_proxy.go (I didn't want to move in this PR to avoid complicating the reviewing).

Fix: #11789

@brb brb added priority/release-blocker area/daemon Impacts operation of the Cilium daemon. labels Jun 4, 2020
@maintainer-s-little-helper
Copy link
Copy Markdown

Please set the appropriate release note label.

3 similar comments
@maintainer-s-little-helper
Copy link
Copy Markdown

Please set the appropriate release note label.

@maintainer-s-little-helper
Copy link
Copy Markdown

Please set the appropriate release note label.

@maintainer-s-little-helper
Copy link
Copy Markdown

Please set the appropriate release note label.

@brb brb added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label Jun 4, 2020
@brb brb force-pushed the pr/brb/auto-multi-dev-v3 branch from 2a8468a to a22a054 Compare June 4, 2020 16:18
@brb
Copy link
Copy Markdown
Member Author

brb commented Jun 4, 2020

retest-net-next

@brb brb force-pushed the pr/brb/auto-multi-dev-v3 branch from 160e37e to 4c2b340 Compare June 4, 2020 16:48
@brb
Copy link
Copy Markdown
Member Author

brb commented Jun 4, 2020

retest-net-next

@brb brb force-pushed the pr/brb/auto-multi-dev-v3 branch from 4c2b340 to aed6fda Compare June 4, 2020 18:49
@brb
Copy link
Copy Markdown
Member Author

brb commented Jun 4, 2020

retest-net-next

@brb brb force-pushed the pr/brb/auto-multi-dev-v3 branch from aed6fda to 1039f04 Compare June 5, 2020 07:25
@brb
Copy link
Copy Markdown
Member Author

brb commented Jun 5, 2020

retest-net-next

@brb brb force-pushed the pr/brb/auto-multi-dev-v3 branch from 1039f04 to c1067f1 Compare June 5, 2020 08:57
@brb
Copy link
Copy Markdown
Member Author

brb commented Jun 5, 2020

retest-net-next

@brb brb force-pushed the pr/brb/auto-multi-dev-v3 branch from c1067f1 to 3ae92b9 Compare June 5, 2020 11:56
@brb
Copy link
Copy Markdown
Member Author

brb commented Jun 5, 2020

retest-net-next

@brb brb force-pushed the pr/brb/auto-multi-dev-v3 branch from 972898a to e3251e1 Compare June 5, 2020 13:51
@brb
Copy link
Copy Markdown
Member Author

brb commented Jun 5, 2020

retest-net-next

@brb brb force-pushed the pr/brb/auto-multi-dev-v3 branch from ab4c346 to ad9a333 Compare June 5, 2020 14:15
@brb brb changed the title daemon: Detect devices for BPF NodePort daemon: Fix detection of BPF/XDP NodePort, BPF masq and host-fw devices Jun 5, 2020
Comment thread pkg/node/types/node.go Outdated
Comment thread pkg/option/config.go Outdated
Comment thread daemon/cmd/daemon_main.go Outdated
@brb brb force-pushed the pr/brb/auto-multi-dev-v3 branch 3 times, most recently from 1f8bbdf to 772c678 Compare June 5, 2020 14:46
brb added 11 commits June 10, 2020 14:10
This is required because the NodePort BPF device detection will need
to know self k8s Node IP addrs.

Also, move BPF-masq and host-fw checks too. The both depend on settings
configured by initKubeProxyReplacementOptions().

Signed-off-by: Martynas Pumputis <m@lambda.lt>
We are going to extend it, which would make
initKubeProxyReplacementOptions() less readable otherwise.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
This commit stores k8s Node IP addr (InternalIP > ExternalIP > nil)
which is going to be used by the BPF NodePort device auto-detection.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
In additional to a device with a default route, consider devices with
k8s node IP addr.

This should cover a case, when a host has two interfaces - one for
in-cluster communication, and one for outside.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
This commit adds a new agent flag "--direct-routing-device". The flag
is used in BPF NodePort in the direct routing mode.

If the flag is not set, then its value is being automatically detected.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
The param specifies to which devices bpf_host.o should be attached.
Currently, it's used by BPF NodePort, host-fw and BPF masquerading.

Also, mark --device as deprecated. If a user specifies both,
cilium-agent will log a fatal msg.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Previously, the constraint made it impossible to have BPF NodePort
exposed via multiple devices, as a request to a NodePort service which
was received by non-XDP device (handled by TC) was dropped due to
the constraint.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Attach XDP NodePort to a device which is used for direct routing among
nodes.

In the case of a multi-dev setup, if we attach to other than the direct
routing device, a request to a NodePort svc which real destination is
a remote backend won't be redirected to another device (only possible
via hairpinning). So, we need to attach to a device which can forward
to another node which is the direct routing device.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Otherwise, we would need to make the daemon unit test suite as
privileged due to bpftool depedency for the feature detection in
initKubeProxyReplacementOptions().

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Previously, when running with "--k8s-require-ipv{4,6}-pod-cidr=false",
retrieveNodeInformation() might have failed retrieving a self
(Cilium)Node object on a busy cluster.

To avoid this, return an error if the BPF NodePort dev auto-detection
might happen. This will make the retry mechanism not to give up on
waiting for the object.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
@brb brb force-pushed the pr/brb/auto-multi-dev-v3 branch from dc57fd1 to 02d0550 Compare June 10, 2020 12:10
@brb
Copy link
Copy Markdown
Member Author

brb commented Jun 10, 2020

test-me-please

@joestringer
Copy link
Copy Markdown
Member

We discussed the 4.19 failures in detail during the sig-datapath meeting today.

Core issue is that previously the CI did not enable nodeport on the devices on 4.19. This new --devices auto-detection is now enabling nodeport for up to two devices: the device with the IP that k8s considers as the node IP, and the device with the default route. That's why the CI change explicitly turns kube-proxy off. PR #11977 is expected to resolve this issue.

@brb will investigate configuring specific options to retain the current CI coverage while allowing the rest of this PR to run against CI and validate the behaviour, so we can merge and separately address the remaining complexity issues which we are already tracking elsewhere.

@brb brb force-pushed the pr/brb/auto-multi-dev-v3 branch from 02d0550 to 70b0d62 Compare June 10, 2020 17:03
@brb
Copy link
Copy Markdown
Member Author

brb commented Jun 10, 2020

test-me-please

@brb
Copy link
Copy Markdown
Member Author

brb commented Jun 11, 2020

retest-4.19

Disable the kube-proxy replacement on the CI 4.19 job until
has been merged #11915, but
keep bpf_sock to avoid bpf_lxc complexity issues.

Also, get rid of non-working kernel vsn setting which complicates
passing of the KERNEL env var to ginkgo test runner.

Finally, disable ipsec + vxlan test on 4.19, as it is broken with the
contemporary 4.19 setup (TODO to investigate it).

Signed-off-by: Martynas Pumputis <m@lambda.lt>
@brb brb force-pushed the pr/brb/auto-multi-dev-v3 branch from 70b0d62 to 3afa14b Compare June 11, 2020 12:44
@brb
Copy link
Copy Markdown
Member Author

brb commented Jun 11, 2020

test-me-please

Comment thread pkg/node/types/node.go

// GetK8sNodeIPs returns k8s Node IP (either InternalIP or ExternalIP or nil;
// the former is prefered).
func (n *Node) GetK8sNodeIP() net.IP {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking at the file recently and noticed that we have two very similar functions: GetK8sNodeIP and GetNodeIP (immediately below). I tracked it down to this PR. I am just curious; why did we need to add this @brb?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetNodeIP() can return an IP which is neither k8s InternalIP nor k8s ExternalIP. For the device detection, we want to use k8s IPs, as they are used for communication between nodes. For this, I introduced GetK8sNodeIP().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/daemon Impacts operation of the Cilium daemon. release-note/minor This PR changes functionality that users may find relevant to operating Cilium.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make Cilium list of devices clearly and consistently apply functionality across those devices

7 participants