daemon: Fix detection of BPF/XDP NodePort, BPF masq and host-fw devices#11894
Please set the appropriate release note label.
This is required because the NodePort BPF device detection needs to know the node's own k8s Node IP addrs. Also, move the BPF-masq and host-fw checks. Both depend on settings configured by initKubeProxyReplacementOptions(). Signed-off-by: Martynas Pumputis <m@lambda.lt>
We are going to extend it, which would make initKubeProxyReplacementOptions() less readable otherwise. Signed-off-by: Martynas Pumputis <m@lambda.lt>
This commit stores k8s Node IP addr (InternalIP > ExternalIP > nil) which is going to be used by the BPF NodePort device auto-detection. Signed-off-by: Martynas Pumputis <m@lambda.lt>
In addition to a device with a default route, consider devices with the k8s Node IP addr. This should cover the case where a host has two interfaces: one for in-cluster communication and one for outside traffic. Signed-off-by: Martynas Pumputis <m@lambda.lt>
This commit adds a new agent flag "--direct-routing-device". The flag is used by BPF NodePort in the direct routing mode. If the flag is not set, its value is detected automatically. Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
The param specifies to which devices bpf_host.o should be attached. Currently, it's used by BPF NodePort, host-fw and BPF masquerading. Also, mark --device as deprecated. If a user specifies both, cilium-agent will log a fatal msg. Signed-off-by: Martynas Pumputis <m@lambda.lt>
Previously, the constraint made it impossible to expose BPF NodePort via multiple devices: a request to a NodePort service that was received by a non-XDP device (handled by TC) was dropped due to the constraint. Signed-off-by: Martynas Pumputis <m@lambda.lt>
Attach XDP NodePort to the device which is used for direct routing among nodes. In a multi-dev setup, if we attach to a device other than the direct routing device, a request to a NodePort svc whose real destination is a remote backend won't be redirected to another device (only hairpinning is possible). So, we need to attach to a device which can forward to another node, which is the direct routing device. Signed-off-by: Martynas Pumputis <m@lambda.lt>
Otherwise, we would need to make the daemon unit test suite privileged due to the bpftool dependency of the feature detection in initKubeProxyReplacementOptions(). Signed-off-by: Martynas Pumputis <m@lambda.lt>
Previously, when running with "--k8s-require-ipv{4,6}-pod-cidr=false",
retrieveNodeInformation() might have failed retrieving a self
(Cilium)Node object on a busy cluster.
To avoid this, return an error if the BPF NodePort dev auto-detection
might happen. This keeps the retry mechanism from giving up on
waiting for the object.
Signed-off-by: Martynas Pumputis <m@lambda.lt>
We discussed the 4.19 failures in detail during the sig-datapath meeting today. Core issue is that previously the CI did not enable nodeport on the devices on 4.19. This new @brb will investigate configuring specific options to retain the current CI coverage while allowing the rest of this PR to run against CI and validate the behaviour, so we can merge and separately address the remaining complexity issues which we are already tracking elsewhere.
Disable the kube-proxy replacement on the CI 4.19 job until #11915 has been merged, but keep bpf_sock to avoid bpf_lxc complexity issues. Also, get rid of the non-working kernel vsn setting, which complicates passing the KERNEL env var to the ginkgo test runner. Finally, disable the ipsec + vxlan test on 4.19, as it is broken with the current 4.19 setup (TODO: investigate it). Signed-off-by: Martynas Pumputis <m@lambda.lt>
// GetK8sNodeIP returns k8s Node IP (either InternalIP or ExternalIP or nil;
// the former is preferred).
func (n *Node) GetK8sNodeIP() net.IP {
I was looking at the file recently and noticed that we have two very similar functions: GetK8sNodeIP and GetNodeIP (immediately below). I tracked it down to this PR. I am just curious; why did we need to add this @brb?
GetNodeIP() can return an IP which is neither k8s InternalIP nor k8s ExternalIP. For the device detection, we want to use k8s IPs, as they are used for communication between nodes. For this, I introduced GetK8sNodeIP().
First, this PR extends the BPF NodePort device auto-detection mechanism by:
1. Considering devices which have the k8s Node IP addr: InternalIP or ExternalIP or nil (the former is preferred).
2. Adding --direct-routing-device (global.nodePort.directRoutingDevice).
The devices detected by 1. are used for BPF masquerading and host-fw too.
Second, this PR introduces the --devices flag and deprecates --device. If the latter is specified, its value is appended to the former.
Reviewable per commit.
One limitation which hasn't been resolved in this PR is that if, in the k8s dual-stack mode, InternalIPv4 and InternalIPv6 are assigned to two different devices, then only one will be considered. I'm going to address it in a separate PR (not a release blocker).
Once this PR is merged, I will move initKubeProxyReplacement() to daemon/cmd/kube_proxy.go (I didn't want to move it in this PR to avoid complicating the review).
Fix: #11789