daemon: Skip devices without hardware address during device detection#12321
Conversation
We need NodePort and direct routing devices to have a MAC address. If
they don't, init.sh fails with the following error:
level=warning msg="+ for NATIVE_DEV in ${NATIVE_DEVS//;/ }" subsys=datapath-loader
level=warning msg="++ cat /sys/class/net/lo/ifindex" subsys=datapath-loader
level=warning msg="+ IDX=1" subsys=datapath-loader
level=warning msg="++ ip link show lo" subsys=datapath-loader
level=warning msg="++ grep ether" subsys=datapath-loader
level=warning msg="++ awk '{print $2}'" subsys=datapath-loader
level=warning msg="+ MAC=" subsys=datapath-loader
level=error msg="Error while initializing daemon" error="exit status 1" subsys=daemon
level=fatal msg="Error while creating daemon" error="exit status 1" subsys=daemon
Thus, we need to skip auto-detected devices that don't have a MAC
address. This commit implements that and was tested by injecting a
loopback interface with an IP address in the code, in the dev. VM:
loAddr, err := netlink.ParseAddr("192.168.33.11/32")
if err == nil {
loAddr.LinkIndex = 1
addrs = append(addrs, *loAddr)
}
Fixes: #12228
Fixes: #12304
Fixes: 6730d0f ("daemon: Extend BPF NodePort device auto-detection")
Signed-off-by: Paul Chaignon <paul@cilium.io>
|
test-me-please |
|
Were you able to reproduce this locally to validate the fix? |
borkmann
left a comment
There was a problem hiding this comment.
LGTM, we can revisit adding support by adding a all-zero HW address later (plus checking that redirect does the right thing).
Yes, I reproduced by adding a bit of code to inject a loopback device with an IP address such that it would be selected by the device detection (see If preferred, I think I would be able to reproduce without adding code now that I understand the different steps of the device detection (I need to set a specific IP address to an interface with an index higher than
One thing that's important to note here is that I expect Cilium will still fail to start if a user explicitly configures a device without a HW address. This PR only fixes the detection. My rationale is that excluding devices without HW addresses is a good way to avoid corner cases. If a user purposely wants to use a device without a HW address (common case is WireGuard), they should set it explicitly and we will need to provide a proper fix such as the all-zero HW address you mention. There's a bit more work required for that fix (in particular, need to reproduce and maybe document) and I wanted to get the device-detection fix out quickly since a lot of users seem to be hitting that. Maybe I should also exclude devices explicitly set by users if they don't have a HW address now? Unless we expect to have a fix for that soon and it's not worth it? |
|
I agree with fixing the auto-detection for most users where they're just incidentally hitting this without explicitly specifying devices, hence why I'm happy to get this in as-is. In a lot of cases today, Cilium will fail out early to help signal to users that the configuration is wrong. In this case, I think the detection is actually too late and the log messages uninterpretable so at the minimum it'd be nice to add such a check to the |
We need NodePort and direct routing devices to have a MAC address. If they don't, init.sh fails with the following error:
Thus, we need to skip auto-detected devices that don't have a MAC address. This commit implements that and was tested by injecting a loopback interface with an IP address in the code, in the dev. VM:
Fixes: #12228
Fixes: #12304
Fixes: #11894
/cc @brb