daemon: Skip devices without hardware address during device detection#12321

Merged
joestringer merged 1 commit into master from pr/pchaigno/device-detection-skip-no-hw-addr
Jun 29, 2020

@pchaigno
Member

@pchaigno pchaigno commented Jun 29, 2020

We need NodePort and direct routing devices to have a MAC address. If they don't, init.sh fails with the following error:

level=warning msg="+ for NATIVE_DEV in ${NATIVE_DEVS//;/ }" subsys=datapath-loader
level=warning msg="++ cat /sys/class/net/lo/ifindex" subsys=datapath-loader
level=warning msg="+ IDX=1" subsys=datapath-loader
level=warning msg="++ ip link show lo" subsys=datapath-loader
level=warning msg="++ grep ether" subsys=datapath-loader
level=warning msg="++ awk '{print $2}'" subsys=datapath-loader
level=warning msg="+ MAC=" subsys=datapath-loader
level=error msg="Error while initializing daemon" error="exit status 1" subsys=daemon
level=fatal msg="Error while creating daemon" error="exit status 1" subsys=daemon
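The empty `MAC=` in the trace can be reproduced with a small sketch. For `lo`, `ip link show` reports a `link/loopback` line rather than a `link/ether` line, so the `grep ether` step has nothing to match. The Go function below is a hypothetical stand-in for the `grep ether | awk '{print $2}'` pipeline (simplified to the first matching line), and the two sample outputs are illustrative, not captured from the dev VM; none of this is code from init.sh.

```go
package main

import (
	"fmt"
	"strings"
)

// Illustrative `ip link show <dev>` outputs (assumed samples, not taken from the PR).
const loOutput = `1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00`

const ethOutput = `3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 08:00:27:12:34:56 brd ff:ff:ff:ff:ff:ff`

// extractEtherMAC is a hypothetical helper that approximates
// `grep ether | awk '{print $2}'`: it returns the second whitespace-separated
// field of the first line containing "ether", or "" if no line matches.
func extractEtherMAC(ipLinkOutput string) string {
	for _, line := range strings.Split(ipLinkOutput, "\n") {
		if !strings.Contains(line, "ether") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) >= 2 {
			return fields[1]
		}
	}
	return ""
}

func main() {
	fmt.Printf("lo: %q\n", extractEtherMAC(loOutput))      // prints lo: ""
	fmt.Printf("enp0s8: %q\n", extractEtherMAC(ethOutput)) // prints enp0s8: "08:00:27:12:34:56"
}
```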

Thus, we need to skip auto-detected devices that don't have a MAC address. This commit implements that and was tested by injecting a loopback interface with an IP address in the code, in the dev. VM:

loAddr, err := netlink.ParseAddr("192.168.33.11/32")
if err == nil {
    loAddr.LinkIndex = 1
    addrs = append(addrs, *loAddr)
}

Fixes: #12228
Fixes: #12304
Fixes: #11894
/cc @brb

Fix failure to start agent when detected devices don't have hardware addresses

Fixes: 6730d0f ("daemon: Extend BPF NodePort device auto-detection")
Signed-off-by: Paul Chaignon <paul@cilium.io>
@pchaigno added the kind/bug, area/loader, area/daemon, release-note/bug, and needs-backport/1.8 labels Jun 29, 2020
@pchaigno requested review from a team and borkmann June 29, 2020 13:34
@pchaigno changed the title from "daemon: Skip devices without hw address during device detection" to "daemon: Skip devices without hardware address during device detection" Jun 29, 2020
@pchaigno
Member Author

test-me-please

@coveralls

Coverage Status

Coverage increased (+0.005%) to 36.94% when pulling 29f5e31 on pr/pchaigno/device-detection-skip-no-hw-addr into 48f8e79 on master.

@joestringer
Member

Were you able to reproduce this locally to validate the fix?

Member

@borkmann borkmann left a comment

LGTM, we can revisit adding support by adding an all-zero HW address later (plus checking that redirect does the right thing).

@maintainer-s-little-helper maintainer-s-little-helper Bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 29, 2020
@pchaigno
Member Author

> Were you able to reproduce this locally to validate the fix?

Yes, I reproduced it by adding a bit of code to inject a loopback device with an IP address such that it would be selected by the device detection (see "tested by injecting a loopback interface with an IP address in the code, in the dev. VM" in the OP). I got the same error as reported by users. With this fix applied, the loopback device is excluded from detection, enp0s8 is selected instead, and there is no error.

If preferred, I think I would be able to reproduce without adding code now that I understand the different steps of the device detection (I think I need to assign a specific IP address to an interface with an index higher than enp0s8's; otherwise it's overwritten).

> LGTM, we can revisit adding support by adding an all-zero HW address later (plus checking that redirect does the right thing).

One thing that's important to note here is that I expect Cilium will still fail to start if a user explicitly configures a device without a HW address. This PR only fixes the detection.

My rationale is that excluding devices without HW addresses is a good way to avoid corner cases. If a user purposely wants to use a device without a HW address (common case is WireGuard), they should set it explicitly and we will need to provide a proper fix such as the all-zero HW address you mention.

There's a bit more work required for that fix (in particular, need to reproduce and maybe document) and I wanted to get the device-detection fix out quickly since a lot of users seem to be hitting that. Maybe I should also exclude devices explicitly set by users if they don't have a HW address now? Unless we expect to have a fix for that soon and it's not worth it?

@joestringer
Member

I agree with fixing the auto-detection for most users where they're just incidentally hitting this without explicitly specifying devices, hence why I'm happy to get this in as-is.

In a lot of cases today, Cilium fails out early to signal to users that the configuration is wrong. In this case, I think the failure is detected too late and the log messages are uninterpretable, so at a minimum it'd be nice to add a check on the --devices side of this that explicitly fails out with a clear message. It could certainly be argued that we should instead warn loudly in the logs and push through without configuring such devices; I guess it depends on how explicitly we expect users to be specifying such devices.
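As a rough sketch of the "fail out early with a clear message" option for explicitly configured devices: the names `validateDevices`, `hwAddrLookup`, and `fakeLookup` below are hypothetical, the error wording is illustrative, and the lookup is stubbed so the example runs without touching real interfaces. It is not how Cilium validates `--devices`.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// hwAddrLookup resolves a device name to its hardware address.
// Hypothetical abstraction so the check can be exercised without real devices.
type hwAddrLookup func(name string) (net.HardwareAddr, error)

// validateDevices returns a descriptive error for the first user-specified
// device that cannot be inspected or that has no hardware address.
func validateDevices(devices []string, lookup hwAddrLookup) error {
	for _, name := range devices {
		hw, err := lookup(name)
		if err != nil {
			return fmt.Errorf("cannot inspect device %q: %w", name, err)
		}
		if len(hw) == 0 {
			return fmt.Errorf("device %q has no hardware address and cannot be used as a NodePort or direct routing device", name)
		}
	}
	return nil
}

// fakeLookup is a stub with made-up devices: enp0s8 has a MAC, wg0 does not.
func fakeLookup(name string) (net.HardwareAddr, error) {
	switch name {
	case "enp0s8":
		return net.ParseMAC("08:00:27:12:34:56")
	case "wg0":
		return nil, nil
	default:
		return nil, errors.New("no such device")
	}
}

func main() {
	fmt.Println(validateDevices([]string{"enp0s8"}, fakeLookup))
	fmt.Println(validateDevices([]string{"enp0s8", "wg0"}, fakeLookup))
}
```

The "warn loudly and push through" alternative would log the same message and drop the device from the list instead of returning an error.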

@joestringer joestringer merged commit 089060b into master Jun 29, 2020
@joestringer joestringer deleted the pr/pchaigno/device-detection-skip-no-hw-addr branch June 29, 2020 20:18
Development

Successfully merging this pull request may close these issues.

Error while creating daemon when NodePort device is TUN or WireGuard interface
Failure on first import