Currently, when installing Cilium with XDP NodePort acceleration on EKS (Amazon Linux 2 with kernel-ng installed, i.e. 5.4.38-17.76.amzn2.x86_64 on instance type m5.xlarge), the cilium-agent fails with:
level=warning msg="+ ip -force link set dev eth0 xdpdrv obj bpf_xdp.o sec from-netdev" subsys=datapath-loader
level=warning msg="Error: ena: Failed to set xdp program, there is no enough space for allocating XDP queues, Check the dmesg for more info." subsys=datapath-loader
level=warning msg="+ RETCODE=2" subsys=datapath-loader
level=warning msg="+ set -e" subsys=datapath-loader
level=warning msg="+ cilium-map-migrate -e bpf_xdp.o -r 2" subsys=datapath-loader
level=warning msg="+ return 2" subsys=datapath-loader
level=error msg="Error while initializing daemon" error="exit status 2" subsys=daemon
level=fatal msg="Error while creating daemon" error="exit status 2" subsys=daemon
Looking at dmesg we see:
[ 642.068486] ena 0000:00:05.0 eth0: Failed to set xdp program, the Rx/Tx channel count should be at most half of the maximum allowed channel count. The current queue count (4), the maximal queue count (4)
The message is coming from here:
https://github.com/amzn/amzn-drivers/blob/ec8285e834332fc00b12882e2ca44d26e5dffd91/kernel/linux/ena/ena_netdev.c#L600-L607
ena_xdp_allowed is defined as:
https://github.com/amzn/amzn-drivers/blob/ef91653e7725178f43715dbb3a64573420d79a63/kernel/linux/ena/ena_netdev.h#L600-L616
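Going only by the wording of the dmesg line above, the queue-count rule can be sketched roughly as follows. This is not the driver code: the helper names `xdp_queue_count_allowed` and `largest_xdp_queue_count` are hypothetical, and the real `ena_xdp_allowed` may check further conditions that are not modeled here.

```python
def xdp_queue_count_allowed(current_queues: int, max_queues: int) -> bool:
    """Hypothetical helper: the rule as stated in the dmesg message.

    The Rx/Tx channel count must be at most half of the maximum
    allowed channel count for the XDP program to attach.
    """
    return 2 * current_queues <= max_queues


def largest_xdp_queue_count(max_queues: int) -> int:
    """Hypothetical helper: largest channel count that satisfies the rule."""
    return max_queues // 2


if __name__ == "__main__":
    # Values taken from the dmesg line: current queue count 4, maximal queue count 4.
    print(xdp_queue_count_allowed(4, 4))   # current configuration
    print(xdp_queue_count_allowed(2, 4))   # after halving the channel count
    print(largest_xdp_queue_count(4))      # candidate value for `ethtool -L ... combined N`
```

Under this reading, the reported configuration (4 of 4 queues) is rejected, and halving the count to 2 would be the largest value that passes the check.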
When trying to reduce the number of queues from 4 to 2 using ethtool -L eth0 combined 2, the node becomes unreachable after a few seconds.
See https://gist.github.com/tklauser/268641aea4fced9e8c3b4a2f7536661b for the notes kept so far while investigating this.
/cc @borkmann