bpf: Use nproc --all for __NR_CPUS__ #12121
This uses `-D__NR_CPUS__=$(nproc --all)` (or `GetNumPossibleCPUs` when invoked from Go) to compile the datapath.

This fixes an issue where `cilium monitor` fails to report any events on AKS, due to the `perf_event_array` map duplicates being created with different `max_entries` sizes, presumably causing the datapath to write to the first one, while the agent is reading from the second one.

This bug occurs for example on AKS due to the present/possible cpuset on the VMs. The default Standard_D2s_v3 node size has 2 present CPUs, but 128 possible CPUs in `/sys/devices/system/cpu`.

Fixes: #12070
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
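To make the present/possible distinction concrete, here is a minimal Go sketch that counts CPUs from the kernel's cpulist files `/sys/devices/system/cpu/possible` and `/sys/devices/system/cpu/present`. The helper names `parseCPUList` and `countCPUs` are hypothetical and are not Cilium's `GetNumPossibleCPUs` implementation; the sketch only assumes the standard Linux cpulist format (for example `0-127` or `0-1,4-7`).

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseCPUList counts the CPUs described by a kernel cpulist string,
// e.g. "0-127" (128 CPUs) or "0-1,4-7" (6 CPUs).
// Hypothetical helper for illustration only.
func parseCPUList(s string) (int, error) {
	s = strings.TrimSpace(s)
	if s == "" {
		return 0, fmt.Errorf("empty CPU list")
	}
	count := 0
	for _, part := range strings.Split(s, ",") {
		bounds := strings.SplitN(part, "-", 2)
		first, err := strconv.Atoi(bounds[0])
		if err != nil {
			return 0, fmt.Errorf("invalid CPU id %q: %w", bounds[0], err)
		}
		last := first // a single id such as "3" is a range of one
		if len(bounds) == 2 {
			if last, err = strconv.Atoi(bounds[1]); err != nil {
				return 0, fmt.Errorf("invalid CPU id %q: %w", bounds[1], err)
			}
		}
		if last < first {
			return 0, fmt.Errorf("invalid CPU range %q", part)
		}
		count += last - first + 1
	}
	return count, nil
}

// countCPUs reads a cpulist file from sysfs and returns the CPU count.
func countCPUs(path string) (int, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return parseCPUList(string(data))
}

func main() {
	for _, name := range []string{"possible", "present"} {
		n, err := countCPUs("/sys/devices/system/cpu/" + name)
		if err != nil {
			fmt.Fprintf(os.Stderr, "%s: %v\n", name, err)
			continue
		}
		fmt.Printf("%s CPUs: %d\n", name, n)
		if name == "possible" {
			// The value this PR passes to the datapath compiler.
			fmt.Printf("-D__NR_CPUS__=%d\n", n)
		}
	}
}
```

On a node like the one described above, the two counts would differ (2 present versus 128 possible), which is the mismatch that led to maps with different `max_entries`.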
|
test-me-please |
|
This is an alternative to #12119 - I have just validated that this fixes the issue on AKS - marking ready for review. |
|
Please check out #12070 (comment) for @pchaigno's great explanation of why this was not causing more trouble beforehand. |
|
There are two bugs fixed here, in a way:
Do we want to backport this to v1.7 or even v1.6? |
|
(Given this is only events and signals map, this shouldn't have upgrade implications.) |
I believe that the core issue here where cilium doesn't report any flows is unique to v1.8 because v1.8 began opening (creating) this map prior to datapath provisioning. However if someone were to hotplug CPUs on v1.7 or earlier, they could plausibly also hit this. The fix itself looks pretty harmless, v1.7 backport is reasonable to me. |
|