v1.7 backports 2020-07-01 bpf NR_CPU fixes#12363
Conversation
[ upstream commit 62b1c86 ] getNumPossibleCPUs is needed for all per-cpu map and perf event arrays. Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 8191b16 ] This uses `-D__NR_CPUS__=$(nproc --all)` (or `GetNumPossibleCPUs` when invoked from Go) to compile the datapath. This fixes an issue where cilium monitor fails to report any events on AKS, due to the `perf_event_array` map duplicates being created with different max_entries sizes, presumably causing the datapath to write to the first one, while the agent is reading from the second one. This bug occurs for example on AKS due to the present/possible cpuset on the VMs. The default Standard_D2s_v3 node size has 2 present CPUs, but 128 possible CPUs in /sys/devices/system/cpu. Fixes: #12070 Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
|
test-backport-1.7 |
|
@gandro |
[ upstream commit 1bd4624 ] On Golang side we get the number of CPUs from /sys/devices/system/cpu/possible. On BPF side, we use $(nproc -all). nproc calls num_processors() from the gnulib. That function, however, may not always return the value from the /sys file above. Instead, we should use the exact same source as Golang side to ensure both sides have the same value and avoid issues later on. See #12070 for details. The __NR_CPUS__ values in test/bpf/Makefile and bpf/Makefile.bpf do not need to be in sync. with Golang values because these files are only used for unit tests, sparse, and compile testing. Fixes: 8191b16 ("bpf: Use `nproc --all` for __NR_CPUS__") Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
4d5bba0 to
c60a1fe
Compare
|
test-backport-1.7 CI Failure, likely unrelated: https://jenkins.cilium.io/job/Cilium-PR-Ginkgo-Tests-Validated/19785/ Cilium-Ginkgo-Tests |
|
Agree that failure seems unrelated. Running CI in #12371 as well to see whether it's likely to be a reliable test failure on v1.7 branch or flake. There have been several failures in the past in this test and in latest/1.8 we reworked the test not to rely on managed etcd any more so most likely it's just flakiness from managed-etcd. But would be good to get an extra datapoint before pushing out a new release that hits this issue. |
|
#12371 did not fail the same test so seems like the known flake in the 1.7 CI card here: Merging. |
| include ../../Makefile.defs | ||
|
|
||
| FLAGS := -I../../bpf/ -I../../bpf/include -I. -D__NR_CPUS__=$(shell nproc) -O2 | ||
| FLAGS := -I../../bpf/ -I../../bpf/include -I. -D__NR_CPUS__=$(shell nproc -all) -O2 |
There was a problem hiding this comment.
This should be nproc --all, I think. Currently I see
make -C test/bpf/
nproc: invalid option -- 'a'
Try 'nproc --help' for more information.
on v1.7 builds. I don't think this is a big issue since it only affects the BPF tests, but will send a PR with a fix nevertheless.
There was a problem hiding this comment.
Oh, that seems to have been a typo I made while resolving a conflict. Thanks a lot for the fix!
When the __NR_CPUS__ definitions were change to use nproc --all in commit 6247cc5 ("bpf: Use `nproc --all` for __NR_CPUS__") via #12363 the change in test/bpf/Makefile was missing a `-`, leading to the following error message in the v1.8 build: make -C test/bpf/ nproc: invalid option -- 'a' Try 'nproc --help' for more information. While this is uncritical for the tests and the build, but fix it nevertheless to get rid of the error message. Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
When the __NR_CPUS__ definitions were change to use nproc --all in commit 6247cc5 ("bpf: Use `nproc --all` for __NR_CPUS__") via #12363 the change in test/bpf/Makefile was missing a `-`, leading to the following error message in the v1.8 build: make -C test/bpf/ nproc: invalid option -- 'a' Try 'nproc --help' for more information. While this is uncritical for the tests and the build, but fix it nevertheless to get rid of the error message. Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
This PR backports the following PRs for the
__NR_CPU__fixes to Cilium v1.7.nproc --allfor __NR_CPUS__ #12121 -- bpf: Usenproc --allfor NR_CPUS (@gandro)I had to cherry-pick and backport commit 62b1c86 to be able to access the
GetNumPossibleCPUsfunction. It's the only commit backported from the parent PR #10626, thus the parent PR is intentionally not marked with any backporting labels.Once this PR is merged, you can update the PR labels via: