Skip to content

bpf: Share __NR_CPUS__ value with Golang side#12310

Merged
pchaigno merged 1 commit intomasterfrom
pr/pchaigno/fix-replace-nproc
Jun 29, 2020
Merged

bpf: Share __NR_CPUS__ value with Golang side#12310
pchaigno merged 1 commit intomasterfrom
pr/pchaigno/fix-replace-nproc

Conversation

@pchaigno
Copy link
Copy Markdown
Member

@pchaigno pchaigno commented Jun 26, 2020

On Golang side we get the number of CPUs from /sys/devices/system/cpu/possible. On BPF side, we use $(nproc -all). nproc calls num_processors() from the gnulib. That function, however, may not always return the value from the /sys file above.

Instead, we should use the exact same source as Golang side to ensure both sides have the same value and avoid issues later on. See #12070 for details.

The __NR_CPUS__ values in test/bpf/Makefile and bpf/Makefile.bpf do not need to be in sync. with Golang values because these files are only used for unit tests, sparse, and compile testing.

Fixes: #12121
/cc @gandro

Fix silent cilium monitor on systems with offline CPUs

@pchaigno pchaigno added kind/bug This is a bug in the Cilium logic. area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/bug This PR fixes an issue in a previous release of Cilium. needs-backport/1.7 labels Jun 26, 2020
@pchaigno pchaigno requested review from a team and gandro June 26, 2020 21:14
@pchaigno pchaigno requested review from a team as code owners June 26, 2020 21:14
Copy link
Copy Markdown
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm missing something, but according to the docs /sys/devices/system/cpu/possible will produce a range of possible IDs (e.g. N-M) instead of a number. How can this be used as a value for the number of CPUs?

@pchaigno
Copy link
Copy Markdown
Member Author

@gandro You're not missing something. I shouldn't send patches after 10pm on Fridays 🤦

@pchaigno pchaigno force-pushed the pr/pchaigno/fix-replace-nproc branch from 1492405 to 21f778d Compare June 27, 2020 13:09
@pchaigno pchaigno requested a review from gandro June 27, 2020 13:09
@pchaigno pchaigno force-pushed the pr/pchaigno/fix-replace-nproc branch from 21f778d to 8be1ed3 Compare June 27, 2020 14:05
@coveralls
Copy link
Copy Markdown

coveralls commented Jun 27, 2020

Coverage Status

Coverage decreased (-0.02%) to 36.908% when pulling 69f06e0 on pr/pchaigno/fix-replace-nproc into 54d2175 on master.

Copy link
Copy Markdown
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it was a bit unclear to me, when nproc --all would be wrong in the first place, it did some testing and it seems like the main problem is that nproc --all does not deal with SMT, at least on my laptop it does not agree with possible. This is indeed something my original PR did not consider.

In any case, we probably should really make sure we use exactly the same parsing mechanism in Golang and in the Makefiles to avoid further mistakes. Maybe have a shell script that parses possible the same way we do it in Golang?

Comment thread bpf/Makefile.bpf Outdated
@pchaigno pchaigno marked this pull request as draft June 27, 2020 15:51
On Golang side we get the number of CPUs from
/sys/devices/system/cpu/possible. On BPF side, we use $(nproc -all).
nproc calls num_processors() from the gnulib. That function, however, may
not always return the value from the /sys file above.

Instead, we should use the exact same source as Golang side to ensure
both sides have the same value and avoid issues later on. See #12070
for details.

The __NR_CPUS__ values in test/bpf/Makefile and bpf/Makefile.bpf do not
need to be in sync. with Golang values because these files are only used
for unit tests, sparse, and compile testing.

Fixes: 8191b16 ("bpf: Use `nproc --all` for __NR_CPUS__")
Signed-off-by: Paul Chaignon <paul@cilium.io>
@pchaigno pchaigno force-pushed the pr/pchaigno/fix-replace-nproc branch from 8be1ed3 to 69f06e0 Compare June 29, 2020 07:23
@pchaigno pchaigno changed the title bpf: Use same file as Golang side instead of nproc bpf: Share __NR_CPUS__ value with Golang side Jun 29, 2020
@pchaigno
Copy link
Copy Markdown
Member Author

test-me-please

@pchaigno
Copy link
Copy Markdown
Member Author

I was able to reproduce the problem with nproc --all on the dev. VM using cmdline arguments. I checked that cilium monitor worked and there were only two perf array maps on the system with this pull request.

$ cat /proc/cmdline 
BOOT_IMAGE=/boot/vmlinuz-5.8.0-rc1+ root=/dev/mapper/vagrant--vg-root ro quiet possible_cpus=3
$ cat /sys/devices/system/cpu/present 
0-7
$ cat /sys/devices/system/cpu/possible 
0-31
$ nproc --all
8
$ nproc
8

@pchaigno pchaigno requested a review from gandro June 29, 2020 11:28
@pchaigno pchaigno marked this pull request as ready for review June 29, 2020 11:28
Copy link
Copy Markdown
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too familiar with the BPF build process. We are still using nproc --all in the BPF Makefiles. Are these not used by the datapath and only used for testing?

@pchaigno
Copy link
Copy Markdown
Member Author

I'm not too familiar with the BPF build process. We are still using nproc --all in the BPF Makefiles. Are these not used by the datapath and only used for testing?

@gandro As far as I'm aware, yes, we only use them for testing. All BPF programs loaded by init.sh go through bpf_compile() (which has the right value). BPF programs loaded from Golang side have their own compile command:

standardCFlags = []string{"-O2", "-target", "bpf", "-std=gnu89",
"-nostdinc", fmt.Sprintf("-D__NR_CPUS__=%d", common.GetNumPossibleCPUs(log)),

@pchaigno pchaigno requested review from a team and removed request for a team June 29, 2020 11:41
@maintainer-s-little-helper maintainer-s-little-helper Bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 29, 2020
@pchaigno pchaigno merged commit 1bd4624 into master Jun 29, 2020
@pchaigno pchaigno deleted the pr/pchaigno/fix-replace-nproc branch June 29, 2020 15:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants