Use number of possible CPUs for PerfEventArray.MaxEntries #81

Merged
tklauser merged 3 commits into master from lmb/perf-enodev
Mar 17, 2020

@lmb lmb commented Mar 17, 2020

Based on discussion in #58 and prompted by @iAklis and his PRs #78 and #80.

This fixes the behaviour on systems that have one or more CPUs disabled. Previously things would only work if the disabled CPUs were at the end of the range.

It's still not possible to use PerfEventArrays that have more entries than the total number of CPUs in the system (as suggested by @iAklis), since I'm not sure that's a good idea. We also don't deal with CPUs being added or removed after a perf.Reader has been created.

More details in the commit descriptions.

lmb added 3 commits March 17, 2020 16:25
We currently use the highest possible CPU to determine the
size of a PerfEventArray. This has several problems: first,
the code to parse the online CPUs doesn't correctly handle
ranges: "0,2-7" is treated like there is only one online CPU.
This can happen when a CPU is disabled at runtime using

    echo 0 | sudo tee /sys/devices/system/cpu/cpu1/online

We silently create a map with an incorrect size, and user
code starts to fail with E2BIG.

Fix this by using the number of possible CPUs as the map size.
The number can only be changed by a reboot and so is safe to use.
It's also simpler to use, since we don't have to deal with
multiple ranges like in the online CPU case.
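
A minimal sketch of the sizing approach described above, assuming the possible CPUs are reported as a single range such as "0-7" (on Linux this is the format of /sys/devices/system/cpu/possible). The function name `parsePossibleCPUs` is hypothetical and not taken from the library; it only illustrates why a single range is simpler to handle than the online CPU list.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePossibleCPUs is a hypothetical helper. It takes a CPU range of the
// form "0-N" (or just "0" on a single-CPU system) and returns N+1, the
// number of possible CPUs. Lists with multiple ranges such as "0,2-7" are
// rejected rather than silently miscounted.
func parsePossibleCPUs(spec string) (int, error) {
	spec = strings.TrimSpace(spec)

	lowStr, highStr, isRange := strings.Cut(spec, "-")
	low, err := strconv.Atoi(lowStr)
	if err != nil {
		return 0, fmt.Errorf("invalid CPU range %q: %w", spec, err)
	}

	high := low
	if isRange {
		high, err = strconv.Atoi(highStr)
		if err != nil {
			return 0, fmt.Errorf("invalid CPU range %q: %w", spec, err)
		}
	}

	// Assumption: the possible range always starts at CPU 0.
	if low != 0 || high < low {
		return 0, fmt.Errorf("unexpected CPU range %q", spec)
	}
	return high + 1, nil
}
```

With this sketch, `parsePossibleCPUs("0-7")` yields 8, while an online-style list like "0,2-7" returns an error instead of an incorrect map size.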

For some reason we call Resume twice when creating a new reader.
Also switch from explicitly enumerating pauseFds to using range.

PerfEventArrays are now sized to the possible CPUs in the system.
This means we may try creating a ring buffer for an offline CPU.
In this case, perf_event_open helpfully returns ENODEV, which we
can handle gracefully.

Note that the reader won't work correctly if CPUs are added after
it has been initialized. Systems that only have some of their
CPU sockets populated will work, however.
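
The ENODEV handling described above could look roughly like the following sketch. The `ring` type, `openRings`, and `fakeOpen` are hypothetical names used for illustration only and are not the library's actual code; the `open` callback stands in for whatever wraps perf_event_open. The assumption is that an offline CPU keeps an empty slot so that slice indices still match CPU numbers.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// ring is a hypothetical stand-in for a per-CPU perf ring buffer.
type ring struct{ cpu int }

// openRings calls open once for every possible CPU. If open reports ENODEV
// the CPU is treated as offline and its slot is left nil; any other error
// aborts the whole operation.
func openRings(nCPU int, open func(cpu int) (*ring, error)) ([]*ring, error) {
	rings := make([]*ring, 0, nCPU)
	for cpu := 0; cpu < nCPU; cpu++ {
		r, err := open(cpu)
		if errors.Is(err, syscall.ENODEV) {
			// Offline CPU: keep the index aligned with the CPU number.
			rings = append(rings, nil)
			continue
		}
		if err != nil {
			return nil, fmt.Errorf("failed to create perf ring for CPU %d: %w", cpu, err)
		}
		rings = append(rings, r)
	}
	return rings, nil
}

// fakeOpen returns an open function that simulates one offline CPU by
// returning a wrapped ENODEV for it. It exists only to exercise openRings.
func fakeOpen(offline int) func(cpu int) (*ring, error) {
	return func(cpu int) (*ring, error) {
		if cpu == offline {
			return nil, fmt.Errorf("can't create perf event: %w", syscall.ENODEV)
		}
		return &ring{cpu: cpu}, nil
	}
}
```

Calling `openRings(4, fakeOpen(2))` in this sketch returns four slots with the third one nil, rather than failing outright.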
@lmb lmb requested a review from tklauser March 17, 2020 17:32
@tklauser tklauser left a comment

LGTM

@tklauser tklauser merged commit 6f632cc into master Mar 17, 2020
@tklauser tklauser deleted the lmb/perf-enodev branch March 17, 2020 18:27
tklauser added a commit to cilium/cilium that referenced this pull request Jun 15, 2020
This pulls in cilium/ebpf#81 which fixes a crash when trying to
initialize BPF per ring buffers for offline CPUs:

  level=fatal msg="Cannot initialise BPF perf ring buffer sockets" error="failed to create perf ring for CPU 2: can't create perf event: can't create perf event: no such device" startTime="2020-06-15 12:15:09.153912253 +0000 UTC m=+129.850487215" subsys=monitor-agent

Reported-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
borkmann pushed a commit to cilium/cilium that referenced this pull request Jun 16, 2020
This pulls in cilium/ebpf#81 which fixes a crash when trying to
initialize BPF per ring buffers for offline CPUs:

  level=fatal msg="Cannot initialise BPF perf ring buffer sockets" error="failed to create perf ring for CPU 2: can't create perf event: can't create perf event: no such device" startTime="2020-06-15 12:15:09.153912253 +0000 UTC m=+129.850487215" subsys=monitor-agent

Reported-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
borkmann pushed a commit to cilium/cilium that referenced this pull request Jun 16, 2020
[ upstream commit 00ef71b ]

This pulls in cilium/ebpf#81 which fixes a crash when trying to
initialize BPF per ring buffers for offline CPUs:

  level=fatal msg="Cannot initialise BPF perf ring buffer sockets" error="failed to create perf ring for CPU 2: can't create perf event: can't create perf event: no such device" startTime="2020-06-15 12:15:09.153912253 +0000 UTC m=+129.850487215" subsys=monitor-agent

Reported-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
borkmann pushed a commit to cilium/cilium that referenced this pull request Jun 16, 2020
[ upstream commit 00ef71b ]

This pulls in cilium/ebpf#81 which fixes a crash when trying to
initialize BPF per ring buffers for offline CPUs:

  level=fatal msg="Cannot initialise BPF perf ring buffer sockets" error="failed to create perf ring for CPU 2: can't create perf event: can't create perf event: no such device" startTime="2020-06-15 12:15:09.153912253 +0000 UTC m=+129.850487215" subsys=monitor-agent

Reported-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
gandro added a commit to cilium/cilium that referenced this pull request Jul 8, 2020
[ upstream commit 00ef71b ]

This pulls in cilium/ebpf#81 which fixes a crash when trying to
initialize BPF per ring buffers for offline CPUs:

  level=fatal msg="Cannot initialise BPF perf ring buffer sockets" error="failed to create perf ring for CPU 2: can't create perf event: can't create perf event: no such device" startTime="2020-06-15 12:15:09.153912253 +0000 UTC m=+129.850487215" subsys=monitor-agent

Reported-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
brb pushed a commit to cilium/cilium that referenced this pull request Jul 9, 2020
[ upstream commit 00ef71b ]

This pulls in cilium/ebpf#81 which fixes a crash when trying to
initialize BPF per ring buffers for offline CPUs:

  level=fatal msg="Cannot initialise BPF perf ring buffer sockets" error="failed to create perf ring for CPU 2: can't create perf event: can't create perf event: no such device" startTime="2020-06-15 12:15:09.153912253 +0000 UTC m=+129.850487215" subsys=monitor-agent

Reported-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>