
debug: disable ioctl(PIDFD_GET_INFO)#38724

Closed
keszybz wants to merge 1 commit into systemd:main from keszybz:podman-debug

Conversation

@keszybz (Member) commented Aug 26, 2025

In https://bodhi.fedoraproject.org/updates/FEDORA-2025-a0ce059969 it was
reported that the tests fail:

> Rootless podman tests all show something like this eventually
>
> OCI runtime error: crun: join keyctl `7509a871d2ab7df6549f5cb5bd2d4daf990cc45c0022f116bd0882966ae53f30`: Disk quota exceeded
>
> Each container creates its own keyring but I assume they get leaked so at one
> point we run out of available keyrings and all following tests fail like
> that. Given I only see this on this update and from looking at the podman
> tests logs it only starts happening after we run a bunch of our own systemd
> services I wonder if systemd maybe leaks keyrings and thus it fails?

After some very tedious bisecting, I got the answer that
dcf0ef3 is the first bad commit. This doesn't
make much sense. I thought that maybe the answer is wrong somehow, or the fd we
pass in has problems, but everything seems to work correctly. Both
pidfd_get_pid_ioctl and pidfd_get_pid_fdinfo work fine and return the same
answer. Nevertheless, skipping the call to pidfd_get_pid_ioctl makes the
problem go away.

bisection recipe:
1. compile systemd, systemd-executor, pam_systemd:
   $ ninja -C build systemd systemd-executor pam_systemd.so
   (Not all intermediate commits compile :) )
2. use the compiled manager for the user running the tests:
   # /etc/systemd/system/user@1000.service.d/override.conf
   [Service]
   ExecStart=
   ExecStart=/home/fedora/src/systemd/build/systemd --user
3. install the new code:
   # cp ~fedora/src/systemd/build/pam_systemd.so /usr/lib64/security/ && systemctl restart user@1000
4. log out and log in again (via ssh)
5. run the test:
   $ grep -Ec '[a-f0-9]{64}: empty' /proc/keys && podman run -it fedora date && grep -Ec '[a-f0-9]{64}: empty' /proc/keys
   17
   Tue Aug 26 12:47:44 UTC 2025
   18

It seems that both the pam module and the user manager somehow matter.

This smells like a kernel bug or some strange race condition.
@keszybz (Member, Author) commented Aug 26, 2025

I forgot to add:

  • in my VM: kernel-core-6.15.0-0.rc5.250509g9c69f8884904.47.fc43.x86_64
  • in the CI infra: 6.17.0-0.rc1.17.fc43.aarch64

@yuwata (Member) commented Aug 26, 2025

Then maybe kernel bug??

@YHNdnzj (Member) commented Aug 26, 2025

Hmm, might be fixed by torvalds/linux@0b2d71a ?

@keszybz (Member, Author) commented Sep 1, 2025

With kernel-core-6.17.0-0.rc3.31.fc44.x86_64 the issue does not reproduce anymore. So this really seems to have been a kernel bug.
