debug: disable ioctl(PIDFD_GET_INFO) #38724
Closed
keszybz wants to merge 1 commit into systemd:main from
In https://bodhi.fedoraproject.org/updates/FEDORA-2025-a0ce059969 it was reported that the tests fail:

> Rootless podman tests all show something like this eventually:
>
> OCI runtime error: crun: join keyctl `7509a871d2ab7df6549f5cb5bd2d4daf990cc45c0022f116bd0882966ae53f30`: Disk quota exceeded
>
> Each container creates its own keyring, but I assume they get leaked, so at some point we run out of available keyrings and all following tests fail like that. Given that I only see this on this update, and that from the podman test logs it only starts happening after we run a bunch of our own systemd services, I wonder if systemd maybe leaks keyrings and thus it fails?

After some very tedious bisecting, I got the answer that dcf0ef3 is the first bad commit. This doesn't make much sense. I thought that maybe the answer was wrong somehow, or that the fd we pass in has problems, but everything seems to work correctly: both pidfd_get_pid_ioctl and pidfd_get_pid_fdinfo work fine and return the same answer. Nevertheless, skipping the call to pidfd_get_pid_ioctl makes the problem go away.

Bisection recipe:

1. Compile systemd, systemd-executor, and pam_systemd (not all intermediate commits compile :) ):

       $ ninja -C build systemd systemd-executor pam_systemd.so

2. Use the compiled manager for the user running the tests:

       # /etc/systemd/system/user@1000.service.d/override.conf
       [Service]
       ExecStart=
       ExecStart=/home/fedora/src/systemd/build/systemd --user

3. Install the new code:

       # cp ~fedora/src/systemd/build/pam_systemd.so /usr/lib64/security/ && systemctl restart user@1000

4. Log out and log in again (via ssh).

5. Run the test:

       $ grep -Ec '[a-f0-9]{64}: empty' /proc/keys && podman run -it fedora date && grep -Ec '[a-f0-9]{64}: empty' /proc/keys
       17
       Tue Aug 26 12:47:44 UTC 2025
       18

It seems that both the pam module and the user manager somehow matter. This smells like a kernel bug or some strange race condition.
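For readers unfamiliar with the fdinfo route mentioned above: the kernel exposes the PID behind a pidfd as a `Pid:` line in `/proc/self/fdinfo/<fd>`, printing `-1` once the process is gone and `0` when it is not visible in the reader's PID namespace. The following is a minimal C sketch of that lookup, assuming a Linux system with /proc mounted. The helper names `fdinfo_parse_pid()` and `pidfd_pid_via_fdinfo()` are hypothetical; this is not systemd's actual `pidfd_get_pid_fdinfo()`, and the `PIDFD_GET_INFO` ioctl path is not shown.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: scan an fdinfo-formatted stream for the "Pid:" line.
 * Returns the PID (> 0) or a negative errno:
 *   -ESRCH   kernel printed "Pid: -1" (process is gone),
 *   -EREMOTE kernel printed "Pid: 0" (not visible in our PID namespace),
 *   -ENOENT  no "Pid:" line at all (e.g. the fd is not a pidfd),
 *   -EINVAL  the value could not be parsed. */
static int fdinfo_parse_pid(FILE *f) {
        char line[256];

        assert(f);

        while (fgets(line, sizeof(line), f)) {
                char *end;
                long v;

                /* Match the exact field name; "NSpid:" must not match. */
                if (strncmp(line, "Pid:", 4) != 0)
                        continue;

                errno = 0;
                v = strtol(line + 4, &end, 10); /* strtol skips the tab */
                if (errno != 0 || end == line + 4)
                        return -EINVAL;
                if (v == -1)
                        return -ESRCH;
                if (v == 0)
                        return -EREMOTE;
                if (v < 0 || v > INT_MAX)
                        return -EINVAL;
                return (int) v;
        }

        return -ENOENT;
}

/* Hypothetical helper: read the PID behind a pidfd via /proc/self/fdinfo. */
static int pidfd_pid_via_fdinfo(int pidfd) {
        char path[64];
        FILE *f;
        int r;

        assert(pidfd >= 0);

        snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", pidfd);
        f = fopen(path, "re"); /* "e" = O_CLOEXEC (glibc extension) */
        if (!f)
                return -errno;

        r = fdinfo_parse_pid(f);
        fclose(f);
        return r;
}
```

Splitting the parser from the file access keeps the parsing testable against canned fdinfo text; a cross-check like the one described would compare `pidfd_pid_via_fdinfo(fd)` with whatever the ioctl returns for the same fd.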
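The leak check in the recipe counts keyrings in /proc/keys whose description is a 64-hex-digit name followed by `: empty` (the kernel prints `: empty` for a keyring that holds no keys) and compares the count before and after one `podman run`. If that check were automated, a minimal C sketch might look like this. It assumes POSIX `<regex.h>` and a kernel built with keyring support; `count_matching_lines()` and `count_empty_hex_keyrings()` are hypothetical names, not part of systemd or podman.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>

/* Count lines of `f` matching the POSIX extended regex `pattern`,
 * roughly what `grep -Ec PATTERN` prints. Returns -1 if the regex
 * does not compile. */
static long count_matching_lines(FILE *f, const char *pattern) {
        regex_t re;
        char *line = NULL;
        size_t n = 0;
        long count = 0;

        assert(f);
        assert(pattern);

        if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
                return -1;

        while (getline(&line, &n, f) >= 0)
                if (regexec(&re, line, 0, NULL, 0) == 0)
                        count++;

        free(line);
        regfree(&re);
        return count;
}

/* Hypothetical helper: number of empty keyrings with a 64-hex-digit
 * description, i.e. the same count as the grep in the recipe's test step.
 * Returns -1 if /proc/keys cannot be read. */
static long count_empty_hex_keyrings(void) {
        FILE *f;
        long r;

        f = fopen("/proc/keys", "re");
        if (!f)
                return -1;

        r = count_matching_lines(f, "[a-f0-9]{64}: empty");
        fclose(f);
        return r;
}
```

Calling `count_empty_hex_keyrings()` before and after starting a container gives the same before/after pair as the shell pipeline; a count that grows by one per run matches the leak shown in the recipe's output.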
keszybz (author):
I forgot to add:
Member:
Then maybe a kernel bug??
Member:
Hmm, might be fixed by torvalds/linux@0b2d71a?
keszybz (author):
With kernel-core-6.17.0-0.rc3.31.fc44.x86_64, the issue no longer reproduces, so this really seems to have been a kernel bug.