Fix container exit detection when process is not a child of conmon in systemd scope environments #571

Merged
jnovy merged 1 commit into main from 545
Aug 7, 2025

@jnovy
Collaborator

@jnovy jnovy commented Jul 30, 2025

When container processes are not direct children of conmon, conmon fails to detect container exits because it never receives SIGCHLD signals.

This fixes an issue where conmon processes remain running after container exit in certain systemd cgroup manager configurations.

Fixes: #545

@jnovy jnovy requested review from giuseppe and haircommander July 30, 2025 09:00
@jnovy jnovy added the jira label Jul 30, 2025
@giuseppe
Member

how would this happen?

conmon calls set_subreaper() so that any descendant process, even an indirect one, is reported to conmon
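
For readers unfamiliar with the mechanism being referred to here, the following is a minimal sketch of how a child subreaper behaves on Linux. It is not conmon code: `demo_subreaper` is a hypothetical helper, the intermediate process only stands in for the OCI runtime, and the grandchild only stands in for a container process. It assumes Linux and `prctl(PR_SET_CHILD_SUBREAPER)`.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo helper, not conmon code.
 * Marks the caller as a child subreaper, creates an orphaned grandchild,
 * and returns how many processes the caller reaped (-1 on error). */
int demo_subreaper(void)
{
	if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) != 0)
		return -1;

	pid_t child = fork();
	if (child < 0)
		return -1;
	if (child == 0) {
		/* Intermediate process, standing in for the OCI runtime. */
		pid_t grandchild = fork();
		if (grandchild == 0) {
			/* Stand-in for the container process. */
			usleep(100 * 1000);
			_exit(7);
		}
		/* Exit right away so the grandchild is orphaned. */
		_exit(grandchild < 0 ? 1 : 0);
	}

	/* Reap everything that is, or has become, our child. */
	int reaped = 0;
	int status;
	for (;;) {
		pid_t pid = waitpid(-1, &status, 0);
		if (pid < 0) {
			if (errno == EINTR)
				continue;
			break; /* ECHILD: nothing left to wait for */
		}
		reaped++;
	}
	return reaped;
}
```

Called from a fresh process with no other children, this should reap both the intermediate process and the orphaned grandchild, because the kernel reparents the orphan to the nearest subreaper ancestor instead of to init.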

@jnovy
Collaborator Author

jnovy commented Jul 30, 2025

It seems subreaper handles the normal case where process re-parenting works as expected. The problem is that in some systemd configurations, even with subreaper set, container processes may not become direct children of conmon.

@giuseppe
Member

It seems subreaper handles the normal case where process re-parenting works as expected. The problem is that in some systemd configurations, even with subreaper set, container processes may not become direct children of conmon.

how would that happen? conmon runs the OCI runtime that in turn runs the container. Every container process is a descendant of conmon

@giuseppe
Member

does it break when there are exec sessions?

@jnovy
Collaborator Author

jnovy commented Jul 30, 2025

Good catch! Let me add some bits to handle this.

@jnovy
Collaborator Author

jnovy commented Jul 30, 2025

@giuseppe Is it better now? Or is there anything else I missed?

@giuseppe
Member

I am not really a fan of the probe part. Can we understand how that configuration happens? conmon should get the signal, even with exec sessions (the container exits, all the processes in the pid namespace are killed and their parent gets the SIGCHLD)

@jnovy
Collaborator Author

jnovy commented Jul 30, 2025

Normally this would be the expected state:

systemd
  - crio-conmon-xxx.scope
    - conmon (subreaper)
      - OCI runtime (create_pid)
        - container process

But what happens in the case described in the issue is:

systemd
  - crio-conmon-xxx.scope
    - conmon (subreaper - isolated from container)
      - OCI runtime (create_pid, exits normally)
  - crio-xxx.scope
    - container process (moved by systemd, no longer child of conmon) 

So systemd moves the container process to the new scope after the OCI runtime creates it but before the OCI runtime exits. This seems to be why conmon receives the exit signal for the runtime but misses the one for the container.

Does that make it any clearer?
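
As background for the scenario described above, here is a small sketch of what the kernel reports when a process asks about a PID that is not its child. It is not the code from this PR, and whether this situation can really arise for conmon is exactly what the thread is debating. The helper names `is_waitable_child` and `process_exists` are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: returns 1 if pid is a child we could wait for,
 * 0 if the kernel answers ECHILD (not our child), -1 on any other error.
 * WNOHANG avoids blocking and WNOWAIT leaves the child's state untouched. */
int is_waitable_child(pid_t pid)
{
	siginfo_t info;

	if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOHANG | WNOWAIT) == 0)
		return 1;
	return errno == ECHILD ? 0 : -1;
}

/* Hypothetical helper: returns 1 if a process with this PID exists,
 * 0 otherwise. Signal 0 performs only the existence and permission
 * checks. EPERM means the process exists but we may not signal it.
 * Note that PIDs can be reused, so this is only a hint. */
int process_exists(pid_t pid)
{
	if (kill(pid, 0) == 0)
		return 1;
	return errno == EPERM ? 1 : 0;
}
```

For example, PID 1 exists but is never a child of the caller, so `process_exists(1)` reports it as present while `is_waitable_child(1)` reports that no wait status, and therefore no SIGCHLD, will ever arrive for it.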

@giuseppe
Member

how does the cgroup affect the parent->child relationship? I don't think it should matter which cgroup the process ends up in; it should still be reported to the conmon used for the exec session (assuming set_subreaper() was used correctly).

You can verify it by running podman run ..., then podman exec ..., and finding the exec'ed process in the process list:

$ podman exec -lti sleep 12345 &
$ grep -i ppid /proc/$(pgrep -f 12345 | tail -n1)/status
PPid:   380778

and PPid is the conmon process.
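
The same check as the grep above can be sketched in C. This is only an illustration, not part of conmon: `read_ppid` is a hypothetical helper, and it assumes a Linux /proc filesystem with the usual `PPid:` line in the status file.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of conmon: return the parent PID recorded
 * in /proc/<pid>/status, or -1 if the file cannot be read or parsed. */
pid_t read_ppid(pid_t pid)
{
	char path[64];
	char line[256];
	int ppid = -1;

	snprintf(path, sizeof(path), "/proc/%d/status", (int)pid);
	FILE *f = fopen(path, "r");
	if (f == NULL)
		return -1;

	while (fgets(line, sizeof(line), f) != NULL) {
		/* The line of interest looks like "PPid:\t380778". */
		if (sscanf(line, "PPid: %d", &ppid) == 1)
			break;
	}
	fclose(f);
	return (pid_t)ppid;
}
```

Comparing `read_ppid(pid)` for the exec'ed process against the PID of the conmon instance would show whether the parent relationship holds on a given system.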

When container processes are not direct children of conmon, conmon fails to
detect container exits because it never receives SIGCHLD signals.

This fixes an issue where conmon processes remain running after container exit
in certain systemd cgroup manager configurations.

Fixes: #545

Signed-off-by: Jindrich Novy <jnovy@redhat.com>
@jnovy
Collaborator Author

jnovy commented Aug 7, 2025

Ok, let's keep the fix minimal - I removed the probe and left only the main part. @giuseppe PTAL

Member

@giuseppe giuseppe left a comment

I am ok with this extra check but I still don't understand how we could end up in this situation, unless the OCI runtime is misbehaving. Could we adjust the comment that mentions the process not being a child of conmon?

LGTM

@jnovy jnovy changed the title from "Fix container exit detection in systemd scope environments" to "Fix container exit detection when process is not a child of conmon in systemd scope environments" Aug 7, 2025
@jnovy
Collaborator Author

jnovy commented Aug 7, 2025

Comment adjusted, thanks!

@jnovy jnovy merged commit d92af7a into main Aug 7, 2025
30 checks passed
@jnovy jnovy deleted the 545 branch August 7, 2025 13:13

Development

Successfully merging this pull request may close these issues.

Crio-conmon.scope still running despite crio-.scope not
