Fix netns leak on container creation and exit code 1 on SIGTERM.#24082
Fix netns leak on container creation and exit code 1 on SIGTERM.#24082openshift-merge-bot[bot] merged 3 commits intocontainers:mainfrom
Conversation
When we are killed during netns setup it will leak the netns path as it was not commited in the db. This is rather common if you run systemctl stop on a podman systemd unit. Of course we cannot protect against SIGKILL but in systemd case we get SIGTERM and we really should not exit in a critical section like this. Fixes containers#24044 Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Currently podman run -d can exit 0 if we send SIGTERM during startup even though the contianer was never started. That just doesn't make any sense is horribly confusing for a external job manager like systemd. The original motivation was to exit 0 for the podman.service in commit ca7376b. That does make sense but it should only do so for the service and only if the server did indeed gracefully shutdown. So we rework how the exit logic works, do not let the handler perform the exit. Instead the shutdown package does the exit after all handlers are run, this solves the issue of ordering. Then we default to exit code 1 like we did before and allow the service exit handler to overwrite the exit code 0 in case of a graceful shutdown. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
It is never used and needed so let's just remove some dead code. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
|
@mheon @edsantiago PTAL |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Luap99 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
Ephemeral COPR build failed. @containers/packit-build please check. |
|
LGTM, and I can no longer reproduce the netns leak |
| } | ||
| handlerLock.Unlock() | ||
| shutdownInhibit.Unlock() | ||
| os.Exit(exitCode) |
There was a problem hiding this comment.
Wouldn't this prevent a lot of defer functions from running? I know I avoided it deliberately when I wrote this
There was a problem hiding this comment.
I mean yes but if we exit what do we want to defer here? And most importantly the current handler also just exited so there should not be any functional difference I would say
There was a problem hiding this comment.
IIRC we have a bunch of things running in defer that do things like removing files to clean up after ourselves
There was a problem hiding this comment.
Sure but this PR doesn't change the behavior here. Critical sections where we cannot leak need to use shutdown.Inhibit() which is how I fixed the issue with the netns leak.
|
LGTM, but defereing for merge to @mheon given his questions ... |
|
Saw the netns leak flake today, almost wept in despair, then remembered that this PR hasn't merged yet. |
|
/lgtm |
The old code did use exit() there is no functional change in that regard. |
|
LGTM |
libpod: ensure we are not killed during netns creation
When we are killed during netns setup it will leak the netns path as it
was not commited in the db. This is rather common if you run systemctl
stop on a podman systemd unit. Of course we cannot protect against
SIGKILL but in systemd case we get SIGTERM and we really should not exit
in a critical section like this.
Fixes #24044
libpod: rework shutdown handler flow
Currently podman run -d can exit 0 if we send SIGTERM during startup
even though the contianer was never started. That just doesn't make any
sense is horribly confusing for a external job manager like systemd.
The original motivation was to exit 0 for the podman.service in commit
ca7376b. That does make sense but it should only do so for the
service and only if the server did indeed gracefully shutdown.
So we rework how the exit logic works, do not let the handler perform
the exit. Instead the shutdown package does the exit after all handlers
are run, this solves the issue of ordering. Then we default to exit code
1 like we did before and allow the service exit handler to overwrite the
exit code 0 in case of a graceful shutdown.
libpod: remove shutdown.Unregister()
It is never used and needed so let's just remove some dead code.
Does this PR introduce a user-facing change?