Skip to content

cleanup: transition from stopping to exited#15090

Merged
openshift-merge-robot merged 1 commit intocontainers:mainfrom
vrothberg:fix-14859
Jul 28, 2022
Merged

cleanup: transition from stopping to exited#15090
openshift-merge-robot merged 1 commit intocontainers:mainfrom
vrothberg:fix-14859

Conversation

@vrothberg
Copy link
Copy Markdown
Member

@vrothberg vrothberg commented Jul 27, 2022

Allow the cleanup process (and others) to transition the container from
stopping to exited. This fixes a race condition detected in #14859
where the cleanup process kicks in before the stopping process can
read the exit file. Prior to this fix, the cleanup process left the
container in the stopping state and removed the conmon files, such
that the stopping process also left the container in this state as it
could not read the exit files. Hence, podman wait timed out (see the
23 seconds execution time of the test [1]) due to the unexpected/invalid
state and the test failed.

Further turn the warning during stop to a debug message since it's a
natural race due to the daemonless/concurrent architecture and nothing
to worry about.

[NO NEW TESTS NEEDED] since we can only monitor if #14859 continues
flaking or not.

[1] https://storage.googleapis.com/cirrus-ci-6707778565701632-fcae48/artifacts/containers/podman/6210434704343040/html/sys-remote-fedora-36-rootless-host.log.html#t--00205

Fixes: #14859
Signed-off-by: Valentin Rothberg vrothberg@redhat.com

Does this PR introduce a user-facing change?

None

No release note as it's fixing an unreleased bug.

@mheon @edsantiago PTAL

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci bot commented Jul 27, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vrothberg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 27, 2022
@edsantiago
Copy link
Copy Markdown
Member

Thank you!

@mheon
Copy link
Copy Markdown
Member

mheon commented Jul 27, 2022

This looks... Questionable. Shouldn't the syncContainer at the beginning of cleanup pick up the state transition to Stopped? The cleanup process firing means Conmon thinks the container is dead, so why isn't Podman detecting that?

@vrothberg
Copy link
Copy Markdown
Member Author

This looks... Questionable.

The analysis or the solution?

Shouldn't the syncContainer at the beginning of cleanup pick up the state transition to Stopped? The cleanup process firing means Conmon thinks the container is dead, so why isn't Podman detecting that?

Currently, it doesn't as it's checking for ...

 if c.ensureState(define.ContainerStateCreated, define.ContainerStateRunning, define.ContainerStateStopped, define.ContainerStatePaused) {

I considered that as well but am totally fine changing.

@mheon
Copy link
Copy Markdown
Member

mheon commented Jul 27, 2022

The solution - it seems like we're papering over the fact that syncContainer cannot move a Stopping container to Stopped. Given that, I think that c.ensureState there is the proper fix - we use syncContainer everywhere, so this should resolve a lot more potential issues

@vrothberg
Copy link
Copy Markdown
Member Author

The solution - it seems like we're papering over the fact that syncContainer cannot move a Stopping container to Stopped. Given that, I think that c.ensureState there is the proper fix - we use syncContainer everywhere, so this should resolve a lot more potential issues

Sounds good to me. I'll update the PR.

Allow the cleanup process (and others) to transition the container from
`stopping` to `exited`.  This fixes a race condition detected in containers#14859
where the cleanup process kicks in _before_ the stopping process can
read the exit file.  Prior to this fix, the cleanup process left the
container in the `stopping` state and removed the conmon files, such
that the stopping process also left the container in this state as it
could not read the exit files.  Hence, `podman wait` timed out (see the
23 seconds execution time of the test [1]) due to the unexpected/invalid
state and the test failed.

Further turn the warning during stop to a debug message since it's a
natural race due to the daemonless/concurrent architecture and nothing
to worry about.

[NO NEW TESTS NEEDED] since we can only monitor if containers#14859 continues
flaking or not.

[1] https://storage.googleapis.com/cirrus-ci-6707778565701632-fcae48/artifacts/containers/podman/6210434704343040/html/sys-remote-fedora-36-rootless-host.log.html#t--00205

Fixes: containers#14859
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
@mheon
Copy link
Copy Markdown
Member

mheon commented Jul 27, 2022

LGTM

@vrothberg
Copy link
Copy Markdown
Member Author

@containers/podman-maintainers PTAL

@rhatdan
Copy link
Copy Markdown
Member

rhatdan commented Jul 27, 2022

/lgtm
/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 27, 2022
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 27, 2022
@vrothberg
Copy link
Copy Markdown
Member Author

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 28, 2022
@openshift-merge-robot openshift-merge-robot merged commit c0d9ecd into containers:main Jul 28, 2022
@vrothberg vrothberg deleted the fix-14859 branch July 28, 2022 08:56
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 20, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 20, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. release-note-none

Projects

None yet

Development

Successfully merging this pull request may close these issues.

podman-remote wait: Error getting exit code from DB: no such exit code

5 participants