Skip to content

Add a warning if tmpdir/rundir are not on tmpfs#22141

Closed
mheon wants to merge 1 commit intocontainers:v5.0from
mheon:warn_on_not_tmpfs
Closed

Add a warning if tmpdir/rundir are not on tmpfs#22141
mheon wants to merge 1 commit intocontainers:v5.0from
mheon:warn_on_not_tmpfs

Conversation

@mheon
Copy link
Copy Markdown
Member

@mheon mheon commented Mar 22, 2024

Podman requires several directories to be on tmpfs so we can properly detect (and clean up after) reboots: the tmpdir (for Libpod files) and runroot (for c/storage files). We have not done any validation that they are actual on tmpfs filesystems in the past, which leads to no end of people wondering why Podman is broken on their system (where, perhaps, /tmp is not a tmpfs, which the FHS allows because sanity cannot exist in this world, and there is no systemd session so /run/user/$UID does not exist).

This adds an actual warning on launching Podman with such an unsupported configuration. There's an environment variable if the user is certain they want to run Podman like this (perhaps having used systemd-tmpfiles to automatically clean the relevant dir on reboot), but otherwise we warn every time a command is run with runroot/tmpdir on a non-tmpfs filesystem.

Does this PR introduce a user-facing change?

Podman now warns if a temporary files directory is not placed on a tmpfs filesystem.

@openshift-ci openshift-ci bot added approved Indicates a PR has been approved by an approver from all required OWNERS files. release-note labels Mar 22, 2024
@mheon mheon force-pushed the warn_on_not_tmpfs branch from d4ec473 to 811ecc3 Compare March 22, 2024 16:53
@afbjorklund
Copy link
Copy Markdown
Contributor

perhaps, /tmp is not a tmpfs, which the FHS allows because sanity cannot exist in this world, and there is no systemd session so /run/user/$UID does not exist)

I thought Podman still supported installations with a traditional /tmp filesystem and a traditional init system...

Copy link
Copy Markdown
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comments, all I can say is we should have done this much sooner given the amount of problems we had with this

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

define this as const

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there also the less common ramfs so I think we should check that here to.

@Luap99
Copy link
Copy Markdown
Member

Luap99 commented Mar 22, 2024

perhaps, /tmp is not a tmpfs, which the FHS allows because sanity cannot exist in this world, and there is no systemd session so /run/user/$UID does not exist)

I thought Podman still supported installations with a traditional /tmp filesystem and a traditional init system...

We do, but that doesn't change the fact that we depend on tmpfs or some other util to remove these paths on boot.

@packit-as-a-service
Copy link
Copy Markdown

Cockpit tests failed for commit d4ec4737e083c84efecc531ce5bed4c088bbc03e. @martinpitt, @jelly, @mvollmer please check.

@packit-as-a-service
Copy link
Copy Markdown

Cockpit tests failed for commit 811ecc38dc3dca830b4ff3d86c53418c9f103a25. @martinpitt, @jelly, @mvollmer please check.

@mheon mheon force-pushed the warn_on_not_tmpfs branch from 811ecc3 to eac2675 Compare March 22, 2024 19:26
@mheon
Copy link
Copy Markdown
Member Author

mheon commented Mar 22, 2024

@edsantiago The system tests don't use put the test directory on tmpfs, right? If so we can test this on tmpfs

@baude
Copy link
Copy Markdown
Member

baude commented Mar 22, 2024

LGTM, do any docs need updates or should have clarification?

@edsantiago
Copy link
Copy Markdown
Member

@edsantiago The system tests don't use put the test directory on tmpfs, right? If so we can test this on tmpfs

@mheon I've been struggling to understand this. The best answer I can give is, (for the most part) system tests just invoke podman with no special magic args. There are a few very specific cases in which there are --root --runroot --tmpdir. Does that address your question?

@giuseppe
Copy link
Copy Markdown
Member

could we instead write a file to the tmpdir/rundir and with the value of /proc/sys/kernel/random/boot_id and compare it each time? If it has a different value, then we return an error instead of a warning.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to avoid the warning also when running in a container (the container env is set)

@mheon
Copy link
Copy Markdown
Member Author

mheon commented Mar 25, 2024

@giuseppe That'd work for generic reboot protection, but it wouldn't help clear files afterwards - this PR was motivated by the IPAM DB not being removed on reboot leading to us running out of IP addresses to allocate. We could theoretically build that entirely into Podman as well, have a list of files which must be removed if reboot is detected, but that seems overly complex when we can just make sure a tmpfs is in use?

@mheon
Copy link
Copy Markdown
Member Author

mheon commented Mar 25, 2024

@edsantiago Yeah that should be fine, thanks. Needed to make sure that we're using standard paths and not throwing everything on a tmpfs for performance or similar.

@giuseppe
Copy link
Copy Markdown
Member

giuseppe commented Mar 25, 2024

@giuseppe That'd work for generic reboot protection, but it wouldn't help clear files afterwards - this PR was motivated by the IPAM DB not being removed on reboot leading to us running out of IP addresses to allocate. We could theoretically build that entirely into Podman as well, have a list of files which must be removed if reboot is detected, but that seems overly complex when we can just make sure a tmpfs is in use?

I was thinking about raising an error and terminating the Podman process if the content of /proc/sys/kernel/random/boot_id is different than $RUNDIR/boot_id with something like the content of $FILE1 is different than $FILE2, please remove $RUNDIR and try again.

EDIT: would that be enough to protect from the IPAM DB not being cleaned up?

@mheon
Copy link
Copy Markdown
Member Author

mheon commented Mar 25, 2024

Hm. I like the idea, but I'd also like to backport this patch, and I don't know if introducing a new error in existing releases is a good idea. Maybe I should target this at 5.0 only, and we make it an error in main?

@giuseppe
Copy link
Copy Markdown
Member

Hm. I like the idea, but I'd also like to backport this patch, and I don't know if introducing a new error in existing releases is a good idea. Maybe I should target this at 5.0 only, and we make it an error in main?

sure, that works for me

@mheon mheon changed the base branch from main to v5.0 March 26, 2024 12:18
@mheon mheon force-pushed the warn_on_not_tmpfs branch from eac2675 to dddefdd Compare March 26, 2024 12:20
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci bot commented Mar 26, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mheon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mheon
Copy link
Copy Markdown
Member Author

mheon commented Mar 26, 2024

Retargeted at 5.0 branch. Will add a test this morning.

@mheon mheon force-pushed the warn_on_not_tmpfs branch 3 times, most recently from 0cff2b9 to 504208f Compare March 26, 2024 13:56
@mheon
Copy link
Copy Markdown
Member Author

mheon commented Mar 26, 2024

Update: None of our VMs actually use tmpfs /tmp so this breaks absolutely everything.

@mheon mheon force-pushed the warn_on_not_tmpfs branch from 504208f to a0e1466 Compare March 26, 2024 15:49
@mheon
Copy link
Copy Markdown
Member Author

mheon commented Mar 26, 2024

Passed the ignore-warning environment variable through to all Cirrus tasks. Good news is that it makes the test easier; just unset that environment variable and run Podman.

@edsantiago
Copy link
Copy Markdown
Member

containers/automation_images#340 will make /tmp be tmpfs, but that won't help with this v5.0 PR.

@mheon
Copy link
Copy Markdown
Member Author

mheon commented Mar 26, 2024

The PR against main will probably be completely different, and not need the environment variable at all.
Fortunately v5.0 is not going to be an LTS branch so we can stop caring about this little idiosyncrasy in a few months when 5.1 comes out.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might I also suggest

    tmpfs_tmp=$(mktemp -d /dev/shm)
    run_podman --tmpdir=$tmpfs_tmp/tmp --runroot=$tmpfs_tmp/runroot --root=$tmpfs_tmp/root info
    assert "$output" !~ "are not on a tmpfs"

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to not just mktemp and then mount a tmpfs over that directory?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Laziness? And, will that work rootless? I've never seen a system without /dev/shm (at least not in the last 20 years)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point on rootless

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 27, 2024
@mheon mheon force-pushed the warn_on_not_tmpfs branch from a0e1466 to 44ba4de Compare March 27, 2024 15:37
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 27, 2024
@packit-as-a-service
Copy link
Copy Markdown

Ephemeral COPR build failed. @containers/packit-build please check.

@mheon mheon force-pushed the warn_on_not_tmpfs branch from 44ba4de to 3467c29 Compare March 27, 2024 17:38
@mheon
Copy link
Copy Markdown
Member Author

mheon commented Mar 27, 2024

@edsantiago Does the test logic seem OK to you? It's acting like the environment variable is still defined.

@edsantiago
Copy link
Copy Markdown
Member

Maybe because /run?

runRoot: /run/containers/storage

I don't see tmpdir, though. Maybe that's in XDG?

I still think it's better to test with explicit --runroot/--tmpdir + /dev/shm / /var/tmp

@mheon mheon force-pushed the warn_on_not_tmpfs branch 2 times, most recently from e0d54f3 to c9d9e12 Compare March 28, 2024 12:50
@Luap99
Copy link
Copy Markdown
Member

Luap99 commented Mar 28, 2024

Why is this targeted against the 5.0 branch? I don't think it is a good idea to backport such a big warning in a patch release.

@mheon
Copy link
Copy Markdown
Member Author

mheon commented Mar 28, 2024

We decided on an alternate implementation for Main - we'll be outright refusing to run if we detect an unhandled reboot in 5.1. This was lower-impact so I thought it would still be useful in 5.0.1. I'm OK with leaving it out until we discuss further when you get back, though.

Podman requires several directories to be on tmpfs so we can
properly detect (and clean up after) reboots: the tmpdir (for
Libpod files) and runroot (for c/storage files). We have not done
any validation that they are actual on tmpfs filesystems in the
past, which leads to no end of people wondering why Podman is
broken on their system (where, perhaps, /tmp is not a tmpfs,
which the FHS allows because sanity cannot exist in this world,
and there is no systemd session so /run/user/$UID does not
exist).

This adds an actual warning on launching Podman with such an
unsupported configuration. There's an environment variable if the
user is certain they want to run Podman like this (perhaps having
used systemd-tmpfiles to automatically clean the relevant dir on
reboot), but otherwise we warn every time a command is run with
runroot/tmpdir on a non-tmpfs filesystem.

Signed-off-by: Matt Heon <mheon@redhat.com>
@mheon mheon force-pushed the warn_on_not_tmpfs branch from c9d9e12 to e45a194 Compare April 1, 2024 12:19
edsantiago added a commit to edsantiago/container_automation_images that referenced this pull request Apr 1, 2024
Podman *really* needs /tmp to be tmpfs, to detect and
handle reboots. Although there are (at this time) no
reboots involved in CI testing, it's still important
for CI hosts to reflect something close to a real-world
environment. And, there is work underway to check /tmp:

  containers/podman#22141

This PR removes special-case Fedora code that was
disabling a tmpfs /tmp mount. History dates back to
PR containers#30 back in 2020.

Some of the image-build code in this repo performs
reboots and relies on persistent tmp files, so you'll
note a flurry of /tmp -> /var/tmp changes.

And, as a drive-by, document the Windows Chocolatey
install command. Link to Best Practices, and explain
why we disregard some of those.

Signed-off-by: Ed Santiago <santiago@redhat.com>
@Luap99
Copy link
Copy Markdown
Member

Luap99 commented Apr 2, 2024

Note I feel rather strongly about not adding such big warnings in a patch release, we lived long enough with the status quo so waiting few more weeks shouldn't be that big of a deal. And if we plan to fix this in a different way on main anyway then I think we should just do that and not introduce two different fixes in two different versions which will likely cause some differences in behavior and thus confusion.

@mheon
Copy link
Copy Markdown
Member Author

mheon commented Apr 2, 2024

I'll go ahead and close, then. We can fix properly in 5.1.

@mheon mheon closed this Apr 2, 2024
edsantiago added a commit to edsantiago/container_automation_images that referenced this pull request Apr 8, 2024
Podman *really* needs /tmp to be tmpfs, to detect and
handle reboots. Although there are (at this time) no
reboots involved in CI testing, it's still important
for CI hosts to reflect something close to a real-world
environment. And, there is work underway to check /tmp:

  containers/podman#22141

This PR removes special-case Fedora code that was
disabling a tmpfs /tmp mount. History dates back to
PR containers#30 back in 2020.

Some of the image-build code in this repo performs
reboots and relies on persistent tmp files, so you'll
note a flurry of /tmp -> /var/tmp changes.

And, as a drive-by, document the Windows Chocolatey
install command. Link to Best Practices, and explain
why we disregard some of those.

Signed-off-by: Ed Santiago <santiago@redhat.com>
edsantiago added a commit to edsantiago/container_automation_images that referenced this pull request Apr 10, 2024
Podman *really* needs /tmp to be tmpfs, to detect and
handle reboots. Although there are (at this time) no
reboots involved in CI testing, it's still important
for CI hosts to reflect something close to a real-world
environment. And, there is work underway to check /tmp:

  containers/podman#22141

This PR removes special-case Fedora code that was
disabling a tmpfs /tmp mount. History dates back to
PR containers#30 back in 2020.

Some of the image-build code in this repo performs
reboots and relies on persistent tmp files, so you'll
note a flurry of /tmp -> /var/tmp changes.

Signed-off-by: Ed Santiago <santiago@redhat.com>
edsantiago added a commit to edsantiago/container_automation_images that referenced this pull request Apr 11, 2024
Podman *really* needs /tmp to be tmpfs, to detect and
handle reboots. Although there are (at this time) no
reboots involved in CI testing, it's still important
for CI hosts to reflect something close to a real-world
environment. And, there is work underway to check /tmp:

  containers/podman#22141

This PR removes special-case Fedora code that was
disabling a tmpfs /tmp mount. History dates back to
PR containers#30 back in 2020.

Some of the image-build code in this repo performs
reboots and relies on persistent tmp files, so you'll
note a flurry of /tmp -> /var/tmp changes.

Signed-off-by: Ed Santiago <santiago@redhat.com>
@stale-locking-app stale-locking-app bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Jul 2, 2024
@stale-locking-app stale-locking-app bot locked as resolved and limited conversation to collaborators Jul 2, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. release-note

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants