System tests: honor $OCI_RUNTIME (for CI) by edsantiago · Pull Request #10199 · containers/podman

edsantiago · 2021-05-03T19:56:07Z

Some CI systems set $OCI_RUNTIME as a way to override the
default crun. Integration (e2e) tests honor this, but system
tests were not aware of the convention; this means we haven't
been testing system tests with runc, which means RHEL gating
tests are now failing.

The proper solution would be to edit containers.conf on CI
systems. Sorry, that would involve too much CI-VM work.
Instead, this PR detects $OCI_RUNTIME and creates a dummy
containers.conf file using that runtime.

Add: various skips for tests that don't work with runc.

Refactor: add a helper function so we don't need to do
the complicated 'podman info blah blah .OCIRuntime.blah'
thing in many places.

BUG: we leave a tmp file behind on exit.

Signed-off-by: Ed Santiago santiago@redhat.com

rhatdan · 2021-05-03T20:18:26Z

/approve
LGTM

openshift-ci-robot · 2021-05-03T20:18:31Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago, rhatdan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [edsantiago,rhatdan]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

edsantiago · 2021-05-03T20:29:59Z

[re-pushed with a refinement: the pid=host failure only happens when rootless, not rootful. I updated the skip correspondingly, so it still runs on root tests.]

edsantiago · 2021-05-03T20:33:16Z

test/system/410-selinux.bats

This is a new skip; I realize that the clutter (removing very-old cruft) makes it hard to review. @rhatdan, do you have any idea why this would fail under runc? More importantly: is this an expected failure?

I have no memory of this. I guess the best we can do is try it out and see if it fails.

It does fail -- see the newly-added comment (lines 54-55, green) (and again, I apologize for the difficulty of review. The removed cruft (red) dates to October 2020, when some CI systems had an old version of container-selinux.)

podman --runtime=runc run --pid=host alpine cat /proc/self/attr/current
Error: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: readonly path /proc/asound: operation not permitted: OCI permission denied

Weird
@giuseppe any idea what is going on here?

One more data point: on RHEL, IIRC, the failure is in /proc/bus instead of /proc/asound. This makes me suspect that it's just hitting the alphabetically-first entry under /proc. In case that helps.

It seems like runc is attempting to write to /proc when --pid=host and SELinux is enabled, and failing.

If I build runc from main branch, I get.

podman --runtime=$PWD/runc run --pid=host alpine cat /proc/self/attr/current unconfined_u:system_r:spc_t:s0

rpm -qf /bin/runc
containerd.io-1.4.3-3.2.fc33.x86_64

On my f33 system: runc-1.0.0-377.rc93.fc33.x86_64

edsantiago · 2021-05-04T00:02:13Z

This PR has already been much more complicated than it should've been; and it's getting worse. Ubunto 2010 (which is now old-ubuntu, apparently) is now failing, and it's not a flake:

[+0589s] not ok 135 podman volume: exec/noexec
         # (from function `die' in file test/system/helpers.bash, line 412,
         #  from function `run_podman' in file test/system/helpers.bash, line 220,
         #  in test file test/system/160-volumes.bats, line 131)
         #   `run_podman ${expect_rc} run --rm --volume $myvolume:/vol:noexec,z $IMAGE /vol/myscript' failed
         # # podman rm --all --force
         # # podman ps --all --external --format {{.ID}} {{.Names}}
         # # podman images --all --format {{.Repository}}:{{.Tag}} {{.ID}}
         # quay.io/libpod/testimage:20210223 4abe85964f5c
         # # podman volume rm -a
         # # podman volume create myvol15CCE5Uwiv
         # myvol15CCE5Uwiv
         # # podman volume inspect --format {{.Mountpoint}} myvol15CCE5Uwiv
         # /var/lib/containers/storage/volumes/myvol15CCE5Uwiv/_data
         # # podman run --rm --volume myvol15CCE5Uwiv:/vol:noexec,z quay.io/libpod/testimage:20210223 /vol/myscript
         # standard_init_linux.go:219: exec user process caused: permission denied
         # [ rc=1 (** EXPECTED 126 **) ]
         # #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
         # #| FAIL: exit code is 1; expected 126
         # #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I'll look into it tomorrow. I will also have to deal with OCI_RUNTIME=/path (I had assumed it was just the name). [UPDATE: solved, by making podman_runtime() return just basename of the runtime)

Some CI systems set $OCI_RUNTIME as a way to override the default crun. Integration (e2e) tests honor this, but system tests were not aware of the convention; this means we haven't been testing system tests with runc, which means RHEL gating tests are now failing. The proper solution would be to edit containers.conf on CI systems. Sorry, that would involve too much CI-VM work. Instead, this PR detects $OCI_RUNTIME and creates a dummy containers.conf file using that runtime. Add: various skips for tests that don't work with runc. Refactor: add a helper function so we don't need to do the complicated 'podman info blah blah .OCIRuntime.blah' thing in many places. BUG: we leave a tmp file behind on exit. Signed-off-by: Ed Santiago <santiago@redhat.com>

edsantiago · 2021-05-04T10:48:11Z

Tests green, but I need to go over each and every system-test result to confirm that nothing was skipped inadvertently; I'll do that later this morning.

edsantiago · 2021-05-04T15:38:26Z

Here we are. All looks good. For some reason I had thought f33 (prior fedora) would use runc; but no, it's only prior_ubuntu that uses runc. So we don't actually get any rootless runc testing. But anyway, PR is working as desired.

	runtime	root	rootless	remote(root)
fedora-33	crun	ok	ok	ok
fedora-34	crun	ok	ok	ok
ubuntu-2010	runc	ok	N/A	ok
ubuntu-2104	crun	ok	N/A	ok

There's only one useful result, but the others, at least I didn't break anything. I also confirmed the skips, that is, that I'm not inadvertently skipping code that should run.

mheon · 2021-05-04T18:45:33Z

/lgtm

This exposed a nasty bug in our system-test setup: Ubuntu (runc) was writing a scratch containers.conf file, and setting CONTAINERS_CONF to point to it. This was well-intentionedly introduced in containers#10199 as part of our long sad history of not testing runc. What I did not understand at that time is that CONTAINERS_CONF is **dangerous**: it does not mean "I will read standard containers.conf and then override", it means "I will **IGNORE** standard containers.conf and use only the settings in this file"! So on Ubuntu we were losing all the default settings: capabilities, sysctls, all. Yes, this is documented in containers.conf(5) but it is such a huge violation of POLA that I need to repeat it. In containers#14972, as yet another attempt to fix our runc crisis, I introduced a new runc-override mechanism: create a custom /etc/containers/containers.conf when OCI_RUNTIME=runc. Unlike the CONTAINERS_CONF envariable, the /etc file actually means what you think it means: "read the default file first, then override with the /etc file contents". I.e., we get the desired defaults. But I didn't remember this helpers.bash workaround, so our runc testing has actually been flawed: we have not been testing with the system containers.conf. This commit removes the no-longer-needed and never-actually-wanted workaround, and by virtue of testing the cap-drops in kube generate, we add a regression test to make sure this never happens again. It's a little scary that we haven't been testing capabilities. Also scary: this PR requires python, for converting yaml to json. I think that should be safe: python3 'import yaml' and 'json' works fine on a RHEL8.7 VM from 1minutetip. Signed-off-by: Ed Santiago <santiago@redhat.com>

openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 3, 2021

edsantiago force-pushed the system_tests_with_runc_override branch from fb57ac4 to f2aa600 Compare May 3, 2021 20:29

edsantiago commented May 3, 2021

View reviewed changes

edsantiago force-pushed the system_tests_with_runc_override branch from f2aa600 to 9fd7ab5 Compare May 4, 2021 02:15

openshift-ci-robot assigned mheon May 4, 2021

openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label May 4, 2021

openshift-merge-robot merged commit 8eefca5 into containers:master May 4, 2021

edsantiago deleted the system_tests_with_runc_override branch May 4, 2021 18:50

edsantiago mentioned this pull request Aug 18, 2022

podman kube generate - add actual tests #15365

Merged

github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 23, 2023

github-actions bot locked as resolved and limited conversation to collaborators Sep 23, 2023

Conversation

edsantiago commented May 3, 2021

Uh oh!

rhatdan commented May 3, 2021

Uh oh!

openshift-ci-robot commented May 3, 2021

Uh oh!

edsantiago commented May 3, 2021

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

edsantiago commented May 4, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

edsantiago commented May 4, 2021

Uh oh!

edsantiago commented May 4, 2021

Uh oh!

mheon commented May 4, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

edsantiago commented May 4, 2021 •

edited

Loading