System tests: honor $OCI_RUNTIME (for CI)#10199
System tests: honor $OCI_RUNTIME (for CI)#10199openshift-merge-robot merged 1 commit intocontainers:masterfrom
Conversation
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: edsantiago, rhatdan The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
fb57ac4 to
f2aa600
Compare
|
[re-pushed with a refinement: the |
test/system/410-selinux.bats
Outdated
There was a problem hiding this comment.
This is a new skip; I realize that the clutter (removing very-old cruft) makes it hard to review. @rhatdan, do you have any idea why this would fail under runc? More importantly: is this an expected failure?
There was a problem hiding this comment.
I have no memory of this. I guess the best we can do is try it out and see if it fails.
There was a problem hiding this comment.
It does fail -- see the newly-added comment (lines 54-55, green) (and again, I apologize for the difficulty of review. The removed cruft (red) dates to October 2020, when some CI systems had an old version of container-selinux.)
There was a problem hiding this comment.
podman --runtime=runc run --pid=host alpine cat /proc/self/attr/current
Error: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: readonly path /proc/asound: operation not permitted: OCI permission denied
Weird
@giuseppe any idea what is going on here?
There was a problem hiding this comment.
One more data point: on RHEL, IIRC, the failure is in /proc/bus instead of /proc/asound. This makes me suspect that it's just hitting the alphabetically-first entry under /proc. In case that helps.
There was a problem hiding this comment.
It seems like runc is attempting to write to /proc when --pid=host and SELinux is enabled, and failing.
There was a problem hiding this comment.
If I build runc from main branch, I get.
podman --runtime=$PWD/runc run --pid=host alpine cat /proc/self/attr/current
unconfined_u:system_r:spc_t:s0
There was a problem hiding this comment.
rpm -qf /bin/runc
containerd.io-1.4.3-3.2.fc33.x86_64
There was a problem hiding this comment.
On my f33 system: runc-1.0.0-377.rc93.fc33.x86_64
|
This PR has already been much more complicated than it should've been; and it's getting worse. Ubunto 2010 (which is now old-ubuntu, apparently) is now failing, and it's not a flake: I'll look into it tomorrow. I will also have to deal with OCI_RUNTIME=/path (I had assumed it was just the name). [UPDATE: solved, by making |
Some CI systems set $OCI_RUNTIME as a way to override the default crun. Integration (e2e) tests honor this, but system tests were not aware of the convention; this means we haven't been testing system tests with runc, which means RHEL gating tests are now failing. The proper solution would be to edit containers.conf on CI systems. Sorry, that would involve too much CI-VM work. Instead, this PR detects $OCI_RUNTIME and creates a dummy containers.conf file using that runtime. Add: various skips for tests that don't work with runc. Refactor: add a helper function so we don't need to do the complicated 'podman info blah blah .OCIRuntime.blah' thing in many places. BUG: we leave a tmp file behind on exit. Signed-off-by: Ed Santiago <santiago@redhat.com>
f2aa600 to
9fd7ab5
Compare
|
Tests green, but I need to go over each and every system-test result to confirm that nothing was skipped inadvertently; I'll do that later this morning. |
|
Here we are. All looks good. For some reason I had thought f33 (prior fedora) would use runc; but no, it's only prior_ubuntu that uses runc. So we don't actually get any rootless runc testing. But anyway, PR is working as desired.
There's only one useful result, but the others, at least I didn't break anything. I also confirmed the skips, that is, that I'm not inadvertently skipping code that should run. |
|
/lgtm |
This exposed a nasty bug in our system-test setup: Ubuntu (runc) was writing a scratch containers.conf file, and setting CONTAINERS_CONF to point to it. This was well-intentionedly introduced in containers#10199 as part of our long sad history of not testing runc. What I did not understand at that time is that CONTAINERS_CONF is **dangerous**: it does not mean "I will read standard containers.conf and then override", it means "I will **IGNORE** standard containers.conf and use only the settings in this file"! So on Ubuntu we were losing all the default settings: capabilities, sysctls, all. Yes, this is documented in containers.conf(5) but it is such a huge violation of POLA that I need to repeat it. In containers#14972, as yet another attempt to fix our runc crisis, I introduced a new runc-override mechanism: create a custom /etc/containers/containers.conf when OCI_RUNTIME=runc. Unlike the CONTAINERS_CONF envariable, the /etc file actually means what you think it means: "read the default file first, then override with the /etc file contents". I.e., we get the desired defaults. But I didn't remember this helpers.bash workaround, so our runc testing has actually been flawed: we have not been testing with the system containers.conf. This commit removes the no-longer-needed and never-actually-wanted workaround, and by virtue of testing the cap-drops in kube generate, we add a regression test to make sure this never happens again. It's a little scary that we haven't been testing capabilities. Also scary: this PR requires python, for converting yaml to json. I think that should be safe: python3 'import yaml' and 'json' works fine on a RHEL8.7 VM from 1minutetip. Signed-off-by: Ed Santiago <santiago@redhat.com>
Some CI systems set $OCI_RUNTIME as a way to override the
default crun. Integration (e2e) tests honor this, but system
tests were not aware of the convention; this means we haven't
been testing system tests with runc, which means RHEL gating
tests are now failing.
The proper solution would be to edit containers.conf on CI
systems. Sorry, that would involve too much CI-VM work.
Instead, this PR detects $OCI_RUNTIME and creates a dummy
containers.conf file using that runtime.
Add: various skips for tests that don't work with runc.
Refactor: add a helper function so we don't need to do
the complicated 'podman info blah blah .OCIRuntime.blah'
thing in many places.
BUG: we leave a tmp file behind on exit.
Signed-off-by: Ed Santiago santiago@redhat.com