Skip to content

Turn on the podman-commands script to verify man pages#6374

Merged
openshift-merge-robot merged 1 commit intocontainers:masterfrom
rhatdan:build1
Jun 4, 2020
Merged

Turn on the podman-commands script to verify man pages#6374
openshift-merge-robot merged 1 commit intocontainers:masterfrom
rhatdan:build1

Conversation

@rhatdan
Copy link
Member

@rhatdan rhatdan commented May 24, 2020

Signed-off-by: Daniel J Walsh dwalsh@redhat.com

@openshift-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rhatdan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 24, 2020
@rhatdan rhatdan changed the title Turn on the podmn-commands script to verify man pages Turn on the podman-commands script to verify man pages May 25, 2020
@TomSweeneyRedHat
Copy link
Member

@rhatdan looks like your script is working. Gating is failing due to missing pieces....

@rhatdan
Copy link
Member Author

rhatdan commented May 27, 2020

@TomSweeneyRedHat Not sure what you mean? Missing pieces?

@TomSweeneyRedHat
Copy link
Member

@rhatdan pieces == your script had found things that were missing in the docs as it was designed to do and it looked to be failing the gating due to that.

@rhatdan
Copy link
Member Author

rhatdan commented May 27, 2020

If I run that container and the script within it, it passes every time. What I am trying to figure out is why it is blowing up in CI?

@rhatdan
Copy link
Member Author

rhatdan commented May 30, 2020

@baude @cevich What command exactly is cirrus running on, so I can run this locally. If I download the image and run the test with podman, it works for me. But there must be some other way that cirrus is running that test that is not passing.

@baude
Copy link
Member

baude commented Jun 1, 2020

it looks like the first podman command in this instance is going belly up. Notice the error:

[+0000s] checking: podman 
[+0000s] Error: mount /var/lib/containers/storage/overlay:/var/lib/containers/storage/overlay, flags: 0x1000: permission denied

I wonder if this makes it all wrong from there. @edsantiago WDYT is going on here?

@edsantiago
Copy link
Member

As best I can tell from the error context, that message is coming from ./bin/podman help. As in (reconstruction):

$ ./bin/podman help
Error: mount /var/lib/etc etc

(without the expected output of podman help).

I have to guess that the problem is in the previous line:

- '/usr/local/bin/entrypoint.sh podman |& ${TIMESTAMP}'

I think this might be succeeding, but leaving behind a corrupt bin/podman

@edsantiago
Copy link
Member

hack/podman-commands.sh was disabled on April 16. Since then there have been seven commits in contrib/gate. Running a very quick diff, I've got a hunch that the culprit might be ab06813 and its followup 154debb:

 RUN dnf -y install \
-   $(grep "^[^#]" $GOSRC/contrib/dependencies.txt) conmon crun \
+   $(grep "^[^#]" $GOSRC/contrib/dependencies.txt) diffutils containers-common fuse-overlayfs conmon crun runc --exclude container-selinux; \
+   sed -i -e 's|^#mount_program|mount_program|g' /etc/containers/storage.conf \
    && dnf clean all

Unfortunately the git commit message doesn't explain why this was added. @rhatdan do you remember what that sed was intended to solve? Could it be causing the EPERM problem we're seeing in this PR?

@rhatdan
Copy link
Member Author

rhatdan commented Jun 1, 2020

That change enabeled fuse-overlayfs inside of the container.

@edsantiago
Copy link
Member

I figured out that much 😉 -- what I don't understand is the why. I assume this solved some problem, but what? And is it possible that this is causing the mount error? And if so, how can we achieve the dual goals of (whatever it was that needs fuse-overlayfs) and (getting the check-podman script to work in the gating container)?

@rhatdan
Copy link
Member Author

rhatdan commented Jun 2, 2020

It was an attempt to fix this issue. The error I was seeing when I tested this locally was mounting overlay on overlay, so I added fuse-overlayfs and the local test passed. The problem with this test altogether, is I can't figure out how to do it locally. If you know the Podman command to execute, then we could examine what is going on.

If I pull down this container image and run with the libpod directory mounted on /usr/src and run a shell within the container, the script succeeds.

We can remove the fuse-overlay line and see if the tests passes.

@edsantiago
Copy link
Member

Okay, here is the shortest recipe I can devise for reproducing the problem:

! Create local libpod source dir
host# mkdir /tmp/gate; cd /tmp/gate
host# git clone https://github.com/containers/libpod.git libpod

! Run a shell in the gate container
host# podman run -it -v /tmp/gate/libpod:/usr/src/libpod:z --entrypoint /bin/bash quay.io/libpod/gate:master

! Build podman inside the container
container# /usr/local/bin/entrypoint.sh podman

! Run the newly-built podman
container# /var/tmp/go/src/github.com/containers/libpod/bin/podman
Error: mount /var/lib/containers/storage/overlay:/var/lib/containers/storage/overlay, flags: 0x1000: operation not permitted

Editing /etc/containers/storage.conf and re-commenting-out the mount_program line yields:

Error: 'overlay' is not supported over overlayfs, a mount_program is required: backing file system is unsupported for this graph driver

Adding --storage-driver vfs to the host podman command fails, of course, because I've already run podman once and that is now stuck in the db:

Error: error creating libpod runtime: database storage graph driver "overlay" does not match our storage graph driver "vfs": database configuration mismatch

At this point, since I'm playing on my own laptop, I don't want to destabilize anything so I'm giving up for now. I can spin up a virt to continue testing if so desired.

@rhatdan
Copy link
Member Author

rhatdan commented Jun 2, 2020

podman run --dev /dev/fuse -it -v /tmp/gate/libpod:/usr/src/libpod:z --entrypoint /bin/bash quay.io/libpod/gate:master

@rhatdan
Copy link
Member Author

rhatdan commented Jun 2, 2020

@giuseppe Something is failing when trying to mount fuse-overlayfs, looks like we require CAP_SYSADMIN?

Any ideas?

@rhatdan
Copy link
Member Author

rhatdan commented Jun 2, 2020

If I run podman above with --device /dev/fuse and then follow @edsantiago Procedure, I am now getting
Error: mount /var/lib/containers/storage/overlay:/var/lib/containers/storage/overlay, flags: 0x1000: operation not permitted

If I add --add-cap SYS_ADMIN the command works.

@rhatdan
Copy link
Member Author

rhatdan commented Jun 2, 2020

Looks like the bind mount is failing.
mount("/var/lib/containers/storage/overlay", "/var/lib/containers/storage/overlay", 0xc0003b4248, MS_BIND, NULL) = -1 EPERM (Operation not permitted)

@edsantiago
Copy link
Member

Update: fuse-overlayfs doesn't come into the picture at all. It's a red herring. What seems to be happening is:

  • with commented-out mount_program in storage.conf: podman detects, early on, that overlay-on-overlay will not work, and exits with a mount_program is required error
  • with mount_program = "/bin/echo" (or anything whatsoever as long as it exists on disk): podman makes it past the above check, tries to bind-mount /var/lib/containers/storage/overlay, and fails with the flags: 0x1000 error.

In no case does podman even make it to fork/execing fuse-overlayfs. This is all internal podman.

@giuseppe your expert opinion would be greatly appreciated

@rhatdan
Copy link
Member Author

rhatdan commented Jun 3, 2020

If we merge containers/storage#639 then we can set the skip_mount option and allow this to proceed.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
@rhatdan
Copy link
Member Author

rhatdan commented Jun 4, 2020

@baude @edsantiago @vrothberg @giuseppe @mheon @QiWang19 @TomSweeneyRedHat
This is finally by the gating, PTAL and get it merged.

@TomSweeneyRedHat
Copy link
Member

LGTM
assuming happy tests and I'd like a head nod from @edsantiago

@edsantiago
Copy link
Member

@rhatdan I think we've all had thousand-line PRs that were easier to review than this one!

Gating results confirmed. Thank you for persevering in tracking this down.

/lgtm
/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 4, 2020
@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 4, 2020
@mheon
Copy link
Member

mheon commented Jun 4, 2020

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 4, 2020
@openshift-merge-robot openshift-merge-robot merged commit 6cc323c into containers:master Jun 4, 2020
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 25, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 25, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants