Skip to content

DO NOT MERGE: buildah vendor treadmill#13808

Draft
edsantiago wants to merge 2 commits intocontainers:mainfrom
edsantiago:vendor_buildah_latest
Draft

DO NOT MERGE: buildah vendor treadmill#13808
edsantiago wants to merge 2 commits intocontainers:mainfrom
edsantiago:vendor_buildah_latest

Conversation

@edsantiago
Copy link
Copy Markdown
Member

Followup to Cabal discussion 2022-04-07, re: letting us vendor in buildah on a moment's notice.

Instead of cron'ing this, let's see if this works: keep this PR open perpetually, with daily updates to get latest podman and buildah.

See individual commit messages for details.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 7, 2022
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci bot commented Apr 7, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 7, 2022
@edsantiago edsantiago added do-not-merge/blocked-paths Indicates that a PR should not merge because it touches files in blocked paths. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/cherry-pick-not-approved Indicates that a PR is not yet approved to merge into a release branch. do-not-merge/invalid-owners-file Indicates that a PR should not merge because it has an invalid OWNERS file in it. labels Apr 7, 2022
@edsantiago
Copy link
Copy Markdown
Member Author

The first commit is the important one. This one must be hand-crafted by a human, there is no possible way to automate it. This is the one that, in case of emergency vendor, can be grabbed into a real PR.

The second commit is a throwaway, ignore it. That's just automated fetch-the-latest-buildah.

@edsantiago edsantiago force-pushed the vendor_buildah_latest branch from cda56d1 to a833f6d Compare April 7, 2022 21:53
@openshift-ci openshift-ci bot removed the do-not-merge/invalid-owners-file Indicates that a PR should not merge because it has an invalid OWNERS file in it. label Apr 7, 2022
@edsantiago
Copy link
Copy Markdown
Member Author

bud test failed, remote only, because the new --identity-label=false option is not getting passed through. Probably an easy fix, but not this late in my day.

This is still a proof-of-concept PR, and my feeling so far is positive: it has quickly identified multiple problems in the buildah merge that are easy to solve iteratively. A nightly cron job, or a non-gating step in buildah CI, would be utterly useless because neither one would have the crucial first-commit. Without that, there's no way to make it past the first or second or third failure.

@rhatdan
Copy link
Copy Markdown
Member

rhatdan commented Apr 8, 2022

I just pushed a fix for this.

@edsantiago edsantiago force-pushed the vendor_buildah_latest branch 2 times, most recently from 4c3987c to a58b300 Compare April 8, 2022 17:43
@Luap99
Copy link
Copy Markdown
Member

Luap99 commented Apr 11, 2022

How is this PR regularly updated?

@edsantiago
Copy link
Copy Markdown
Member Author

By a human, presumably me. There is actually no other way: this is not possible to solve via cron or PRs.

I am working on a script to automate as much of it as possible.

@edsantiago edsantiago force-pushed the vendor_buildah_latest branch from a58b300 to 2ee7d5b Compare April 11, 2022 21:46
@edsantiago edsantiago changed the title [WIP] constant blend of latest-podman + latest-buildah DO NOT MERGE: buildah vendor treadmill Apr 12, 2022
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 12, 2022
@edsantiago edsantiago force-pushed the vendor_buildah_latest branch from 2ee7d5b to a44108d Compare April 12, 2022 20:06
@edsantiago edsantiago marked this pull request as draft April 13, 2022 11:51
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 13, 2022
@edsantiago edsantiago force-pushed the vendor_buildah_latest branch from a44108d to c9c4a1f Compare April 13, 2022 23:34
@edsantiago
Copy link
Copy Markdown
Member Author

Followup, because I don't want anyone to think this is abandonware: it's not. I'm running my script at least twice a day, one of those times in the evening. One of the nice features I added is:

+---> Podman has bumped, but Buildah is unchanged. There's probably not much point to testing this.

So I'm choosing to save CI cycles by not re-pushing today. Should anyone need to re-vendor buildah tomorrow or this weekend (when I'm unavailable), just cherry-pick the first commit from this PR. It's still valid.

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 15, 2022
@edsantiago edsantiago force-pushed the vendor_buildah_latest branch from c9c4a1f to 2954d4f Compare April 18, 2022 21:25
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 18, 2022
@edsantiago
Copy link
Copy Markdown
Member Author

Yippee, script has caught a bug! containers/buildah#3919 breaks when vendoring into podman:

+---> Running 'make' to confirm that podman builds cleanly...
CGO_ENABLED=1 GOOS=linux GOARCH=amd64 go build \
        -mod=vendor  \
        -ldflags '-X github.com/containers/podman/v4/libpod/define.gitCommit=3bf05d3113df29addd392032797f4fdf5dae56b7 -X github.com/containers/podman/v4/libpod/define.buildInfo=1650539506 -X github.com/containers/podman/v4/libpod/config._installPrefix=/usr/local -X github.com/containers/podman/v4/libpod/config._etcDir=/usr/local/etc -X github.com/containers/common/pkg/config.additionalHelperBinariesDir= ' \
        -tags "   selinux systemd  exclude_graphdriver_devicemapper seccomp" \
        -o bin/podman ./cmd/podman
# github.com/containers/podman/v4/libpod
libpod/events.go:18:3: cannot use r.config.Engine.EventsLogFileMaxSize (type "github.com/containers/common/pkg/config".eventsLogMaxSize) as type uint64 in field value
make: *** [Makefile:321: bin/podman] Error 2
buildah-vendor-treadmill-x: 'make' failed with new buildah. Cannot continue.

I could just force-push right now and let y'all see the breakage in CI, or I could just report it like I am right here. Or I could file an issue in containers/common since that's where the offending code seems to be?

Anyhow, @rhatdan et al, PTAL because this is not going to vendor into podman.

@edsantiago edsantiago force-pushed the vendor_buildah_latest branch from 2954d4f to 3bf05d3 Compare April 21, 2022 11:18
@edsantiago
Copy link
Copy Markdown
Member Author

What the heck, I pushed anyway in case someone wants to pull & play.

@edsantiago
Copy link
Copy Markdown
Member Author

Thank you. Looks like a reasonable analysis. How do we fix that?

@Luap99
Copy link
Copy Markdown
Member

Luap99 commented May 22, 2024

@containers/buildah-maintainers PTAL. Something in latest buildah is breaking podman int tests. As best I can tell, the common factor is that buildah is now re-pulling images that are already present:

It looks like the image caching save/load sequence is changing the compression used for the image's layer, and thus the manifest, and thus the manifest's digest, and when build checks if the image in the registry has a different digest when compared to the one it has on disk, it sees that it is and pulls it again.

To verify, compare the RepoDigests values you get from podman inspect quay.io/libpod/alpine after freshly pulling the image, and after a podman save/podman rmi/podman load cycle.

But nothing here changed that looking at the diff, it used too work fine and I cannot see any config/compression chnages here

What I do see however is containers/buildah#5478

if platform.OS == "" {
	platform.OS = runtime.GOOS
}

AFAIK the libimage code treats a non empty platform as must pull or something like that if I remember correctly
However I haven't verified that.

@nalind
Copy link
Copy Markdown
Member

nalind commented May 22, 2024

Hmm, this has been happening for a while, since I see it happening with both podman 4.9.4 and 5.0.0 as well, so it might not be the only cause.
But to your question, buildah uses a "dir:" location for the cache to avoid changing the image while caching it, and a custom "copy" test helper since it doesn't want to depend on having skopeo available at test time. But podman already expects skopeo to be present, so skopeo copy should be up to the task.

@nalind
Copy link
Copy Markdown
Member

nalind commented May 22, 2024

AFAIK the libimage code treats a non empty platform as must pull or something like that if I remember correctly However I haven't verified that.

I had suspected as much, but at most, that should be changing the pull policy to "if-newer" under the covers, and I'm pretty sure that's already the value that's being passed in.

@Luap99
Copy link
Copy Markdown
Member

Luap99 commented May 22, 2024

AFAIK the libimage code treats a non empty platform as must pull or something like that if I remember correctly However I haven't verified that.

I had suspected as much, but at most, that should be changing the pull policy to "if-newer" under the covers, and I'm pretty sure that's already the value that's being passed in.

Yeah not sure, but the fact is main is passing but this PR is not so the problem must be in the diff here and thankfully the diff is small so this is my best guess as of why. I can try to reproduce tomorrow.

@edsantiago
Copy link
Copy Markdown
Member Author

What I do see however is containers/buildah#5478

if platform.OS == "" {
platform.OS = runtime.GOOS
}

Nice catch, @Luap99 . Removing those three lines (and the import runtime) appears to fix the problem.

@rhatdan
Copy link
Copy Markdown
Member

rhatdan commented May 22, 2024

That was added to make podman build --arch arm64 work

@Luap99
Copy link
Copy Markdown
Member

Luap99 commented May 23, 2024

That was added to make podman build --arch arm64 work

Sure but it fixed it in broken way, the default pull policy is ifmissing and not ifnewer which the code uses now due the well other not so great behavior in libimage.
And reading your PR your intention is to only set TARGETARCH not change the pull policy so this is definitely broken and must be fixed before the buildah release

@edsantiago
Copy link
Copy Markdown
Member Author

Tests now passing. This is safe to vendor into podman; no changes needed. Thank you @nalind for the quick fix.

@edsantiago
Copy link
Copy Markdown
Member Author

Treadmill has been broken for a few weeks, vendoring one of the c-* modules:

CGO_ENABLED=1 GOOS=linux GOARCH=amd64 go build \
         \
        -ldflags '-X github.com/containers/podman/v5/libpod/define.gitCommit=71ed11ece1223fd847c759b9ce5af180af370b62 -X github.com/containers/podman/v5/libpod/define.buildInfo=1720019142 -X github.com/containers/podman/v5/libpod/config._installPrefix=/usr/local -X github.com/containers/podman/v5/libpod/config._etcDir=/etc -X github.com/containers/podman/v5/pkg/systemd/quadlet._binDir=/home/esm/bin/x86_64 -X github.com/containers/common/pkg/config.additionalHelperBinariesDir= ' \
        -tags "   systemd libsubid exclude_graphdriver_devicemapper seccomp " \
        -o bin/podman ./cmd/podman
# github.com/containers/podman/v5/libpod
libpod/reset.go:262:26: undefined: blobinfocache.CleanupDefaultCache
make: *** [Makefile:380: bin/podman] Error 1

cc @mheon

@nalind
Copy link
Copy Markdown
Member

nalind commented Jul 3, 2024

Buildah's main branch is currently using containers/image v5.31.1, and podman's main branch is currently using an untagged commit on the containers/image main branch which is newer than that. Attempting to vendor in buildah is apparently pulling the version of image that buildah uses, which doesn't supply the new function that libpod already depends on.

@edsantiago
Copy link
Copy Markdown
Member Author

containers/buildah#5585 has just merged. Treadmill is now failing with:

$ make
CGO_ENABLED=1 GOOS=linux GOARCH=amd64 go build \
         \
        -ldflags '-X github.com/containers/podman/v5/libpod/define.gitCommit=e7bf0e98bbaebe5dd703c2080cf03fecf244720c -X github.com/containers/podman/v5/libpod/define.buildInfo=1721044095 -X github.com/containers/podman/v5/libpod/config._installPrefix=/usr/local -X github.com/containers/podman/v5/libpod/config._etcDir=/etc -X github.com/containers/podman/v5/pkg/systemd/quadlet._binDir=/home/esm/bin/x86_64 -X github.com/containers/common/pkg/config.additionalHelperBinariesDir= ' \
        -tags "   systemd libsubid exclude_graphdriver_devicemapper seccomp " \
        -o bin/podman ./cmd/podman
test -z "-Z" || chcon -t container_runtime_exec_t bin/podman
CGO_ENABLED=1 GOOS=linux GOARCH=amd64 go build \
         \
        -ldflags '-X github.com/containers/podman/v5/libpod/define.gitCommit=e7bf0e98bbaebe5dd703c2080cf03fecf244720c -X github.com/containers/podman/v5/libpod/define.buildInfo=1721044108 -X github.com/containers/podman/v5/libpod/config._installPrefix=/usr/local -X github.com/containers/podman/v5/libpod/config._etcDir=/etc -X github.com/containers/podman/v5/pkg/systemd/quadlet._binDir=/home/esm/bin/x86_64 -X github.com/containers/common/pkg/config.additionalHelperBinariesDir= ' \
        -tags "remote exclude_graphdriver_btrfs btrfs_noversion exclude_graphdriver_devicemapper containers_image_openpgp" \
        -o bin/podman-remote ./cmd/podman
package github.com/containers/podman/v5/cmd/podman
        imports github.com/containers/podman/v5/cmd/podman/common
        imports github.com/containers/buildah/define
        imports github.com/containers/common/libimage: build constraints exclude all Go files in vendor/github.com/containers/common/libimage
make: *** [Makefile:393: bin/podman-remote] Error 1
buildah-vendor-treadmill-x: 'make' failed with new buildah. Cannot continue.

@rhatdan
Copy link
Copy Markdown
Member

rhatdan commented Jul 15, 2024

Working on it.

@rhatdan
Copy link
Copy Markdown
Member

rhatdan commented Jul 15, 2024

@nalind @edsantiago Looks like this is the culprit
buildah commit

@rhatdan
Copy link
Copy Markdown
Member

rhatdan commented Jul 15, 2024

@aaronlehmann PTAL

@nalind
Copy link
Copy Markdown
Member

nalind commented Jul 15, 2024

One option in containers/buildah#5628.

@nalind
Copy link
Copy Markdown
Member

nalind commented Jul 15, 2024

One option in containers/buildah#5628.

Another option, which all things being equal, I prefer, in containers/common#2089.

@edsantiago
Copy link
Copy Markdown
Member Author

Aaaaand we're back to undefined: blobinfocache.CleanupDefaultCache. Dunno if it's the latest podman commit or the latest buildah one. Somehow c-image looks correct on buildah, but treadmill now wants to change it to v5.31.1

@nalind
Copy link
Copy Markdown
Member

nalind commented Jul 15, 2024

containers/buildah#5627 was a downgrade.

@packit-as-a-service
Copy link
Copy Markdown

We were not able to find or create Copr project packit/containers-podman-13808 specified in the config with the following error:

Packit received HTTP 500 Internal Server Error from Copr Service. Check the Copr status page: https://copr.fedorainfracloud.org/status/stats/, or ask for help in Fedora Build System matrix channel: https://matrix.to/#/#buildsys:fedoraproject.org.

Unless the HTTP status code above is >= 500, please check your configuration for:

  1. typos in owner and project name (groups need to be prefixed with @)
  2. whether the project name doesn't contain not allowed characters (only letters, digits, underscores, dashes and dots must be used)
  3. whether the project itself exists (Packit creates projects only in its own namespace)
  4. whether Packit is allowed to build in your Copr project
  5. whether your Copr project/group is not private

@packit-as-a-service
Copy link
Copy Markdown

Ephemeral COPR build failed. @containers/packit-build please check.

@edsantiago
Copy link
Copy Markdown
Member Author

Gah. I completely forgot about remote.

One possible solution will be to use a custom socket for each process, maybe derived from $BATS_SUITE_TEST_NUMBER or maybe pseudorandom. That is going to be trickier than anything I can do in one day.

Good news is, though, THIS EFFORT IS WORTH IT! (parallelizing). Wow, 15 minutes for bud tests.

@edsantiago
Copy link
Copy Markdown
Member Author

*sniffle*. Look at this. Passing bud tests, all in less than sixteen minutes. (For comparison: bud tests are by far our biggest slowdown, taking ~35 minutes on a CI run).

My changes here are absolutely not ready for prime time, but I think they're in good enough shape that someone else should be able to carry them through without much fuss.

This is a JUNK COMMIT from buildah-vendor-treadmill v0.3.

DO NOT MERGE! This is just a way to keep the buildah-podman
vendoring in sync. Refer to:

   https://github.com/containers/podman/wiki/Buildah-Vendor-Treadmill

Signed-off-by: Ed Santiago <santiago@redhat.com>
@Luap99
Copy link
Copy Markdown
Member

Luap99 commented Nov 22, 2024

I was counting on the parallel work to get this time down.

My changes here are absolutely not ready for prime time, but I think they're in good enough shape that someone else should be able to carry them through without much fuss.

Just to be clear we need to get the parallel test going in buildah first? Once that works we can enable it here for bud tests?!

@edsantiago
Copy link
Copy Markdown
Member Author

Just to be clear we need to get the parallel test going in buildah first? Once that works we can enable it here for bud tests?!

I believe so, yes. For context, that work is in containers/buildah#5552 but there's a weird flake that is blocking it.

I am very curious to find out if we hit that flake here in podman testing. I'll keep re-pushing this PR today.

On second thought, it might be worth thinking about pushing this into podman even before the buildah-parallel flake is fixed. Maybe by skipping the bud: build push with --force-compression test (in apply-podman-deltas): the value of test speedup is, IMNCIO (In My Now Completely Irrelevant Opinion) possibly greater than the value of a completely pointless test under podman. "Pointless" because, as the reader may or may not recall, what bud-in-podman tests do is:

    function run_buildah() {
        if arg is "build" or "bud" or "build-using-dockerfile"; then
            use podman build instead
        else
            invoke buildah directly
        fi
    }

Maybe I should just submit a PR and force-merge it. That way all the nice kind wonderful touching things y'all have said about me will poof, vanish into thin air and be replaced by yells of anger....

As you run --sync, please update this commit message with your
actual changes.

Changes since 2024-11-14:

  * EXPERIMENTAL: try running in parallel on fastvm
  * line-offset-only changes to helpers.bash diffs, due to
    new _prefetch()

  * SUPER-EXPERIMENTAL AND POSSIBLY BROKEN: add parallelization
    mechanism to podman-remote tests.

Signed-off-by: Ed Santiago <santiago@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. bloat_approved Approve a PR in which binary file size grows by over 50k do-not-merge/blocked-paths Indicates that a PR should not merge because it touches files in blocked paths. do-not-merge/cherry-pick-not-approved Indicates that a PR is not yet approved to merge into a release branch. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants