Skip to content

Add Containerfile and build.sh to build it.#59868

Merged
dmick merged 1 commit intoceph:mainfrom
dmick:wip-container
Oct 5, 2024
Merged

Add Containerfile and build.sh to build it.#59868
dmick merged 1 commit intoceph:mainfrom
dmick:wip-container

Conversation

@dmick
Copy link
Member

@dmick dmick commented Sep 18, 2024

The intent is to replace ceph-container.git, at first for ci containers only, and eventually production containers as well.

There is code present for production containers, including a separate "make-manifest-list.py" to scan for and glue the two arch-specific containers into a 'manifest-list' 'fat' container, but that code is not yet fully tested.

This code will not be used until a corresponding change to the Jenkins jobs in ceph-build.git is pushed.

Currently marked "draft" until the final CI build after cleanup is clean and tested. Previously tested with a copy of the ceph-dev-new family of Jenkins jobs modified to use this container build mechanism, and run through these teuthology suites to validate basic container functionality as well as iscsi and nfs-ganesha:

http://pulpito.front.sepia.ceph.com/dmick-2024-09-18_01:08:20-orch:cephadm-wip-container-distro-default-smithi/
http://pulpito.front.sepia.ceph.com/dmick-2024-09-18_01:09:36-orch:cephadm-wip-container-distro-default-smithi/

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@dmick
Copy link
Member Author

dmick commented Sep 19, 2024

jenkins retest windows

@dmick
Copy link
Member Author

dmick commented Sep 19, 2024

jenkins test windows

@dmick
Copy link
Member Author

dmick commented Sep 19, 2024

jenkins test make check arm64

@dmick
Copy link
Member Author

dmick commented Sep 19, 2024

I don't know what's going on with the windows tests or make check arm64 tests, but I can guarantee it's not related to this PR.

@dmick dmick marked this pull request as ready for review September 19, 2024 22:04
Copy link
Member

@zmc zmc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't done a full review yet, but: I would like to see:

  • sudo dropped from the podman invocations.
  • probably a podman image pull of the base image in build.sh
  • nit: consistency with things like >> packages.txt vs >>packages.txt

@dmick
Copy link
Member Author

dmick commented Sep 23, 2024

I sympathize with the "no sudo", but the failures were so weird on the build slaves that I couldn't decode them. Of course I don't know now what they were exactly, but I was out of patience that day and just did what everything else does. It was never necessary on my test hosts. If you're adamant I can try to debug again.

I don't get the 'podman image pull'; is the implicit pull suboptimal somehow?

I'll fix spaces on redirects.

@phlogistonjohn
Copy link
Contributor

phlogistonjohn commented Sep 24, 2024

If you find you can't remove the sudo invocations because of the CI environment, could you replace them with a function call something like:

_sudo() {
   if [ "$UID" = "0" -o "$USE_SUDO" = "no" ]; then
        "$@"
   else
        sudo "$@"
   fi
}

so that someone invoking the script manually can choose disable sudo?

@zmc
Copy link
Member

zmc commented Sep 24, 2024

I can help with issues around rootless podman.

I'd like to see sane defaults set for the inputs - the script refuses to run without them all set.
BRANCH should maybe default to the current branch if there is one
CEPH_SHA should default to the commit that's checked out
ARCH should default to the machine's actual arch
CONTAINER_REPO_HOSTNAME and CONTAINER_REPO_ORGANIZATION could default to 'localhost' and null, respectively. If the hostname is 'localhost', we can skip pushing and therefore caring about credentials

I would actually argue that the script should be unaware of authentication, or at most re-export REGISTRY_AUTH_FILE if required. podman doesn't require the password to be submitted cleartext as a flag, and it ending up in logfiles or history files doesn't feel good to me.

@ktdreyer
Copy link
Contributor

Re: pulling before building, you can do it with podman build --pull=newer

@dmick
Copy link
Member Author

dmick commented Sep 24, 2024

@Phlogiston: sure, if non-sudo doesn't work for the builders, we can make it optional for non-builders. I hadn't really spent that much time optimizing the script for non-CI use, as you might expect, so that informs a number of the choices I made/flexibility I didn't add, but it won't hurt to have some more.

@zmc: so can you elaborate on why you want an image pull? Does FROM not check the image for "latest available"? @ktdreyer thanks for the tip if we need it

@zmc: I will set up a test without sudo and see what happened (happens).

@zmc: defaults: I had them initially, but again, focusing on CI, I wanted primarily to make sure the job failed if it didn't have enough information, and make sure I was consciously setting everything to exactly what I wanted. It's less convenient for outside-CI use, true. I'm torn on this; it's easy to add a wrapper to call build.sh with whatever defaults someone wishes, but harder to catch "oh shit I didn't specify a branch and this is going to build and push main". Undecided, more input welcome.

@zmc: auth is an interesting question. Of course when run from Jenkins the passwords will be elided, as usual, and it's sort of our pattern for all these tools/authentication mechs. I'm not opposed to the idea of external-to-build.sh auth, but jenkins needs to do it somehow, so would take another shell snippet somewhere. Maybe we could decompose it into pieces; Ken originally pushed toward "as simple as possible to invoke" for flexibility for other build systems, but maybe having a bit specific to Jenkins would allow the script to assume things like auth are taken care of somewhere else (and they would be different for, say, Tekton, one assumes).

Enough open questions here that maybe we should meet? Maybe invite Ken to our Wednesday noon meeting this week if he can make it?

@ktdreyer
Copy link
Contributor

A small change landed in ceph-container that we should copy here. We bumped nfs-ganesha from 5 to 6 for main and squid, and we added a new gmonitoring sub-package to the list of nfs-ganesha packages. Details in ceph/ceph-container#2235

@dmick
Copy link
Member Author

dmick commented Sep 25, 2024

With some offline discussion, the plan is:

  1. remove sudo (retesting builds now)
  2. presume that repo auth is taken care of outside build.sh; add an early test to build.sh (and perhaps an even earlier one to the jenkins job changes that enable this) to failfast if auth isn't current/valid
  3. set appropriate defaults in build.sh to reduce load for direct invocation (testing and non-Jenkins use)
  4. nfs-ganesha is moving to build 6; straighten that out and update the dependency before merge
  5. whatever I forgot that @zmc and @ktdreyer can remind me of

Other refinements can wait for incremental changes here once we get this in-tree

Copy link
Contributor

@andrewschoen andrewschoen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, thank you for doing this work!

I agree with the points already made about not using sudo and keeping auth a responsibility of the caller and not build.sh.

Concerning the multi-arch support, have you looked into the --platform and --manifest flags of podman build? I did some work with them recently for the samba containers. Nothing to concern this PR with, mostly just a question / idea for future updates.

@dmick dmick force-pushed the wip-container branch 3 times, most recently from d15fe55 to 99eef10 Compare September 30, 2024 18:04
@dmick
Copy link
Member Author

dmick commented Oct 2, 2024

@andrewschoen thanks for the look. I think --platform and --arch are of limited utility; I have to RUN a bunch of stuff inside the container and I think that needs to run on the right host to work. The manifest list image is easy enough to do; it's mostly about coordinating between the two separate builds (one for each arch) so that they're both done by the time the manifest-list build runs.

@dmick
Copy link
Member Author

dmick commented Oct 2, 2024

@zmc @ktdreyer @andrewschoen @phlogistonjohn I think this is ready for re-review. I'll squash all the 'review' commits once everyone's happy. I managed to get a build done and test the resulting container with the original two tests (cephadm iscsi and nfs) so I'm confident the container is OK. There are a few changes to the ceph-build PR as well; that'll happen tomorrow.

Copy link
Contributor

@phlogistonjohn phlogistonjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is sufficiently good for a starting point for in-tree image builds. Thanks for making this happen!

The intent is to replace ceph-container.git, at first for ci containers
only, and eventually production containers as well.

There is code present for production containers, including
a separate "make-manifest-list.py" to scan for and glue the two
arch-specific containers into a 'manifest-list' 'fat' container,
but that code is not yet fully tested.

This code will not be used until a corresponding change to the
Jenkins jobs in ceph-build.git is pushed.

Note that this tooling does not authenticate to the container repo;
it is assumed that will be done elsewhere.  Authentication is
verified by pushing a minimal image to the requested repo.

Signed-off-by: Dan Mick <dmick@redhat.com>
@dmick
Copy link
Member Author

dmick commented Oct 3, 2024

jenkins test api

1 similar comment
@dmick
Copy link
Member Author

dmick commented Oct 3, 2024

jenkins test api

@idryomov
Copy link
Contributor

idryomov commented Oct 14, 2024

There is a follow up fix in #60255.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants