add initial container file and github build action#54575
add initial container file and github build action#54575
Conversation
919c509 to
13f476f
Compare
|
Note: the
|
| # Copr repos | ||
| # TODO: afaict, scikit is for mgr-diskpredition-local, asyncssh for cephadm, but why do we | ||
| # encode these as deps only in container instead of including them as a vendored | ||
| # submodule in the ceph repo or something to be included with ceph packages? | ||
| # @tchaikov @torchiaf @adk3798 @ktdreyer? | ||
| # ref: https://github.com/ceph/ceph-container/pull/1821 | ||
| # note: scikit-learn packages are not included in ubi8 in ceph-container | ||
| RUN \ | ||
| dnf install -y --setopt=install_weak_deps=False dnf-plugins-core && \ | ||
| dnf copr enable -y tchaikov/python-scikit-learn && \ | ||
| dnf copr enable -y tchaikov/python3-asyncssh |
There was a problem hiding this comment.
For asyncssh at least, the cephadm mgr module needs it. I think we couldn't find any packaging of it in official repos so Kefu had this copr repo set up. We should maybe host this under the copr of somebody more involved in the project at this point, but we still need it somewhere.
Edit - didn't read the full comment first.
I assume keeping it in a copr repo is just easier than having it as a submodule, but I didn't really look into it.
There was a problem hiding this comment.
Thanks for weighing in. It's not a big issue, but it seemed worth asking since I'm not sure how hard it is to set up and maintain copr repos. Regardless, it'll be something to changed later down the line if changing it makes sense.
container/Containerfile
Outdated
| # General | ||
| PACKAGES="ca-certificates" && \ | ||
| # Ceph | ||
| # TODO: I removed these packages that were included in ganesha packages: | ||
| # gcc, lua-devel, luarocks | ||
| # They are listed as BuildRequires in ceph.spec.in, so I assume they are unnecessary | ||
| # @mattbenjamin please advise about whether these are necessary for runtime | ||
| PACKAGES="$PACKAGES \ | ||
| ceph-common${CEPH_PACKAGE_VERSION} \ | ||
| ceph-exporter${CEPH_PACKAGE_VERSION} \ | ||
| ceph-grafana-dashboards${CEPH_PACKAGE_VERSION} \ | ||
| ceph-immutable-object-cache${CEPH_PACKAGE_VERSION} \ | ||
| ceph-mds${CEPH_PACKAGE_VERSION} \ | ||
| ceph-mgr-cephadm${CEPH_PACKAGE_VERSION} \ | ||
| ceph-mgr-dashboard${CEPH_PACKAGE_VERSION} \ | ||
| ceph-mgr-diskprediction-local${CEPH_PACKAGE_VERSION} \ | ||
| ceph-mgr-k8sevents${CEPH_PACKAGE_VERSION} \ | ||
| ceph-mgr-rook${CEPH_PACKAGE_VERSION} \ | ||
| ceph-mgr${CEPH_PACKAGE_VERSION} \ | ||
| ceph-mon${CEPH_PACKAGE_VERSION} \ | ||
| ceph-osd${CEPH_PACKAGE_VERSION} \ | ||
| ceph-radosgw${CEPH_PACKAGE_VERSION} \ | ||
| ceph-volume${CEPH_PACKAGE_VERSION} \ | ||
| cephfs-mirror${CEPH_PACKAGE_VERSION} \ | ||
| cephfs-top${CEPH_PACKAGE_VERSION} \ | ||
| kmod \ | ||
| libradosstriper1${CEPH_PACKAGE_VERSION} \ | ||
| rbd-mirror${CEPH_PACKAGE_VERSION} \ | ||
| $(cat crimson-packages.txt) \ |
There was a problem hiding this comment.
@mattbenjamin could you advise if gcc, lua-devel, or luarocks packages are needed in the ceph container image?
There was a problem hiding this comment.
Added those back in here. Left gcc removed for now.
| # Ceph-CSI | ||
| # TODO: coordinate with @Madhu-1 to have Ceph-CSI install these itself if unused by ceph | ||
| # @adk3798 does cephadm use these? | ||
| PACKAGES="$PACKAGES \ | ||
| attr \ | ||
| ceph-fuse${CEPH_PACKAGE_VERSION} \ | ||
| rbd-nbd${CEPH_PACKAGE_VERSION} \ | ||
| " && \ |
There was a problem hiding this comment.
@Madhu-1 @adk3798 let me know if you have thoughts around this. It seems to me like keeping the ceph-fuse and rbd-nbd drivers in the ceph image makes sense, even for non-CSI images. But if you have differing thoughts, let's discuss.
Also, @Madhu-1 is attr still needed, and if so, do you think it's broadly needed or only for Ceph-CSI?
There was a problem hiding this comment.
Cephadm doesn't use these directly at least. Can't say with certainty for any of the daemons cephadm deploys that use the ceph container image though.
There was a problem hiding this comment.
Leaving these for now.
container/Containerfile
Outdated
| # TODO: Per discussion, sg3_utils should be moved to ceph.spec.in with libstoragemgmt; @ktdreyer | ||
| # ref: https://github.com/ceph/ceph-container/pull/2013#issuecomment-1248606472 | ||
| PACKAGES="$PACKAGES \ | ||
| gdisk \ | ||
| hostname \ | ||
| procps-ng \ | ||
| sg3_utils \ |
There was a problem hiding this comment.
@ktdreyer FYI, it seems like there will be a follow-up after this to mark sg3_utils as an RPM "Requires".
container/Containerfile
Outdated
|
|
||
| # TODO: Is this still used? Can we get rid of this? @adk3798 do you know? | ||
| # ref: https://github.com/ceph/ceph-container/pull/1604 | ||
| # Is a ceph container ? | ||
| LABEL ceph="True" | ||
|
|
||
| # TODO: This doesn't seem to be used by rook or ceph. Can we get rid of this? @guits do you know? | ||
| # ref: https://github.com/ceph/ceph-container/pull/1969 | ||
| ENV I_AM_IN_A_CONTAINER 1 | ||
|
|
There was a problem hiding this comment.
Cephadm doesn't use these directly. It might link somehow to the CEPH_USE_RANDOM_NONCE stuff though. Cephadm's use of that was just recently removed in #50344
There was a problem hiding this comment.
In a search of the ceph repo, I didn't find any references to I_AM_IN_A_CONTAINER, but I'm not sure if there could be usages I'd miss by searching like that.
Would @jdurgin know? Or maybe know who I should be asking to find out?
There was a problem hiding this comment.
That might have been just for humans and/or custom scripting in the container. It's good to be able to query, although maybe there's some more-standard way?...
There was a problem hiding this comment.
There are new org.opencontainers standardized labels added now that should be good for querying.
There was a problem hiding this comment.
can a process running inside a container query the container labels?
13f476f to
c2dbb47
Compare
| tags: ${{ steps.container-build.outputs.tags }} | ||
| registry: quay.io/ceph/ceph-staging | ||
| username: ceph+ceph_staging_ci | ||
| password: ${{ secrets.CEPH_STAGING_QUAY_TOKEN }} |
There was a problem hiding this comment.
@guits this variable seems to be empty, causing this build-push to fail. Is this token misspelled maybe, or is it not created as an "Actions" secret in the repo?
There was a problem hiding this comment.
there's well an action secret CEPH_STAGING_QUAY_TOKEN, i've just reset its value to make sure.
c2dbb47 to
f216c47
Compare
| # CLEAN UP! | ||
| RUN set -x && \ | ||
| dnf clean all && \ | ||
| rm -rf /var/cache/dnf/* && \ | ||
| rm -rf /var/lib/dnf/* && \ | ||
| rm -f /var/lib/rpm/__db* && \ | ||
| # remove unnecessary files with big impact | ||
| rm -rf /etc/selinux /usr/share/{doc,man,selinux} && \ | ||
| # don't keep compiled python binaries | ||
| find / -xdev \( -name "*.pyc" -o -name "*.pyo" \) -delete | ||
|
|
||
| # Verify that the packages installed haven't been accidentally cleaned, then | ||
| # clean the package list and re-clean unnecessary RPM database files | ||
| RUN rpm -q $(cat packages.txt) && rm -f /var/lib/rpm/__db* && rm -f *packages.txt |
There was a problem hiding this comment.
Revising the clean up from ceph-container, I am seeing an ARM image that is 1.07GB on my machine, versus 1.25GB to 1.28GB for other quay.io/ceph images I've pulled.
Of note, the parens in the find ... -delete command fix a bug where *.pyc files weren't being deleted before. Plus removing unnecessary DNF and RPM database files.
I decided to leave bash completion files (which have changed locations apparently) in the image because they are less than 1MB in total.
b0861b2 to
b45435e
Compare
|
the squid release will be dropping support for centos 8, and there's a need a build containers based on centos 9. there's been some discussions around this in #53901 and elsewhere on slack. if we're starting from scratch here, it would be wonderful if we could start with centos 9 instead |
I agree that it would be nice, but I (or someone) would need to spend time making sure all the el9 locations are ready. Ideally quincy, reef, squid, and main all have el9 builds, and I'm not sure that's true today or not. Can we at least get this merged and then convert to el9 later? This PR is clean and good to go from my end, but I'm waiting to hear back from @guits about checking the secret so that the quay login works. [update] Just realized that this is probably not a configuration issue, but a security thing. GitHub doesn't allow secrets to be used from pull requests so that users can't "exfiltrate" secrets by making arbitrary PRs. This means that the GH action is good for builds from committed PRs, but not viable for building/publishing images related to pull requests. I guess the shaman system would be good for that, and it seems like something is already set up to build the quay.io/ceph-ci images like that. @dmick does that sound right? |
a98c31c to
3361853
Compare
|
this has an approval. does it need any further review/testing to merge? afaik nothing depends on this yet so there's little risk to merging what are the next steps? |
|
"This means that the GH action is good for builds from committed PRs, but not viable for building/publishing images related to pull requests. I guess the shaman system would be good for that, and it seems like something is already set up to build the quay.io/ceph-ci images like that. @dmick does that sound right?" I can't speak to the GH actions; I don't understand anything about the context they run in, or how one might possibly handle secrets safely. I know that things like those secrets are stored in Jenkins, which is what does the CI builds (shaman and chacra are just storage/db for the build results). I do want to point out however that we generally don't push CI images to quay.io, but rather our own quay.ceph.io (quay.io has usage limits and we do a lot of builds). |
This is only an initial implementation. It is not expected to work for any branch except for Ceph 'main'. The build action is also disabled for any pull request except ones that modify the container build files themselves to make sure that any issues do not cause confusion with ongoing or future feature PRs. This PR focuses on creating a good starting point for a "Containerfile" definition, tooling for building multi-architecture images using it, and pushing the image to a temporary repository: quay.io/ceph/ceph-staging. There are many TODO items noted in the files that should be followed up on in future PRs. Signed-off-by: Blaine Gardner <blaine.gardner@ibm.com>
2d46335 to
9b5837b
Compare
9b5837b to
6c8afc2
Compare
| RUN \ | ||
| dnf install -y --setopt=install_weak_deps=False dnf-plugins-core && \ | ||
| dnf copr enable -y tchaikov/python-scikit-learn && \ | ||
| dnf copr enable -y tchaikov/python3-asyncssh |
There was a problem hiding this comment.
You can remove the tchaikov/python3-asyncssh copr here (and the corresponding mention in the comment above). The python-asyncssh package is in epel8 and epel9 now: https://src.fedoraproject.org/rpms/python-asyncssh
In fact the EPEL 9 version will have an important bug fix (rhbz#2255043 / CVE-2023-48795) soon.
Use of cephadm requires a suitable container image to be available; upstream's are based on centos. This is an attempt at a minimal set for our intended purposes (i.e. no ganesha / mds etc.) Based on upstream work in ceph/ceph#54575 which in turn is a move to simplify how upstream builds their containers. This is for the PoC ceph/reef multisite cluster, codenamed "apus". Bug: T279621 Change-Id: I256acec29b5c50eff266690e15db066470038aa7
|
I've been working on github.com/dmick/ceph-buildah which uses the buildah utility to build the containers. The huge advantage is that it allows a series of shell commands to make incremental changes to a working container, and then commits the final result, so all the conditional logic can be coded in the toplevel shell command, which is a huge help at bug diagnosis. I think I very much want to go this direction rather than a Dockerfile for the revamp. I've been building containers of centos8 and centos9 x86 and arm and having some luck; I want to do that formally and run some formal container-based tests (probably the rbd iscsi tests to catch "is the iscsi inclusion good as well), and then I think I'd like to propose integrating that into ceph.git rather than this. |
|
Red Hat's build system (OSBS) input is a single |
|
For debugging, I like to use the |
|
oh well that might be a good argument for dockerfiles I guess. crap. Maybe modern dockerfiles are less painful to do this ridiculous handstanding in. |
|
|
||
| # Other labels are set automatically by container/build github action | ||
| # See: https://github.com/opencontainers/image-spec/blob/main/annotations.md | ||
| LABEL org.opencontainers.image.authors="Guillaume Abrioux <gabrioux@redhat.com>" \ |
|
@dmick I was pointed at this conversation during the most recent infra meeting. I am also interested in the topic and am happy to discuss any challenges you are hitting wrt building images: esp. wrt "this ridiculous handstanding" might entail :-) I also am a fan of buildah, but as Ken notes it's not so widespread and more niche than Dockerfiles. |
|
Mostly it's "if somehow we end up with another 200-line continued bash statement I will descend upon the author with a mighty and terrible force" but I think multiple RUN statements and --squash are the way out of that horror. ARG and short conditional bash are probably maintainable/usable. |
lol :-)
That seems fine. I also dislike having a bunch of logic on a single RUN line. If the logic is necessary I'll often tend to farm the work out to a real script outside the {Dock,Contain}erfile. |
|
In our project we've run into several problems around CentOS Stream, causing us to look for Debian based Ceph container images. As we could not find suitable ones, we are now looking at building them ourselves and would like to use the code from this PR as base instead of the one based on Make (I arrived here through issue 2171). So I would like to ask what state this PR is in? I see some open questions / work-in-progress-notes while I do not find anything which, at least to me, looks like testing, e.g. compared with the previous target test.staging (though I only just started reading through that code, so no idea what it actually does/tests, yet). |
|
More work is needed; I don't know how far this is from final form yet. But container building isn't particularly magic, and if you end up at a slightly different place than we do (besides the obvious 'based on Debian' thing, which will involve a fair bit of package name munging as I'm sure you know), as long as you have the dependencies you need it'll probably be fine. I have a separate repo I've been working in, dmick/container.git, that might or might not be a better starting place. |
| @@ -0,0 +1,213 @@ | |||
| FROM quay.io/centos/centos:stream8 | |||
There was a problem hiding this comment.
better use :stream9 as CentOS 8 Stream is EOL at the end of this month.
|
Whats the status of this? c8 is dead now. |
I keep hitting this problem though I still have hopes we can work-around it (for our specific usecase) because we were able to install a kind-of-working cluster on bare-metal, so without any containers, using the DEB packages. Since we want to leverage cephadm, bare-metal is not our preference at all. I'm mentioning this here because, according to the comments in the linked bug report, Centos Stream 9 is also affected and with Centos Stream 8 being EOL I'm hoping someone here has found a solution/workaround to share. |
|
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days. |
|
This initiative seems to have fallen asleep somewhat. For what it's worth, I made an image build based on this PR work using latest Debian Bookworm as base. It took some tinkering with Ceph sources, completely disabling the built-in mgr dashboard to work around the PyO3 multi-init error. I'm still working on improving it and, so far, testing yielded no issues. Hopefully this gives someone a push to move forward with this PR :-) |
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
|
For those watching this PR or who stumble across it in the future, PR #59868 has been merged and this adds the necessary {Container,Docker}file and scripts to build containers from ceph/ceph repo in the existing ceph build infra. |
|
I have the amazing ability to always paste the wrong thing. PR #59868 is it. I have also updated my previous comment (but edits don't appear in email notifications, thus this comment). :-) |
This is only an initial implementation. It is not expected to work for any branch except for Ceph 'main'. The build action is also disabled for any pull request except ones that modify the container build files themselves to make sure that any issues do not cause confusion with ongoing or future feature PRs.
This PR focuses on creating a good starting point for a "Containerfile" definition, tooling for building multi-architecture images using it, and pushing the image to a temporary repository: quay.io/ceph/ceph-staging.
There are many TODO items noted in the files that should be followed up on in future PRs.
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
xbetween the brackets:[x]. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windowsjenkins test rook e2e