Skip to content

add initial container file and github build action#54575

Closed
BlaineEXE wants to merge 2 commits intoceph:mainfrom
BlaineEXE:add-container-file
Closed

add initial container file and github build action#54575
BlaineEXE wants to merge 2 commits intoceph:mainfrom
BlaineEXE:add-container-file

Conversation

@BlaineEXE
Copy link
Collaborator

@BlaineEXE BlaineEXE commented Nov 20, 2023

This is only an initial implementation. It is not expected to work for any branch except for Ceph 'main'. The build action is also disabled for any pull request except ones that modify the container build files themselves to make sure that any issues do not cause confusion with ongoing or future feature PRs.

This PR focuses on creating a good starting point for a "Containerfile" definition, tooling for building multi-architecture images using it, and pushing the image to a temporary repository: quay.io/ceph/ceph-staging.

There are many TODO items noted in the files that should be followed up on in future PRs.

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@github-actions github-actions bot added the CI Continuous Integration label Nov 20, 2023
@BlaineEXE BlaineEXE force-pushed the add-container-file branch 7 times, most recently from 919c509 to 13f476f Compare November 20, 2023 18:45
@BlaineEXE
Copy link
Collaborator Author

BlaineEXE commented Nov 20, 2023

Note: the Containerfile is not in the root of the repo and is instead placed into ./container for some practical reasons:

  1. for developers using non-Linux systems to build the container image (i.e., MacOS users), the build context needs to copy the entire ceph repo into a VM for the build, which is unnecessary and takes a truly massive amount of time (for no benefit)
  2. it provides a space for adding scripts/makefile to help with the container build into the ./container dir, keeping any container tooling in the same place without cluttering the repo root

Comment on lines +101 to +86
# Copr repos
# TODO: afaict, scikit is for mgr-diskpredition-local, asyncssh for cephadm, but why do we
# encode these as deps only in container instead of including them as a vendored
# submodule in the ceph repo or something to be included with ceph packages?
# @tchaikov @torchiaf @adk3798 @ktdreyer?
# ref: https://github.com/ceph/ceph-container/pull/1821
# note: scikit-learn packages are not included in ubi8 in ceph-container
RUN \
dnf install -y --setopt=install_weak_deps=False dnf-plugins-core && \
dnf copr enable -y tchaikov/python-scikit-learn && \
dnf copr enable -y tchaikov/python3-asyncssh
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guits @adk3798 @torchiaf @tchaikov do any of you have any thoughts here?

Copy link
Contributor

@adk3798 adk3798 Nov 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For asyncssh at least, the cephadm mgr module needs it. I think we couldn't find any packaging of it in official repos so Kefu had this copr repo set up. We should maybe host this under the copr of somebody more involved in the project at this point, but we still need it somewhere.

Edit - didn't read the full comment first.

I assume keeping it in a copr repo is just easier than having it as a submodule, but I didn't really look into it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for weighing in. It's not a big issue, but it seemed worth asking since I'm not sure how hard it is to set up and maintain copr repos. Regardless, it'll be something to changed later down the line if changing it makes sense.

Comment on lines +119 to +120
# General
PACKAGES="ca-certificates" && \
# Ceph
# TODO: I removed these packages that were included in ganesha packages:
# gcc, lua-devel, luarocks
# They are listed as BuildRequires in ceph.spec.in, so I assume they are unnecessary
# @mattbenjamin please advise about whether these are necessary for runtime
PACKAGES="$PACKAGES \
ceph-common${CEPH_PACKAGE_VERSION} \
ceph-exporter${CEPH_PACKAGE_VERSION} \
ceph-grafana-dashboards${CEPH_PACKAGE_VERSION} \
ceph-immutable-object-cache${CEPH_PACKAGE_VERSION} \
ceph-mds${CEPH_PACKAGE_VERSION} \
ceph-mgr-cephadm${CEPH_PACKAGE_VERSION} \
ceph-mgr-dashboard${CEPH_PACKAGE_VERSION} \
ceph-mgr-diskprediction-local${CEPH_PACKAGE_VERSION} \
ceph-mgr-k8sevents${CEPH_PACKAGE_VERSION} \
ceph-mgr-rook${CEPH_PACKAGE_VERSION} \
ceph-mgr${CEPH_PACKAGE_VERSION} \
ceph-mon${CEPH_PACKAGE_VERSION} \
ceph-osd${CEPH_PACKAGE_VERSION} \
ceph-radosgw${CEPH_PACKAGE_VERSION} \
ceph-volume${CEPH_PACKAGE_VERSION} \
cephfs-mirror${CEPH_PACKAGE_VERSION} \
cephfs-top${CEPH_PACKAGE_VERSION} \
kmod \
libradosstriper1${CEPH_PACKAGE_VERSION} \
rbd-mirror${CEPH_PACKAGE_VERSION} \
$(cat crimson-packages.txt) \
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattbenjamin could you advise if gcc, lua-devel, or luarocks packages are needed in the ceph container image?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lua-devel and luarocks are runtime dependencies for radosgw, and #52931 (cc @yuvalif) is moving them from BuildRequires to Requires

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added those back in here. Left gcc removed for now.

Comment on lines +172 to +156
# Ceph-CSI
# TODO: coordinate with @Madhu-1 to have Ceph-CSI install these itself if unused by ceph
# @adk3798 does cephadm use these?
PACKAGES="$PACKAGES \
attr \
ceph-fuse${CEPH_PACKAGE_VERSION} \
rbd-nbd${CEPH_PACKAGE_VERSION} \
" && \
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Madhu-1 @adk3798 let me know if you have thoughts around this. It seems to me like keeping the ceph-fuse and rbd-nbd drivers in the ceph image makes sense, even for non-CSI images. But if you have differing thoughts, let's discuss.

Also, @Madhu-1 is attr still needed, and if so, do you think it's broadly needed or only for Ceph-CSI?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cephadm doesn't use these directly at least. Can't say with certainty for any of the daemons cephadm deploys that use the ceph container image though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving these for now.

Comment on lines +187 to +170
# TODO: Per discussion, sg3_utils should be moved to ceph.spec.in with libstoragemgmt; @ktdreyer
# ref: https://github.com/ceph/ceph-container/pull/2013#issuecomment-1248606472
PACKAGES="$PACKAGES \
gdisk \
hostname \
procps-ng \
sg3_utils \
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ktdreyer FYI, it seems like there will be a follow-up after this to mark sg3_utils as an RPM "Requires".

Comment on lines +25 to +34

# TODO: Is this still used? Can we get rid of this? @adk3798 do you know?
# ref: https://github.com/ceph/ceph-container/pull/1604
# Is a ceph container ?
LABEL ceph="True"

# TODO: This doesn't seem to be used by rook or ceph. Can we get rid of this? @guits do you know?
# ref: https://github.com/ceph/ceph-container/pull/1969
ENV I_AM_IN_A_CONTAINER 1

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adk3798 @guits do you see a need to keep these?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cephadm doesn't use these directly. It might link somehow to the CEPH_USE_RANDOM_NONCE stuff though. Cephadm's use of that was just recently removed in #50344

Copy link
Collaborator Author

@BlaineEXE BlaineEXE Nov 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a search of the ceph repo, I didn't find any references to I_AM_IN_A_CONTAINER, but I'm not sure if there could be usages I'd miss by searching like that.

Would @jdurgin know? Or maybe know who I should be asking to find out?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might have been just for humans and/or custom scripting in the container. It's good to be able to query, although maybe there's some more-standard way?...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are new org.opencontainers standardized labels added now that should be good for querying.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can a process running inside a container query the container labels?

tags: ${{ steps.container-build.outputs.tags }}
registry: quay.io/ceph/ceph-staging
username: ceph+ceph_staging_ci
password: ${{ secrets.CEPH_STAGING_QUAY_TOKEN }}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guits this variable seems to be empty, causing this build-push to fail. Is this token misspelled maybe, or is it not created as an "Actions" secret in the repo?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's well an action secret CEPH_STAGING_QUAY_TOKEN, i've just reset its value to make sure.

Comment on lines +213 to +203
# CLEAN UP!
RUN set -x && \
dnf clean all && \
rm -rf /var/cache/dnf/* && \
rm -rf /var/lib/dnf/* && \
rm -f /var/lib/rpm/__db* && \
# remove unnecessary files with big impact
rm -rf /etc/selinux /usr/share/{doc,man,selinux} && \
# don't keep compiled python binaries
find / -xdev \( -name "*.pyc" -o -name "*.pyo" \) -delete

# Verify that the packages installed haven't been accidentally cleaned, then
# clean the package list and re-clean unnecessary RPM database files
RUN rpm -q $(cat packages.txt) && rm -f /var/lib/rpm/__db* && rm -f *packages.txt
Copy link
Collaborator Author

@BlaineEXE BlaineEXE Nov 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revising the clean up from ceph-container, I am seeing an ARM image that is 1.07GB on my machine, versus 1.25GB to 1.28GB for other quay.io/ceph images I've pulled.

Of note, the parens in the find ... -delete command fix a bug where *.pyc files weren't being deleted before. Plus removing unnecessary DNF and RPM database files.

I decided to leave bash completion files (which have changed locations apparently) in the image because they are less than 1MB in total.

@cbodley
Copy link
Contributor

cbodley commented Nov 21, 2023

the squid release will be dropping support for centos 8, and there's a need a build containers based on centos 9. there's been some discussions around this in #53901 and elsewhere on slack. if we're starting from scratch here, it would be wonderful if we could start with centos 9 instead

@BlaineEXE
Copy link
Collaborator Author

BlaineEXE commented Nov 27, 2023

the squid release will be dropping support for centos 8, and there's a need a build containers based on centos 9. there's been some discussions around this in #53901 and elsewhere on slack. if we're starting from scratch here, it would be wonderful if we could start with centos 9 instead

I agree that it would be nice, but I (or someone) would need to spend time making sure all the el9 locations are ready. Ideally quincy, reef, squid, and main all have el9 builds, and I'm not sure that's true today or not.

Can we at least get this merged and then convert to el9 later? This PR is clean and good to go from my end, but I'm waiting to hear back from @guits about checking the secret so that the quay login works.

#54575 (comment)

[update]

Just realized that this is probably not a configuration issue, but a security thing. GitHub doesn't allow secrets to be used from pull requests so that users can't "exfiltrate" secrets by making arbitrary PRs.
https://stackoverflow.com/questions/73866908/github-actions-main-repository-secret-not-picked-up-from-pull-request-build

This means that the GH action is good for builds from committed PRs, but not viable for building/publishing images related to pull requests. I guess the shaman system would be good for that, and it seems like something is already set up to build the quay.io/ceph-ci images like that. @dmick does that sound right?

@BlaineEXE BlaineEXE marked this pull request as ready for review November 27, 2023 21:49
@BlaineEXE BlaineEXE force-pushed the add-container-file branch 2 times, most recently from a98c31c to 3361853 Compare November 27, 2023 22:04
Copy link
Contributor

@ktdreyer ktdreyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not viable for building/publishing images related to pull requests

Yes, that's dangerous in general to run privileged code automatically on anyone's forked PR.

I think what you have here is fine and we should merge it.

@cbodley
Copy link
Contributor

cbodley commented Dec 8, 2023

this has an approval. does it need any further review/testing to merge? afaik nothing depends on this yet so there's little risk to merging

what are the next steps?

@cbodley cbodley mentioned this pull request Dec 8, 2023
2 tasks
@dmick
Copy link
Member

dmick commented Dec 8, 2023

"This means that the GH action is good for builds from committed PRs, but not viable for building/publishing images related to pull requests. I guess the shaman system would be good for that, and it seems like something is already set up to build the quay.io/ceph-ci images like that. @dmick does that sound right?"

I can't speak to the GH actions; I don't understand anything about the context they run in, or how one might possibly handle secrets safely. I know that things like those secrets are stored in Jenkins, which is what does the CI builds (shaman and chacra are just storage/db for the build results). I do want to point out however that we generally don't push CI images to quay.io, but rather our own quay.ceph.io (quay.io has usage limits and we do a lot of builds).

This is only an initial implementation. It is not expected to work for
any branch except for Ceph 'main'. The build action is also disabled for
any pull request except ones that modify the container build files
themselves to make sure that any issues do not cause confusion with
ongoing or future feature PRs.

This PR focuses on creating a good starting point for a "Containerfile"
definition, tooling for building multi-architecture images using it, and
pushing the image to a temporary repository: quay.io/ceph/ceph-staging.

There are many TODO items noted in the files that should be followed up
on in future PRs.

Signed-off-by: Blaine Gardner <blaine.gardner@ibm.com>
@BlaineEXE BlaineEXE force-pushed the add-container-file branch 2 times, most recently from 2d46335 to 9b5837b Compare December 15, 2023 17:16
RUN \
dnf install -y --setopt=install_weak_deps=False dnf-plugins-core && \
dnf copr enable -y tchaikov/python-scikit-learn && \
dnf copr enable -y tchaikov/python3-asyncssh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove the tchaikov/python3-asyncssh copr here (and the corresponding mention in the comment above). The python-asyncssh package is in epel8 and epel9 now: https://src.fedoraproject.org/rpms/python-asyncssh

In fact the EPEL 9 version will have an important bug fix (rhbz#2255043 / CVE-2023-48795) soon.

wmfgerrit pushed a commit to wikimedia/operations-docker-images-production-images that referenced this pull request Mar 21, 2024
Use of cephadm requires a suitable container image to be available;
upstream's are based on centos. This is an attempt at a minimal set
for our intended purposes (i.e. no ganesha / mds etc.)

Based on upstream work in ceph/ceph#54575
which in turn is a move to simplify how upstream builds their
containers.

This is for the PoC ceph/reef multisite cluster, codenamed "apus".

Bug: T279621
Change-Id: I256acec29b5c50eff266690e15db066470038aa7
@dmick
Copy link
Member

dmick commented Mar 21, 2024

I've been working on github.com/dmick/ceph-buildah which uses the buildah utility to build the containers. The huge advantage is that it allows a series of shell commands to make incremental changes to a working container, and then commits the final result, so all the conditional logic can be coded in the toplevel shell command, which is a huge help at bug diagnosis. I think I very much want to go this direction rather than a Dockerfile for the revamp. I've been building containers of centos8 and centos9 x86 and arm and having some luck; I want to do that formally and run some formal container-based tests (probably the rbd iscsi tests to catch "is the iscsi inclusion good as well), and then I think I'd like to propose integrating that into ceph.git rather than this.

@ktdreyer
Copy link
Contributor

Red Hat's build system (OSBS) input is a single Dockerfile per container, and I think a Dockerfile is generally the input format for other build systems (like Tekton). I think a Dockerfile is going to be easier to maintain.

@ktdreyer
Copy link
Contributor

For debugging, I like to use the podman build --force-rm=false option to avoid deleting intermediate stages. (it leaves a lot behind, so podman image prune cleans up.)

@dmick
Copy link
Member

dmick commented Mar 28, 2024

oh well that might be a good argument for dockerfiles I guess. crap. Maybe modern dockerfiles are less painful to do this ridiculous handstanding in.


# Other labels are set automatically by container/build github action
# See: https://github.com/opencontainers/image-spec/blob/main/annotations.md
LABEL org.opencontainers.image.authors="Guillaume Abrioux <gabrioux@redhat.com>" \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guits are you the maintainer of this image?

@phlogistonjohn
Copy link
Contributor

@dmick I was pointed at this conversation during the most recent infra meeting. I am also interested in the topic and am happy to discuss any challenges you are hitting wrt building images: esp. wrt "this ridiculous handstanding" might entail :-)

I also am a fan of buildah, but as Ken notes it's not so widespread and more niche than Dockerfiles.

@dmick
Copy link
Member

dmick commented Apr 4, 2024

Mostly it's "if somehow we end up with another 200-line continued bash statement I will descend upon the author with a mighty and terrible force" but I think multiple RUN statements and --squash are the way out of that horror. ARG and short conditional bash are probably maintainable/usable.

@phlogistonjohn
Copy link
Contributor

Mostly it's "if somehow we end up with another 200-line continued bash statement I will descend upon the author with a mighty and terrible force"

lol :-)

but I think multiple RUN statements and --squash are the way out of that horror. ARG and short conditional bash are probably maintainable/usable.

That seems fine. I also dislike having a bunch of logic on a single RUN line. If the logic is necessary I'll often tend to farm the work out to a real script outside the {Dock,Contain}erfile.

@teckert
Copy link

teckert commented May 6, 2024

In our project we've run into several problems around CentOS Stream, causing us to look for Debian based Ceph container images. As we could not find suitable ones, we are now looking at building them ourselves and would like to use the code from this PR as base instead of the one based on Make (I arrived here through issue 2171).

So I would like to ask what state this PR is in? I see some open questions / work-in-progress-notes while I do not find anything which, at least to me, looks like testing, e.g. compared with the previous target test.staging (though I only just started reading through that code, so no idea what it actually does/tests, yet).

@dmick
Copy link
Member

dmick commented May 6, 2024

More work is needed; I don't know how far this is from final form yet. But container building isn't particularly magic, and if you end up at a slightly different place than we do (besides the obvious 'based on Debian' thing, which will involve a fair bit of package name munging as I'm sure you know), as long as you have the dependencies you need it'll probably be fine. I have a separate repo I've been working in, dmick/container.git, that might or might not be a better starting place.

@@ -0,0 +1,213 @@
FROM quay.io/centos/centos:stream8
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better use :stream9 as CentOS 8 Stream is EOL at the end of this month.

@kfox1111
Copy link

kfox1111 commented Jul 2, 2024

Whats the status of this? c8 is dead now.

@teckert
Copy link

teckert commented Jul 17, 2024

More work is needed; I don't know how far this is from final form yet. But container building isn't particularly magic, and if you end up at a slightly different place than we do (besides the obvious 'based on Debian' thing, which will involve a fair bit of package name munging as I'm sure you know), as long as you have the dependencies you need it'll probably be fine. I have a separate repo I've been working in, dmick/container.git, that might or might not be a better starting place.

I keep hitting this problem though I still have hopes we can work-around it (for our specific usecase) because we were able to install a kind-of-working cluster on bare-metal, so without any containers, using the DEB packages. Since we want to leverage cephadm, bare-metal is not our preference at all.

I'm mentioning this here because, according to the comments in the linked bug report, Centos Stream 9 is also affected and with Centos Stream 8 being EOL I'm hoping someone here has found a solution/workaround to share.

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Sep 15, 2024
@teckert
Copy link

teckert commented Sep 18, 2024

This initiative seems to have fallen asleep somewhat. For what it's worth, I made an image build based on this PR work using latest Debian Bookworm as base. It took some tinkering with Ceph sources, completely disabling the built-in mgr dashboard to work around the PyO3 multi-init error. I'm still working on improving it and, so far, testing yielded no issues.

Hopefully this gives someone a push to move forward with this PR :-)

@github-actions
Copy link

github-actions bot commented Oct 5, 2024

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@phlogistonjohn
Copy link
Contributor

phlogistonjohn commented Oct 7, 2024

For those watching this PR or who stumble across it in the future, PR #59868 has been merged and this adds the necessary {Container,Docker}file and scripts to build containers from ceph/ceph repo in the existing ceph build infra.
@BlaineEXE may want to close this if that appears sufficient, otherwise I expect the bot will close this after some time of inactivity.

@phlogistonjohn
Copy link
Contributor

I have the amazing ability to always paste the wrong thing. PR #59868 is it. I have also updated my previous comment (but edits don't appear in email notifications, thus this comment). :-)

@BlaineEXE BlaineEXE closed this Oct 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI Continuous Integration needs-rebase

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants