
containerd integration: Make build work #44079

Merged
thaJeztah merged 2 commits into moby:master from thaJeztah:c8d_build
Mar 7, 2023

Conversation

@thaJeztah
Member

@thaJeztah thaJeztah commented Sep 1, 2022

Upstreaming

No conflicts, except for vendoring / vendor.sum

- A picture of a cute animal (not mandatory but encouraged)

@thaJeztah thaJeztah added the area/builder, status/2-code-review, area/builder/buildkit, and containerd-integration labels Sep 1, 2022
@thaJeztah thaJeztah added this to the v-next milestone Sep 1, 2022
@thaJeztah thaJeztah force-pushed the c8d_build branch 2 times, most recently from 04ce54d to ddc621e Compare September 1, 2022 16:03
@thaJeztah
Member Author

I squashed a couple of commits, as some of them were fix-ups of previous commits; I kept the last two separate (but possibly the "runc" one should be squashed as well).

@thaJeztah thaJeztah marked this pull request as ready for review September 1, 2022 16:03
@thaJeztah thaJeztah requested a review from rumpl September 6, 2022 12:46
@thaJeztah
Member Author

@tonistiigi @rumpl PTAL (also at #44079 (comment))

@thaJeztah

This comment was marked as resolved.

Member

@rumpl rumpl left a comment


LGTM 👍

@thaJeztah
Member Author

Ah, dang; conflict in vendor.sum; I'll rebase

@thaJeztah
Member Author

done 👍

@thaJeztah
Member Author

Not sure I've seen this before;


[2022-09-08T15:21:21.530Z]  > [crun 1/2] RUN --mount=type=cache,sharing=locked,id=moby-crun-aptlib,target=/var/lib/apt     --mount=type=cache,sharing=locked,id=moby-crun-aptcache,target=/var/cache/apt         apt-get update && apt-get install -y --no-install-recommends             autoconf             automake             build-essential             libcap-dev             libprotobuf-c-dev             libseccomp-dev             libsystemd-dev             libtool             libudev-dev             libyajl-dev             python3             ;:
[2022-09-08T15:21:21.530Z] #0 1.373 runc run failed: unable to start container process: error during container init: error mounting "/var/lib/docker/buildkit/executor/resolv.conf" to rootfs at "/etc/resolv.conf": possibly malicious path detected -- refusing to operate on /var/lib/docker/buildkit/executor/qj8f5s2o4ep3euuoz99hyp7a0/rootfs/etc/resolv.conf (deleted)

@thaJeztah
Member Author

/cc @tonistiigi

@thaJeztah
Member Author

Alright; this time it passed without the weird error; looks like a race indeed?

@tonistiigi PTAL if this PR looks good to you 🙏

@crazy-max
Member

Alright; this time it passed without the weird error; looks like a race indeed?

@thaJeztah fyi BuildKit tests in this workflow https://github.com/moby/moby/blob/master/.github/workflows/buildkit.yml don't run with containerd snapshotter atm.

As discussed with @tonistiigi, this matrix should run both with dockerd worker and containerd integration.

Basically, we should change the worker being tested so that it's based on a new matrix entry, and also update the docker daemon config accordingly. I can do it on @rumpl's fork so you can cherry-pick it here. LGTY?

Member

@cpuguy83 cpuguy83 left a comment


Code looks mostly good.
Also tested this out and it seems to work pretty well, even supports multi-platform output.

Logs are quite verbose after a build, and it looks like there's some transient error that occurs:

INFO[2022-11-01T21:53:03.707692329Z] [core] Subchannel Connectivity change to CONNECTING  module=grpc
INFO[2022-11-01T21:53:03.707771334Z] [core] Subchannel picks a new address "localhost" to connect  module=grpc
INFO[2022-11-01T21:53:03.707832438Z] [core] Channel Connectivity change to CONNECTING  module=grpc
WARN[2022-11-01T21:53:03.707856239Z] [core] grpc: addrConn.createTransport failed to connect to {localhost localhost <nil> <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing only one connection allowed"  module=grpc
INFO[2022-11-01T21:53:03.707904743Z] [core] Subchannel Connectivity change to TRANSIENT_FAILURE  module=grpc
INFO[2022-11-01T21:53:03.707934945Z] [core] Channel Connectivity change to TRANSIENT_FAILURE  module=grpc
ERRO[2022-11-01T21:53:03.708072154Z] healthcheck failed fatally                   
INFO[2022-11-01T21:53:03.708146258Z] [core] Channel Connectivity change to SHUTDOWN  module=grpc

Not sure if that's related to this change or just the containerd snapshotter use-case as a whole.

@thaJeztah thaJeztah force-pushed the c8d_build branch 2 times, most recently from b4d8b10 to e0e2066 Compare November 14, 2022 19:12
@thaJeztah
Member Author

was to add a new worker feature (like FeatureCacheExportS3)

Yeah, for that specific case, I'm wondering if (ideally) exporters should have a mechanism to "register" themselves, and some way to get that list. Similar to how (e.g.) graph drivers register themselves 🤔

@vvoland
Contributor

vvoland commented Mar 2, 2023

However, I think in the end we still will need to support the s3 cache backend anyway, so adding the extra compat checks to buildkit tests is a bit wasteful work.

We already have the aws-sdk vendored, so hopefully for now we can just vendor the s3 bits until we have a way to push these out of moby (see Brian's suggestion: #44079 (comment))

@vvoland
Contributor

vvoland commented Mar 2, 2023

Opened a vendor PR: #45095
Also included the vendor here to re-run the tests.
Thanks, I didn't consider vendoring non-released version 😅

@vvoland
Contributor

vvoland commented Mar 3, 2023

Rebased and included the s3 cache support. Now all buildkit tests should pass.
Warning: it vendors 88k lines of code (s3 related code from aws-sdk-go-v2). Do we want to squash it, or should I open a separate PR for it?

"inline": inlineremotecache.ResolveCacheExporterFunc(),
"local": localremotecache.ResolveCacheExporterFunc(opt.SessionManager),
"gha": gha.ResolveCacheExporterFunc(),
"s3": s3remotecache.ResolveCacheExporterFunc(),
Member Author


Comparing these two, I noticed that these are not "symmetrical";

  • ❓ is it expected that the inline cache is included in Exporters, but not in Importers ?
  • (nit): perhaps we should sort both lists so that the exporters/importers appear in the same order (we could just sort them alphabetically; then at least there's no bike-shedding about what order they should appear in).

[Screenshot (2023-03-06): side-by-side comparison of the exporter and importer lists]

Contributor

@vvoland vvoland Mar 6, 2023


Sorting would definitely make sense!

Regarding inline cache: according to BuildKit's docs, it embeds the cache into the image itself, and is expected to be imported with the registry cache: https://github.com/moby/buildkit#inline-push-image-and-cache-together

Member Author


Yes, I was wondering if it was something like that 🤔 It was confusing me a bit, though (we could even consider adding a comment to describe that, if what you linked to is indeed the expected behaviour). /cc @crazy-max @jedevc (if you know)

Member


Yeah, this is expected: they are asymmetrical. Inline cache works fundamentally differently from the other cache types, even though they expose the same interface (which is why we can import them in the same way).

Member


I would really prefer to keep s3/azure caching out of the main dockerd binary.

@thaJeztah
Member Author

Rebased and included the s3 cache support. Now all buildkit tests should pass.
Warning: it vendors 88k lines of code (s3 related code from aws-sdk-go-v2). Do we want to squash it, or should I open a separate PR for it?

I'm tempted to say: open it as a separate PR, as there's other caching that we may want to(?) add. But IIUC, it was currently needed to make the BuildKit CI pass (as it runs some tests that expect s3 to be supported?).

For the caching, I was wondering how authentication is handled, and it looks like they use environment variables that are supported by the SDKs. If we add support for those (I guess this would happen "automatically"?), we at least need to update the daemon documentation to mention the environment variables it may be using;

That said, perhaps we need to look at those;

  • Do we want them to automatically look at environment variables? (Such env-vars may be present for other purposes; perhaps it's ok to use them, but at least users should be aware that dockerd may now be using those, and so implicitly gets access to their s3/azure/github account.)
  • Do we want configuration options in daemon.json to allow users to specify (default) accounts for caching? (which may be separate from environment variables on the host)?
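
As a purely hypothetical illustration of that second bullet, a daemon.json fragment of the kind being discussed might look like this. None of these keys exist in the daemon today; this only sketches the shape such configuration could take:

```json
{
  "builder": {
    "cache-backends": {
      "s3": {
        "region": "eu-west-1",
        "access-key-id-file": "/etc/docker/s3-access-key-id",
        "secret-access-key-file": "/etc/docker/s3-secret-access-key"
      }
    }
  }
}
```

Pointing at credential files rather than inlining secrets (and keeping the config separate from host environment variables) would sidestep the "dockerd implicitly picks up env-vars" concern raised above.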

@thaJeztah
Member Author

As discussed in the call;

  • We removed the s3 caching from this PR (for follow-up discussion)
  • Temporarily removed the BuildKit CI changes, because those currently fail without the s3 caching (as they run some tests that expect it to be present)

BuildKit CI was green before those patches were removed (but they allowed us to verify that everything worked)

Member Author

@thaJeztah thaJeztah left a comment


LGTM (can't approve as this PR was originally opened by me)

@cpuguy83 @neersighted @rumpl PTAL if you're okay with the last iteration

- Only use the image exporter in build if we don't use containerd
  Without this "docker build" fails with:

    Error response from daemon: exporter "image" could not be found

- let buildx know we support containerd snapshotter
- Pass the current snapshotter to the buildkit worker

  If buildkit uses a different snapshotter we can't list the images any
  more because we can't find the snapshot.

builder/builder-next: make ContainerdWorker a minimal wrapper

Note that this makes "Worker" a public field, so technically one could
overwrite it.

builder-next: re-enable runc executor

Currently, without special CNI config, the builder would only create
host-network containers, which is a security issue.

Using runc directly instead of the shim is faster as well, as the
builder doesn't need anything from the shim. The overhead of setting
up the network sandbox is of course much bigger.

builder/builder-next: simplify options handling

Trying to simplify the logic;

- Use an early return if multiple outputs are provided
- Only construct the list of tags if we're using an image (or moby) exporter
- Combine some logic for snapshotter and non-snapshotter handling
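
A rough Go sketch of that simplified flow (the function name, signature, and exporter names are illustrative stand-ins, not the daemon's actual code):

```go
package main

import (
	"errors"
	"fmt"
)

// chooseExporter sketches the simplified options handling described above:
// return early when more than one output is requested, and only collect
// tags when the result is an image. Hypothetical; not the daemon's code.
func chooseExporter(outputs []string, tags []string) (exporter string, exporterTags []string, err error) {
	if len(outputs) > 1 {
		// Early return: multiple outputs are not supported.
		return "", nil, errors.New("multiple outputs configured, only a single output supported")
	}
	exporter = "image" // default when no output is specified
	if len(outputs) == 1 {
		exporter = outputs[0]
	}
	if exporter == "image" || exporter == "moby" {
		// Tags only make sense when the output is an image.
		exporterTags = tags
	}
	return exporter, exporterTags, nil
}

func main() {
	exp, tags, err := chooseExporter(nil, []string{"app:latest"})
	fmt.Println(exp, tags, err) // image [app:latest] <nil>

	_, _, err = chooseExporter([]string{"local", "tar"}, nil)
	fmt.Println(err != nil) // true
}
```

The early return keeps the error path out of the main flow, which is what makes the combined snapshotter/non-snapshotter handling readable.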

Create a constant for the moby exporter

Pass a context when creating a router

The context has a 10-second timeout, which should be more than enough to
get the answer from containerd.

Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
Co-authored-by: Sebastiaan van Stijn <github@gone.nl>
Co-authored-by: Tonis Tiigi <tonistiigi@gmail.com>
Co-authored-by: Nicolas De Loof <nicolas.deloof@gmail.com>
Co-authored-by: Paweł Gronowski <pawel.gronowski@docker.com>
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Co-authored-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
@thaJeztah
Member Author

For transparency; this PR does add gha as caching backend, which also implicitly adds support in the daemon for reading GHA authentication information from env vars (as mentioned above #44079 (comment))

https://docs.docker.com/build/cache/backends/gha/#authentication
If the url or token parameters are left unspecified, the gha cache backend will fall back to using environment variables.

I think it's okay to keep that change in, but I'll open a tracking ticket for further discussion on how we want to go forward with configuring these.

@thaJeztah thaJeztah merged commit b3428bc into moby:master Mar 7, 2023
@thaJeztah thaJeztah deleted the c8d_build branch March 7, 2023 10:26
@rumpl
Member

rumpl commented Mar 7, 2023

Yay!


9 participants