Update runc to 1.2.0 by kolyshkin · Pull Request #48160 · moby/moby

kolyshkin · 2024-07-11T23:36:36Z

Let's use latest and greatest runc v1.2.0 (both the binary and the set of vendored libraries).

cyphar · 2024-07-14T01:35:29Z

@tonistiigi It only overrides CC if CC wasn't already set for the purposes of cross-compilation. It seems Docker uses xx-go --wrap to hide that it is cross-compiling when building dependencies, which leads to issues. I would expect CC=xx-clang (or CC=clang, because of xx-go --wrap) to fix the issue, though I haven't tested it (it's possible there are issues with the linker too).

Unfortunately, just removing cc_platform.mk will mean that "standard" cross-compilation (GOARCH=foo make) on regular systems won't work anymore (to be fair, the script probably only works on openSUSE and Debian).

tonistiigi · 2024-07-14T18:12:16Z

@cyphar xx-go does not set CC for the system (it doesn't require user to set it). It configures go env CC instead. And if you run xx-go --wrap it sets the go/cgo configuration in the go env GOENV where go reads it. GOARCH etc is also loaded through go env. I think the default for projects should still be to use the configured toolchain of the system. Maybe if you put something like ifeq ($(origin GOARCH), command line) in Makefile, then it is clear that the config was passed to the makefile specifically and has the extra behavior of modifying the toolchain.

kolyshkin · 2024-07-15T21:44:32Z

The issue seems to be that

xx-go --wrap writes ~/.config/go/env file which sets various variables for Go, including GOARCH, CC, GOGCCFLAGS etc. For example, for s390x those are:

GOARCH='s390x'
GCCGO='gccgo'
AR='s390x-linux-gnu-ar'
CC='s390x-linux-gnu-gcc'
PKG_CONFIG='s390x-linux-gnu-pkg-config'
GOGCCFLAGS='-fPIC -m64 -march=z196 -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1204069782=/tmp/go-build -gno-record-gcc-switches'

runc's cc_platform.mk exports CC=gcc. Apparently a local environment has a preference over what's in GOENV file.

Now, gcc tries to use x86_64 gcc with s390x cflags.

tonistiigi · 2024-07-15T22:22:44Z

@kolyshkin I don't think xx writes GOGCCFLAGS https://github.com/tonistiigi/xx/blob/master/src/xx-go#L104-L128 . The rest of it seems correct .

cyphar · 2024-07-15T23:44:22Z

Ideally we would just do GOARCH=xxx go env CC to get the compiler to use, but that doesn't work at all outside of xx-go --wrap:

For proper target triples, Go doesn't autodetect anything and you need to specify a CC and so on anyway. This does get copied to go env CC but that's just passthrough and there's no change in behaviour.
For GOARCH=386 (and similar -m compilation targets) Go doesn't even provide the right CFLAGS for you. They are in GOGCCFLAGS but those contain other bits that we don't need.

So we need to parse GOARCH for at least case (2). The GOARCH=... make thing is nice but it's not really required given that GOARCH=... go build doesn't work out of the box AFAICS, so I would accept us breaking this if it meant we could remove.

I suspect Docker doesn't build for 386 but the issue still stands -- we don't want to build runc with an embedded runc-dmz that is the wrong architecture. It would be great if there was a supported way to get the compiler options Go wants to use for CGo... We could just use that then.

kolyshkin · 2024-07-16T02:01:25Z

Modified this to just set CC and ~~CFLAGS~~ STRIP explicitly from xx-info, this way runc's cc_platform.mk doesn't do any guessing, and no changes to runc are needed.

kolyshkin · 2024-07-19T00:15:40Z

Full(er) list of test failures:

TestBuildUserNamespaceValidateCapabilitiesAreV2
TestNetworkNat
TestNetworkLocalhostTCPNat
TestBuildSquashParent
TestDiff
TestUsernsCommit
TestReadPluginNoRead/explicitly_enabled_caching

Most failures are related to not getting output from a container run. As I can't repro this locally I'm not sure how to bisect it.

rata · 2024-07-22T13:34:37Z

Same here, if I run things locally (changing the setting so the container uses the iptables uses the nft backend, as that is what my host is using) those tests work fine here too.

However, it is 100% reliable that runc 1.2.x fails here on CI and 1.1.x works just fine.

@thaJeztah do you have any pointers on how to further debug this issue we observe only on CI?

I can try to create a VM with ubuntu, but I hope there is a better way (I don't have any VM setup ready to go now).

kolyshkin · 2024-07-29T23:18:03Z

I can try to create a VM with ubuntu

That's exactly what I was doing (Ubuntu 20.04 in vagrant libvirt) and I still can't repro.

I suspect there are either some limits in CI that we hit, or maybe something like apparmor config.

kolyshkin · 2024-07-30T02:28:38Z

The common part between most (or all?) of the failures is not getting the container's output.

Also, I tried again and failed to repro this locally in a Ubuntu 20.04 VM.

AkihiroSuda · 2024-08-14T11:33:47Z

I suspect there are either some limits in CI that we hit, or maybe something like apparmor config.

dmesg may show some hints?
Unloading AppArmor may make some difference?

rata · 2024-08-19T14:43:28Z

@AkihiroSuda In my tests, apparmor didn't make the difference.
I booted the local host without apparmor=0 in the cmdline, the tests that fail in CI, locally pass just fine.
I've also tried CI with this commit to disable apparmor: rata@9ce9e44 and it failed in the same way (except docker-py, that seems to fail in python while connecting to some URL now).

I've also tried to create a VM in Azure (as github actions runs in Azure IIRC) just in case there was some special ubuntu setup in Azure, however those tests pass fine.

The only change I had to do in the Azure VM and locally for some test to not fail is to use the iptables nft backend: rata@f97ef77.

As I was saying in other issues, the failure in CI seems real, though. It seems 100% reliable that the updated runc fails the moby tests and the failures seem to be always the same. Using runc 1.1 the moby CI seems to pass reliably.

I'm thinking the secret might be either trying to check dmesg on CI (my commits for that didn't work, nothing is printed) or in the .github folder with how it is set-up.

@AkihiroSuda @thaJeztah @kolyshkin any other ideas to try?

AkihiroSuda · 2024-08-20T04:03:18Z

any other ideas to try?

Maybe bisect (manually? 😞 )?

AkihiroSuda · 2024-08-20T04:05:13Z

Possible reasons:

ENOSPC
ENOSPC, but for inodes
OOM

AkihiroSuda · 2024-08-23T15:40:46Z

I'm trying to bisect the commits in #48366 but it is quite painful due to compilation failures of dmz and other irrelevant CI failures

kolyshkin · 2024-08-29T05:51:46Z

Just found out something I was doing back in 2018 (commit ad2f88d) and decided to try it here.

Cherry-picked several WIP commits from https://github.com/moby/moby/commits/b0a592798f4d9d7162f8aedca89ada3a29d60e2c/ Originally-authored-by: Rodrigo Campos <rodrigoca@microsoft.com> Co-Authored-by: Kir Kolyshkin <kolyshkin@gmail.com> Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp> Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

@kolyshkin

Release notes: final: https://github.com/opencontainers/runc/releases/tag/v1.2.0 rc.3: https://github.com/opencontainers/runc/releases/tag/v1.2.0-rc.3 rc.2: https://github.com/opencontainers/runc/releases/tag/v1.2.0-rc.2 rc.1: https://github.com/opencontainers/runc/releases/tag/v1.2.0-rc.1 Breaking changes and deprecations are included below; Breaking changes: Several aspects of how mount options work has been adjusted in a way that could theoretically break users that have very strange mount option strings. This was necessary to fix glaring issues in how mount options were being treated. The key changes are: - Mount options on bind-mounts that clear a mount flag are now always applied. Previously, if a user requested a bind-mount with only clearing options (such as rw,exec,dev) the options would be ignored and the original bind-mount options would be set. Unfortunately this also means that container configurations which specified only clearing mount options will now actually get what they asked for, which could break existing containers (though it seems unlikely that a user who requested a specific mount option would consider it "broken" to get the mount options they asked foruser who requested a specific mount option would consider it "broken" to get the mount options they asked for). This also allows us to silently add locked mount flags the user did not explicitly request to be cleared in rootless mode, allowing for easier use of bind-mounts for rootless containers. - Container configurations using bind-mounts with superblock mount flags (i.e. filesystem-specific mount flags, referred to as "data" in mount(2), as opposed to VFS generic mount flags like MS_NODEV) will now return an error. This is because superblock mount flags will also affect the host mount (as the superblock is shared when bind-mounting), which is obviously not acceptable. Previously, these flags were silently ignored so this change simply tells users that runc cannot fulfil their request rather than just ignoring it. Deprecated - runc option --criu is now ignored (with a warning), and the option will be removed entirely in a future release. Users who need a non-standard criu binary should rely on the standard way of looking up binaries in $PATH. - runc kill option -a is now deprecated. Previously, it had to be specified to kill a container (with SIGKILL) which does not have its own private PID namespace (so that runc would send SIGKILL to all processes). Now, this is done automatically. - github.com/opencontainers/runc/libcontainer/user is now deprecated, please use github.com/moby/sys/user instead. It will be removed in a future release. [@kolyshkin: set CC and CFLAGS, add xx-verify for runc-dmz, use EXTRA_BUILDTAGS (no need to specify default tags like seccomp)]. Co-authored-by: Kir Kolyshkin <kolyshkin@gmail.com> Signed-off-by: Sebastiaan van Stijn <github@gone.nl> Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

Note: this usage comes from commit 56f77d5 (part of PR 23430). cgroups.InitCgroupPath is removed from runc (see [1]), and it is suggested that users use OwnCgroupPath instead, because using init's is problematic when in host PID namespace (see [2]) and is generally not the right thing to do (see [3]). [1]: opencontainers/runc@fd5debf3 [2]: opencontainers/runc@2b28b3c2 [3]: opencontainers/runc@54e20217 Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

kolyshkin · 2024-10-23T05:06:49Z

A failure in test / test (graphdriver) / unit (firewalld):

=== Failed
=== FAIL: libnetwork/networkdb TestNetworkDBCRUDTableEntries (7.60s)
    networkdb_test.go:313: Entry existence verification test failed for node2(8ba4809e3fe4)

Looks like it's a known issue: #47553

rata

@kolyshkin thanks for moving this forward!

rata · 2024-10-23T10:31:47Z

integration/container/nat_test.go

+	try := 0
+retry:
 	conn, err := net.Dial("tcp", net.JoinHostPort(endpoint.String(), "8080"))
+	if err != nil && try < 10 {
+		try++
+		time.Sleep(200 * time.Millisecond)
+		goto retry
+	}


I think these changes didn't help. I ended up doing this: rata@2b723e7

The bug linked there says these tests are flaky (not just the one currently skipped, IIRC) and I just added the skip like other tests on this file do. You can verify this, but my memory is that at least not all the retries in this file helped to make the tests green

@kolyshkin sorry, with "these changes" I meant just the retry here in this file. The other changes are still needed, or at least I've seen them flaky with runc 1.2.0 final release in a quick test.

The only difference is that with runc 1.2.0-rcX the tests failed reliably. Now with the overlayfs it is faster and they are just flaky. But I think the rest of the changes to moby tests are needed

This comment was marked as outdated.

Sign in to view

rata mentioned this pull request Jul 12, 2024

Makefile: Fix cross-compile in moby when using runc_nodmz opencontainers/runc#4345

Closed

kolyshkin force-pushed the update_runc_1.2.0 branch from fed7477 to c8c8db8 Compare July 13, 2024 17:08

This comment was marked as outdated.

Sign in to view

kolyshkin mentioned this pull request Jul 13, 2024

Dockerfile: update to runc v1.2.2 #47666

Merged

This comment was marked as outdated.

Sign in to view

This was referenced Jul 15, 2024

cc_platform.mk: only use when GOARCH is set explicitly opencontainers/runc#4349

Closed

Don't set cross-compilation variables in Makefile opencontainers/runc#4346

Closed

kolyshkin force-pushed the update_runc_1.2.0 branch 2 times, most recently from 56a0109 to d60f33d Compare July 16, 2024 00:44

kolyshkin mentioned this pull request Jul 16, 2024

Blockers for v1.2.0 opencontainers/runc#4114

Closed

21 tasks

rata mentioned this pull request Aug 19, 2024

[DNM] Test CI with runc 1.2 #48336

Closed

AkihiroSuda mentioned this pull request Aug 23, 2024

runc: git bisect start v1.2.0-rc.2 v1.1.0 (manually) #48366

Closed

kolyshkin force-pushed the update_runc_1.2.0 branch from d60f33d to 362f5fa Compare August 29, 2024 05:48

kolyshkin force-pushed the update_runc_1.2.0 branch from 362f5fa to c836c60 Compare August 29, 2024 15:14

kolyshkin force-pushed the update_runc_1.2.0 branch from c836c60 to ecb4e6f Compare October 22, 2024 21:02

kolyshkin changed the title ~~[test] Update runc 1.2.0~~ Update runc to 1.2.0 Oct 22, 2024

kolyshkin force-pushed the update_runc_1.2.0 branch 2 times, most recently from 47719fc to 3129247 Compare October 23, 2024 01:01

AkihiroSuda and others added 4 commits October 22, 2024 21:47

vendor: bump runc to v1.2.0

3b38721

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

kolyshkin force-pushed the update_runc_1.2.0 branch from 3129247 to 3b38721 Compare October 23, 2024 04:48

kolyshkin mentioned this pull request Oct 23, 2024

integration: add wait #48716

Closed

kolyshkin mentioned this pull request Oct 23, 2024

daemon: use OwnCgroupPath in withCgroups #48730

Merged

rata reviewed Oct 23, 2024

View reviewed changes

This was referenced Nov 22, 2024

[25.0 backport] Dockerfile: update containerd v1.7.24, runc v1.2.2 #48921

Merged

Fix lease management during pull and export #48910

Merged

integration: add wait #48940

Merged

Conversation

kolyshkin commented Jul 11, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

This comment was marked as outdated.

This comment was marked as outdated.

This comment was marked as outdated.

This comment was marked as outdated.

This comment was marked as outdated.

This comment was marked as outdated.

cyphar commented Jul 14, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tonistiigi commented Jul 14, 2024

Uh oh!

kolyshkin commented Jul 15, 2024

Uh oh!

tonistiigi commented Jul 15, 2024

Uh oh!

cyphar commented Jul 15, 2024

Uh oh!

kolyshkin commented Jul 16, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kolyshkin commented Jul 19, 2024

Uh oh!

rata commented Jul 22, 2024

Uh oh!

kolyshkin commented Jul 29, 2024

Uh oh!

kolyshkin commented Jul 30, 2024

Uh oh!

AkihiroSuda commented Aug 14, 2024

Uh oh!

rata commented Aug 19, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

AkihiroSuda commented Aug 20, 2024

Uh oh!

AkihiroSuda commented Aug 20, 2024

Uh oh!

AkihiroSuda commented Aug 23, 2024

Uh oh!

kolyshkin commented Aug 29, 2024

Uh oh!

kolyshkin commented Oct 23, 2024

Uh oh!

rata left a comment

Choose a reason for hiding this comment

Uh oh!

rata Oct 23, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rata Nov 1, 2024

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

kolyshkin commented Jul 11, 2024 •

edited

Loading

cyphar commented Jul 14, 2024 •

edited

Loading

kolyshkin commented Jul 16, 2024 •

edited

Loading

rata commented Aug 19, 2024 •

edited

Loading

rata Oct 23, 2024 •

edited

Loading