
Conversation

@cyyzero (Contributor) commented Sep 16, 2023

After PR #8617, the create handler of containerd-shim-runc-v2 calls
handleStarted() to record the init process and handle its exit.
The init process would not normally exit that early, but if this
scenario does occur, handleStarted() calls handleProcessExit(),
which deadlocks: create() has already acquired s.mu, and
handleProcessExit() tries to lock it again.

So I added a handleProcessExitNoLock() function that does not lock s.mu.
It can safely be called from the create handler without deadlocking.
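
For illustration, here is a minimal, self-contained Go sketch of the deadlock pattern, using simplified stand-in types and signatures rather than the actual shim code; running it aborts with Go's built-in deadlock detector.

package main

import "sync"

type container struct{ id string }

type service struct {
	mu         sync.Mutex
	containers map[string]*container
}

// create mirrors the shim's create handler: it holds s.mu for its whole body.
func (s *service) create(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()

	c := &container{id: id}
	s.containers[id] = c

	// handleStarted runs while s.mu is still held; here the init process is
	// assumed to have exited already.
	s.handleStarted(c, true)
}

func (s *service) handleStarted(c *container, initExited bool) {
	if initExited {
		s.handleProcessExit(c)
	}
}

func (s *service) handleProcessExit(c *container) {
	s.mu.Lock() // self-deadlock: create() already holds s.mu on this goroutine
	defer s.mu.Unlock()
	delete(s.containers, c.id)
}

func main() {
	s := &service{containers: map[string]*container{}}
	s.create("init") // never returns; Go aborts with "all goroutines are asleep - deadlock!"
}

The fix breaks this cycle by making sure the exit path does not try to take s.mu a second time: either via a no-lock variant as described here, or via a muLocked flag as in the version that was eventually merged (see below).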

Fix: #9103

@k8s-ci-robot commented:

Hi @cyyzero. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cyyzero force-pushed the fix-deadlock branch 2 times, most recently from acd1e73 to cfae7c2 on September 17, 2023 04:41
@cyyzero force-pushed the fix-deadlock branch 4 times, most recently from 6328de8 to bb0b529 on September 17, 2023 11:07
@laurazard (Member) commented:

Firstly, pre-apologies if I'm misunderstanding the issue, since I'm on leave and might be taking too light a look at this, but it strikes me that even though

handleProcessExit() won't access s.containers

that doesn't necessarily mean it's safe to remove that lock. Another place that locks s.mu is

func (s *service) Shutdown(ctx context.Context, r *taskAPI.ShutdownRequest) (*ptypes.Empty, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	// return out if the shim is still servicing containers
	if len(s.containers) > 0 {
		return empty, nil
	}
	// please make sure that temporary resource has been cleanup or registered
	// for cleanup before calling shutdown
	s.shutdown.Shutdown()
	return empty, nil
}

which I'm not sure we want happening concurrently with handleProcessExit since we want to ensure all children are killed and events are sent before then. (There might be other places, I haven't checked).

@corhere you wanna TAL at this when you get a chance?

@cyyzero (Contributor, Author) commented Sep 17, 2023

@laurazard Thanks for your comment!

which I'm not sure we want happening concurrently with handleProcessExit since we want to ensure all children are killed and events are sent before then. (There might be other places, I haven't checked).

In my opinion, a mutex cannot guarantee the ordering that all children are killed and events are sent before shutdown. Other synchronization is needed to ensure that, such as waiting on a channel or a condition variable.

If handleProcessExit() must acquire s.mu, we could provide two functions, handleProcessExit() and handleProcessExitLockless(), to be used in the start handler and the create handler respectively.
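
A rough sketch of what that split could look like, using a simplified stand-in service type rather than the real shim types and signatures:

package main

import "sync"

type service struct {
	mu         sync.Mutex
	containers map[string]bool
}

// handleProcessExit locks s.mu itself; safe to call from code paths that do
// not already hold the lock, such as the start handler.
func (s *service) handleProcessExit(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.handleProcessExitLockless(id)
}

// handleProcessExitLockless assumes the caller already holds s.mu, as the
// create handler does, so it must not lock it again.
func (s *service) handleProcessExitLockless(id string) {
	delete(s.containers, id)
	// ... publish the exit event, clean up the process, etc. ...
}

func main() {
	s := &service{containers: map[string]bool{"c1": true}}

	// Inside a create-style handler that already holds s.mu:
	s.mu.Lock()
	s.handleProcessExitLockless("c1")
	s.mu.Unlock()

	// Anywhere the lock is not already held:
	s.handleProcessExit("c1")
}

The first version of this PR took essentially this shape (handleProcessExitNoLock) before being replaced by the muLocked parameter discussed later in the thread.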

@fuweid (Member) commented Sep 20, 2023

@cyyzero would you mind using https://github.com/containerd/containerd/pull/9121/files to verify your change? Thanks

$ # run the following command in containerd repo
$ DISABLE_CRI_SANDBOXES=1 EXTRA_TESTFLAGS="-test.count=1 -test.run=TestIssue9103" BUILDTAGS=no_btrfs CONTAINERD_RUNTIME=io.containerd.runc.v2 make integration

cc @laurazard

@fuweid added the cherry-pick/1.6.x and cherry-pick/1.7.x (change to be cherry-picked to the release/1.7 branch) labels on Sep 21, 2023
@cyyzero force-pushed the fix-deadlock branch 2 times, most recently from e88a0cd to 9b099f2 on September 24, 2023 16:06
@cyyzero (Contributor, Author) commented Sep 24, 2023

@fuweid Thanks for the test. I have verified and the output is as follows:

+ integration
INFO[0000] Using the following image list: {Alpine:ghcr.io/containerd/alpine:3.14.0 BusyBox:ghcr.io/containerd/busybox:1.36 Pause:registry.k8s.io/pause:3.9 ResourceConsumer:registry.k8s.io/e2e-test-images/resource-consumer:1.10 VolumeCopyUp:ghcr.io/containerd/volume-copy-up:2.2 VolumeOwnership:ghcr.io/containerd/volume-ownership:2.1 ArgsEscaped:cplatpublic.azurecr.io/args-escaped-test-image-ns:1.0} 
INFO[0000] running tests against containerd              revision=9b099f2dca75a2c225167323ea587f7fbf64c159 runtime= snapshotter= version=9b099f2dc
INFO[0000] start to pull seed image                      image="ghcr.io/containerd/busybox:1.36"
=== RUN   TestIssue9103
    log_hook.go:47: time="2023-09-24T23:18:08.876206853+08:00" level=debug msg="remote introspection plugin filters" func="introspection.(*introspectionRemote).Plugins" file="/home/cyyzero/workspace/software/containerd-test/services/introspection/introspection.go:46" filters="[type==io.containerd.snapshotter.v1, id==overlayfs]" testcase=TestIssue9103
--- PASS: TestIssue9103 (3.26s)
PASS
containerd github.com/containerd/containerd 9b099f2dc 9b099f2dca75a2c225167323ea587f7fbf64c159
time="2023-09-24T23:18:06.483387185+08:00" level=info msg="starting containerd" revision=9b099f2dca75a2c225167323ea587f7fbf64c159 version=9b099f2dc
time="2023-09-24T23:18:06.485907253+08:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.btrfs\"..." type=io.containerd.snapshotter.v1
time="2023-09-24T23:18:06.486107268+08:00" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.btrfs\"..." error="path /var/lib/containerd/io.containerd.snapshotter.v1.btrfs (ext4) must be a btrfs filesystem to be used with the btrfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1
time="2023-09-24T23:18:06.486142244+08:00" level=info msg="loading plugin \"io.containerd.content.v1.content\"..." type=io.containerd.content.v1
time="2023-09-24T23:18:06.486159606+08:00" level=info msg="loading plugin \"io.containerd.event.v1.exchange\"..." type=io.containerd.event.v1
time="2023-09-24T23:18:06.486198730+08:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.blockfile\"..." type=io.containerd.snapshotter.v1
time="2023-09-24T23:18:06.486234767+08:00" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.blockfile\"..." error="no scratch file generator: skip plugin" type=io.containerd.snapshotter.v1
time="2023-09-24T23:18:06.486261698+08:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.native\"..." type=io.containerd.snapshotter.v1
time="2023-09-24T23:18:06.486613998+08:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.overlayfs\"..." type=io.containerd.snapshotter.v1
time="2023-09-24T23:18:06.488034631+08:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.devmapper\"..." type=io.containerd.snapshotter.v1
time="2023-09-24T23:18:06.488076370+08:00" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.devmapper\"..." error="devmapper not configured: skip plugin" type=io.containerd.snapshotter.v1
time="2023-09-24T23:18:06.488083373+08:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.zfs\"..." type=io.containerd.snapshotter.v1
time="2023-09-24T23:18:06.488101887+08:00" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.zfs\"..." error="failed to resolve symlink for \"/var/lib/containerd/io.containerd.snapshotter.v1.zfs\": lstat /var/lib/containerd/io.containerd.snapshotter.v1.zfs: no such file or directory: skip plugin" type=io.containerd.snapshotter.v1
time="2023-09-24T23:18:06.488115182+08:00" level=info msg="loading plugin \"io.containerd.metadata.v1.bolt\"..." type=io.containerd.metadata.v1
time="2023-09-24T23:18:06.488135100+08:00" level=info msg="metadata content store policy set" policy=shared
time="2023-09-24T23:18:06.488376432+08:00" level=info msg="loading plugin \"io.containerd.differ.v1.walking\"..." type=io.containerd.differ.v1
time="2023-09-24T23:18:06.488409053+08:00" level=info msg="loading plugin \"io.containerd.gc.v1.scheduler\"..." type=io.containerd.gc.v1
time="2023-09-24T23:18:06.488431325+08:00" level=info msg="loading plugin \"io.containerd.lease.v1.manager\"..." type=io.containerd.lease.v1
time="2023-09-24T23:18:06.488458306+08:00" level=info msg="loading plugin \"io.containerd.nri.v1.nri\"..." type=io.containerd.nri.v1
time="2023-09-24T23:18:06.488472713+08:00" level=info msg="NRI interface is disabled by configuration."
time="2023-09-24T23:18:06.488477962+08:00" level=info msg="loading plugin \"io.containerd.image-verifier.v1.bindir\"..." type=io.containerd.image-verifier.v1
time="2023-09-24T23:18:06.488489534+08:00" level=info msg="loading plugin \"io.containerd.runtime.v2.task\"..." type=io.containerd.runtime.v2
time="2023-09-24T23:18:06.488541472+08:00" level=info msg="loading plugin \"io.containerd.runtime.v2.shim\"..." type=io.containerd.runtime.v2
time="2023-09-24T23:18:06.488570706+08:00" level=info msg="loading plugin \"io.containerd.sandbox.store.v1.local\"..." type=io.containerd.sandbox.store.v1
time="2023-09-24T23:18:06.488619498+08:00" level=info msg="loading plugin \"io.containerd.sandbox.controller.v1.local\"..." type=io.containerd.sandbox.controller.v1
time="2023-09-24T23:18:06.488656968+08:00" level=info msg="loading plugin \"io.containerd.streaming.v1.manager\"..." type=io.containerd.streaming.v1
time="2023-09-24T23:18:06.488680222+08:00" level=info msg="loading plugin \"io.containerd.service.v1.introspection-service\"..." type=io.containerd.service.v1
time="2023-09-24T23:18:06.488717812+08:00" level=info msg="loading plugin \"io.containerd.service.v1.containers-service\"..." type=io.containerd.service.v1
time="2023-09-24T23:18:06.488728672+08:00" level=info msg="loading plugin \"io.containerd.service.v1.content-service\"..." type=io.containerd.service.v1
time="2023-09-24T23:18:06.488734433+08:00" level=info msg="loading plugin \"io.containerd.service.v1.diff-service\"..." type=io.containerd.service.v1
time="2023-09-24T23:18:06.488740465+08:00" level=info msg="loading plugin \"io.containerd.service.v1.images-service\"..." type=io.containerd.service.v1
time="2023-09-24T23:18:06.488745694+08:00" level=info msg="loading plugin \"io.containerd.service.v1.namespaces-service\"..." type=io.containerd.service.v1
time="2023-09-24T23:18:06.488750664+08:00" level=info msg="loading plugin \"io.containerd.service.v1.snapshots-service\"..." type=io.containerd.service.v1
time="2023-09-24T23:18:06.488763598+08:00" level=info msg="loading plugin \"io.containerd.monitor.v1.cgroups\"..." type=io.containerd.monitor.v1
time="2023-09-24T23:18:06.489340034+08:00" level=info msg="loading plugin \"io.containerd.service.v1.tasks-service\"..." type=io.containerd.service.v1
time="2023-09-24T23:18:06.489387072+08:00" level=debug msg="No blockio config file specified, blockio not configured"
time="2023-09-24T23:18:06.489395067+08:00" level=debug msg="No RDT config file specified, RDT not configured"
time="2023-09-24T23:18:06.489404034+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.introspection\"..." type=io.containerd.grpc.v1
time="2023-09-24T23:18:06.489415555+08:00" level=info msg="loading plugin \"io.containerd.transfer.v1.local\"..." type=io.containerd.transfer.v1
time="2023-09-24T23:18:06.489464206+08:00" level=info msg="loading plugin \"io.containerd.internal.v1.restart\"..." type=io.containerd.internal.v1
time="2023-09-24T23:18:06.489554175+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.containers\"..." type=io.containerd.grpc.v1
time="2023-09-24T23:18:06.489575715+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.content\"..." type=io.containerd.grpc.v1
time="2023-09-24T23:18:06.489600091+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.diff\"..." type=io.containerd.grpc.v1
time="2023-09-24T23:18:06.489623675+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.events\"..." type=io.containerd.grpc.v1
time="2023-09-24T23:18:06.489630258+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.healthcheck\"..." type=io.containerd.grpc.v1
time="2023-09-24T23:18:06.489639465+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.images\"..." type=io.containerd.grpc.v1
time="2023-09-24T23:18:06.489665033+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.leases\"..." type=io.containerd.grpc.v1
time="2023-09-24T23:18:06.489679821+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.namespaces\"..." type=io.containerd.grpc.v1
time="2023-09-24T23:18:06.489723212+08:00" level=info msg="loading plugin \"io.containerd.internal.v1.opt\"..." type=io.containerd.internal.v1
time="2023-09-24T23:18:06.489850821+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.sandbox-controllers\"..." type=io.containerd.grpc.v1
time="2023-09-24T23:18:06.489891648+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.sandboxes\"..." type=io.containerd.grpc.v1
time="2023-09-24T23:18:06.489910293+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.snapshots\"..." type=io.containerd.grpc.v1
time="2023-09-24T23:18:06.489926122+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.streaming\"..." type=io.containerd.grpc.v1
time="2023-09-24T23:18:06.489961879+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.tasks\"..." type=io.containerd.grpc.v1
time="2023-09-24T23:18:06.489997586+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.transfer\"..." type=io.containerd.grpc.v1
time="2023-09-24T23:18:06.490009328+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.version\"..." type=io.containerd.grpc.v1
time="2023-09-24T23:18:06.490021170+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.cri\"..." type=io.containerd.grpc.v1
time="2023-09-24T23:18:06.490117651+08:00" level=info msg="Start cri plugin with config {PluginConfig:{ContainerdConfig:{Snapshotter:overlayfs DefaultRuntimeName:runc Runtimes:map[runc:{Type:io.containerd.runc.v2 Path: PodAnnotations:[] ContainerAnnotations:[] Options:map[BinaryName: CriuImagePath: CriuWorkPath: IoGid:0 IoUid:0 NoNewKeyring:false Root: ShimCgroup:] PrivilegedWithoutHostDevices:false PrivilegedWithoutHostDevicesAllDevicesAllowed:false BaseRuntimeSpec: NetworkPluginConfDir: NetworkPluginMaxConfNum:0 Snapshotter: SandboxMode:podsandbox}] DisableSnapshotAnnotations:true DiscardUnpackedLayers:false IgnoreBlockIONotEnabledErrors:false IgnoreRdtNotEnabledErrors:false} CniConfig:{NetworkPluginBinDir:/opt/cni/bin NetworkPluginConfDir:/etc/cni/net.d NetworkPluginMaxConfNum:1 NetworkPluginSetupSerially:false NetworkPluginConfTemplate: IPPreference:} Registry:{ConfigPath: Mirrors:map[] Configs:map[] Auths:map[] Headers:map[]} ImageDecryption:{KeyModel:node} DisableTCPService:true StreamServerAddress:127.0.0.1 StreamServerPort:0 StreamIdleTimeout:4h0m0s EnableSelinux:false SelinuxCategoryRange:1024 SandboxImage:registry.k8s.io/pause:3.9 StatsCollectPeriod:10 EnableTLSStreaming:false X509KeyPairStreaming:{TLSCertFile: TLSKeyFile:} MaxContainerLogLineSize:16384 DisableCgroup:false DisableApparmor:false RestrictOOMScoreAdj:false MaxConcurrentDownloads:3 DisableProcMount:false UnsetSeccompProfile: TolerateMissingHugetlbController:true DisableHugetlbController:true DeviceOwnershipFromSecurityContext:false IgnoreImageDefinedVolumes:false NetNSMountsUnderStateDir:false EnableUnprivilegedPorts:false EnableUnprivilegedICMP:false EnableCDI:false CDISpecDirs:[/etc/cdi /var/run/cdi] ImagePullProgressTimeout:1m0s DrainExecSyncIOTimeout:0s} ContainerdRootDir:/var/lib/containerd ContainerdEndpoint:/run/containerd/containerd.sock RootDir:/var/lib/containerd/io.containerd.grpc.v1.cri StateDir:/run/containerd-test/io.containerd.grpc.v1.cri}"
time="2023-09-24T23:18:06.490177083+08:00" level=info msg="Connect containerd service"
time="2023-09-24T23:18:06.490211167+08:00" level=info msg="using legacy CRI server"
time="2023-09-24T23:18:06.490233669+08:00" level=info msg="using experimental NRI integration - disable nri plugin to prevent this"
time="2023-09-24T23:18:06.490351830+08:00" level=info msg="Get image filesystem path \"/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs\""
time="2023-09-24T23:18:06.491110930+08:00" level=error msg="failed to load cni during init, please check CRI plugin status before setting up network for pods" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
time="2023-09-24T23:18:06.491173958+08:00" level=info msg="loading plugin \"io.containerd.tracing.processor.v1.otlp\"..." type=io.containerd.tracing.processor.v1
time="2023-09-24T23:18:06.491258035+08:00" level=info msg="loading plugin \"io.containerd.internal.v1.tracing\"..." type=io.containerd.internal.v1
time="2023-09-24T23:18:06.491276404+08:00" level=info msg="Start subscribing containerd event"
time="2023-09-24T23:18:06.491576476+08:00" level=info msg="Start recovering state"
time="2023-09-24T23:18:06.491630714+08:00" level=info msg=serving... address=/run/containerd/containerd.sock.ttrpc
time="2023-09-24T23:18:06.491653511+08:00" level=info msg="Start event monitor"
time="2023-09-24T23:18:06.491685867+08:00" level=info msg=serving... address=/run/containerd/containerd.sock
time="2023-09-24T23:18:06.491687715+08:00" level=info msg="Start snapshots syncer"
time="2023-09-24T23:18:06.491730601+08:00" level=info msg="Start cni network conf syncer for default"
time="2023-09-24T23:18:06.491734879+08:00" level=info msg="Start streaming server"
time="2023-09-24T23:18:06.491756590+08:00" level=debug msg="sd notification" notified=false state="READY=1"
time="2023-09-24T23:18:06.491779854+08:00" level=info msg="containerd successfully booted in 0.008757s"
time="2023-09-24T23:18:06.593333740+08:00" level=debug msg="garbage collected" d=4.593622ms
time="2023-09-24T23:18:06.980385290+08:00" level=debug msg="event published" ns=testing topic=/namespaces/update type=containerd.events.NamespaceUpdate
time="2023-09-24T23:18:08.803267678+08:00" level=debug msg="(*service).Write started" expected="sha256:7b3ccabffc97de872a30dfd234fd972a66d247c8cfc69b0550f276481852627c" ref="index-sha256:7b3ccabffc97de872a30dfd234fd972a66d247c8cfc69b0550f276481852627c" total=2295
time="2023-09-24T23:18:08.821995286+08:00" level=debug msg="(*service).Write started" expected="sha256:907ca53d7e2947e849b839b1cd258c98fd3916c60f2e6e70c30edbf741ab6754" ref="manifest-sha256:907ca53d7e2947e849b839b1cd258c98fd3916c60f2e6e70c30edbf741ab6754" total=528
time="2023-09-24T23:18:08.837254340+08:00" level=debug msg="(*service).Write started" expected="sha256:66ba00ad3de8677a3fa4bc4ea0fc46ebca0f14db46ca365e7f60833068dd0148" ref="config-sha256:66ba00ad3de8677a3fa4bc4ea0fc46ebca0f14db46ca365e7f60833068dd0148" total=1457
time="2023-09-24T23:18:08.848513412+08:00" level=debug msg="stat snapshot" key="sha256:b64792c17e4ad443d16b218afb3a8f5d03ca0f4ec49b11c1a7aebe17f6c3c1d2"
time="2023-09-24T23:18:08.854213584+08:00" level=debug msg="create image" name="ghcr.io/containerd/busybox:1.36" target="sha256:7b3ccabffc97de872a30dfd234fd972a66d247c8cfc69b0550f276481852627c"
time="2023-09-24T23:18:08.858224642+08:00" level=debug msg="event published" ns=testing topic=/images/update type=containerd.services.images.v1.ImageUpdate
time="2023-09-24T23:18:08.858361870+08:00" level=debug msg="Received containerd event timestamp - 2023-09-24 15:18:08.858219653 +0000 UTC, namespace - \"testing\", topic - \"/images/update\""
time="2023-09-24T23:18:08.858376447+08:00" level=debug msg="Ignoring events in namespace - \"testing\""
time="2023-09-24T23:18:08.877886822+08:00" level=debug msg="prepare snapshot" key=TestIssue9103 parent="sha256:b64792c17e4ad443d16b218afb3a8f5d03ca0f4ec49b11c1a7aebe17f6c3c1d2"
time="2023-09-24T23:18:08.888991414+08:00" level=debug msg="event published" ns=testing topic=/snapshot/prepare type=containerd.events.SnapshotPrepare
time="2023-09-24T23:18:08.893618664+08:00" level=debug msg="get snapshot mounts" key=TestIssue9103
time="2023-09-24T23:18:08.902182818+08:00" level=debug msg="event published" ns=testing topic=/containers/create type=containerd.events.ContainerCreate
time="2023-09-24T23:18:08.907228321+08:00" level=debug msg="get snapshot mounts" key=TestIssue9103
time="2023-09-24T23:18:08.914998242+08:00" level=debug msg="shim bootstrap parameters" address="unix:///run/containerd/s/b60b5f01d31b4d44c5871f43149981a7412434bdbc265d76acfbc41f8b739fcb" namespace=testing protocol=ttrpc
time="2023-09-24T23:18:08.917076983+08:00" level=info msg="loading plugin \"io.containerd.internal.v1.shutdown\"..." runtime=io.containerd.runc.v2 type=io.containerd.internal.v1
time="2023-09-24T23:18:08.917405806+08:00" level=info msg="loading plugin \"io.containerd.ttrpc.v1.pause\"..." runtime=io.containerd.runc.v2 type=io.containerd.ttrpc.v1
time="2023-09-24T23:18:08.917417958+08:00" level=debug msg="registering ttrpc service" id=io.containerd.ttrpc.v1.pause runtime=io.containerd.runc.v2
time="2023-09-24T23:18:08.917440180+08:00" level=info msg="loading plugin \"io.containerd.event.v1.publisher\"..." runtime=io.containerd.runc.v2 type=io.containerd.event.v1
time="2023-09-24T23:18:08.917463895+08:00" level=info msg="loading plugin \"io.containerd.ttrpc.v1.task\"..." runtime=io.containerd.runc.v2 type=io.containerd.ttrpc.v1
time="2023-09-24T23:18:08.917528135+08:00" level=debug msg="registering ttrpc service" id=io.containerd.ttrpc.v1.task runtime=io.containerd.runc.v2
time="2023-09-24T23:18:08.917621971+08:00" level=debug msg="serving api on socket" socket="[inherited from parent]"
time="2023-09-24T23:18:08.917651647+08:00" level=debug msg="starting signal loop" namespace=testing path=/run/containerd-test/io.containerd.runtime.v2.task/testing/TestIssue9103 pid=31963 runtime=io.containerd.runc.v2
time="2023-09-24T23:18:09.077660531+08:00" level=debug msg="garbage collected" d=2.589245ms
2023/09/24 23:18:11 traces export: Post "https://localhost:4318/v1/traces": dial tcp 127.0.0.1:4318: connect: connection refused
time="2023-09-24T23:18:12.031835788+08:00" level=error msg="loading cgroup for 31988" error="cgroups: cgroup deleted" runtime=io.containerd.runc.v2
time="2023-09-24T23:18:12.032953924+08:00" level=debug msg="event forwarded" ns=testing topic=/tasks/create type=containerd.events.TaskCreate
time="2023-09-24T23:18:12.033610816+08:00" level=debug msg="event forwarded" ns=testing topic=/tasks/exit type=containerd.events.TaskExit
time="2023-09-24T23:18:12.117792109+08:00" level=debug msg="failed to delete task" error="rpc error: code = Unknown desc = failed to delete task: runc-fp did not terminate successfully: exit status 1: " id=TestIssue9103
time="2023-09-24T23:18:12.119721816+08:00" level=debug msg="received signal" signal=terminated
time="2023-09-24T23:18:12.119757543+08:00" level=debug msg="sd notification" notified=false state="STOPPING=1"
time="2023-09-24T23:18:12.119818658+08:00" level=error msg="failed to close plugin" error="context canceled" id=io.containerd.internal.v1.tracing
time="2023-09-24T23:18:12.119838245+08:00" level=info msg="Stop CRI service"
2023/09/24 23:18:12 traces export: Post "https://localhost:4318/v1/traces": dial tcp 127.0.0.1:4318: connect: connection refused
2023/09/24 23:18:12 context canceled
time="2023-09-24T23:18:12.147360250+08:00" level=debug msg="cni watcher channel is closed"
time="2023-09-24T23:18:12.147424911+08:00" level=info msg="Stop CRI service"
time="2023-09-24T23:18:12.147450228+08:00" level=info msg="Event monitor stopped"
time="2023-09-24T23:18:12.147454096+08:00" level=info msg="Stream server stopped"

ok      github.com/containerd/containerd/integration/client     5.727s

@cyyzero (Contributor, Author) commented Sep 24, 2023

@laurazard @fuweid Thanks for your suggestions. I've modified the fix, adding a handleProcessExitNoLock() function that is called in the create handler to handle the child process exit.

@fuweid (Member) commented Sep 25, 2023

@cyyzero the change looks good to me. Would you please add the extra commit from #9121 into your pull request, so it's easier for reviewers to review the change? Thanks!

@cyyzero (Contributor, Author) commented Sep 26, 2023

@fuweid I've added that commit here.

@corhere (Member) left a comment

You've been led up the garden path, unfortunately. The locking of s.mu inside handleProcessExit looks to be nothing more than a vestige from when it was called checkProcesses and iterated over s.containers. As far as I can tell, @cyyzero is correct: s.mu is only used to synchronize accesses to the s.containers map. Therefore resolving the deadlock really is as simple as removing the locking of s.mu from handleProcessExit. (Also, the concurrency of Create() could be improved by only holding the s.mu lock while modifying s.containers.)

If I'm wrong and s.mu actually does need to be held while calling handleProcessExit, there is a much cleaner way to deal with it than adding a NoLock variant of the function with the same visibility: document a precondition.

// s.mu must be locked when calling this function.

(the diff hunk this review comment is attached to:)

// It must be called before the caller of preStart returns, otherwise severe
// memory leaks will occur.
-func (s *service) preStart(c *runc.Container) (handleStarted func(*runc.Container, process.Process), cleanup func()) {
+func (s *service) preStart(c *runc.Container, muLocked bool) (handleStarted func(*runc.Container, process.Process), cleanup func()) {
@corhere (Member) commented:

preStart does not need to be aware of whether s.mu is locked. Furthermore, the lock state of s.mu could change between the preStart call and its handleStarted callback.

Suggested change:
-func (s *service) preStart(c *runc.Container, muLocked bool) (handleStarted func(*runc.Container, process.Process), cleanup func()) {
+func (s *service) preStart(c *runc.Container) (handleStarted func(*runc.Container, process.Process, muLocked bool), cleanup func()) {
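
To make the suggestion concrete, here is a small self-contained sketch of the muLocked-on-the-callback approach, with simplified placeholder types rather than the real preStart/handleStarted signatures; only the locking decision matters here.

package main

import "sync"

type container struct{ id string }

type proc struct{ exited bool }

type service struct {
	mu         sync.Mutex
	containers map[string]*container
}

// preStart itself does not care whether s.mu is held; only the returned
// handleStarted callback needs to know, via its muLocked argument.
func (s *service) preStart() (handleStarted func(c *container, p *proc, muLocked bool), cleanup func()) {
	handleStarted = func(c *container, p *proc, muLocked bool) {
		if p.exited {
			s.handleProcessExit(c, muLocked)
		}
	}
	cleanup = func() {}
	return handleStarted, cleanup
}

func (s *service) handleProcessExit(c *container, muLocked bool) {
	if !muLocked {
		s.mu.Lock()
		defer s.mu.Unlock()
	}
	delete(s.containers, c.id)
}

func main() {
	s := &service{containers: map[string]*container{"c1": {id: "c1"}}}
	handleStarted, cleanup := s.preStart()
	defer cleanup()

	// A create-style caller that already holds s.mu passes muLocked=true,
	// so handleProcessExit skips the lock and cannot self-deadlock.
	s.mu.Lock()
	handleStarted(s.containers["c1"], &proc{exited: true}, true)
	s.mu.Unlock()
}

This is essentially the shape the merged fix takes, per the commit message further down the thread.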

@cyyzero (Contributor, Author) replied:

Nice advice. I'll fix it this way, if we really need to keep s.mu locked.

@cyyzero (Contributor, Author) commented Sep 27, 2023

You've been led up the garden path, unfortunately. The locking of s.mu inside handleProcessExit looks to be nothing more than a vestige from when it was called checkProcesses and iterated over s.containers. As far as I can tell, @cyyzero is correct: s.mu is only used to synchronize accesses to the s.containers map. Therefore resolving the deadlock really is as simple as removing the locking of s.mu from handleProcessExit. (Also, the concurrency of Create() could be improved by only holding the s.mu lock while modifying s.containers.)

@corhere I agree with you. There is some confusion about the usage of s.mu: sometimes it is used to protect s.containers only, but in the Create() and Shutdown() functions it is used to mutually exclude the whole function body. I wonder why that is.

@corhere (Member) commented Sep 27, 2023

There is some confusion about the usage of s.mu: sometimes it is used to protect s.containers only, but in the Create() and Shutdown() functions it is used to mutually exclude the whole function body. I wonder why that is.

I've traced the mutual exclusion in Create() to #1452. It was a solution for the missed early-exits problem: Create() holds the mutex to block checkProcesses() from handling exit events until after Create() has added the container process to the s.processes map. This locking logic was copied into the v2 shim implementation. #8617 solves the same problem using a strategy that does not rely on mutual exclusion between container create and exit-event handling.

As for Shutdown(), the implementation was added in #3004 with no explanation, or precedent I could find. I can confidently say that in this initial implementation, the mutex is locked to safely access s.containers, but I can only speculate that it is not unlocked because nothing matters after os.Exit(0). Then ae87730 changed Shutdown() to return instead of exiting, followed by #4988 to unlock upon return. If the intention is to block containers from being created while the shim's task service is shutting down, locking a mutex is a poor solution since it makes the create requests hang instead of returning an error, and is ineffective after Shutdown() returns as the mutex is unlocked.

Shutdown() does fairly little work while holding the mutex. The only registered shutdown callbacks I could find in the whole containerd codebase are

sd.RegisterCallback(func(context.Context) error {
	close(s.events)
	return nil
})
if address, err := shim.ReadAddress("address"); err == nil {
	sd.RegisterCallback(func(context.Context) error {
		return shim.RemoveSocket(address)
	})
}

s.shutdown.RegisterCallback(func(context.Context) error { return s.platform.Close() })

That is, close a channel, delete a file from the filesystem, and close a file descriptor, respectively. None of these take a particularly long time, and there isn't really anything that could race with those operations. I don't see any way for new problems to arise if Create() was no longer mutually exclusive with Shutdown(). The worst that could happen would be slightly widening the window for already-existing race conditions.

Now I am fully convinced that the only sound use for s.mu is to synchronize access to the s.containers map. All other uses are either vestigial or unsound.

  • handleProcessExit() does not need mutual exclusion with Create() as early exits are no longer handled using a mutex.
  • handleProcessExit() does not need mutual exclusion with Shutdown() as Shutdown() is a no-op while len(s.containers) > 0. Furthermore, after a call to Shutdown() (which isn't a no-op), any call to handleProcessExit() would panic trying to send to the channel that Shutdown() had just closed. Therefore removing mutual exclusion would not make things any buggier than the status quo. QED.
  • Aside from synchronizing accesses to s.containers, mutual exclusion between Create() and Shutdown() is not fit for purpose. It's an identical story to handleProcessExit(): after Shutdown(), any Create() calls would panic trying to send to a closed channel.
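
As an illustration of the narrowing mentioned above (holding s.mu only while modifying s.containers), a simplified sketch with placeholder types follows; the real Create handler does far more work outside the lock.

package main

import "sync"

type container struct{ id string }

type service struct {
	mu         sync.Mutex
	containers map[string]*container
}

func (s *service) Create(id string) *container {
	// Expensive work (mount setup, starting the init process, ...) happens
	// without holding s.mu.
	c := &container{id: id}

	// Only the shared map update needs the mutex.
	s.mu.Lock()
	s.containers[id] = c
	s.mu.Unlock()

	// Exit handling that runs after this point can take s.mu freely
	// without risking a self-deadlock.
	return c
}

func main() {
	s := &service{containers: map[string]*container{}}
	_ = s.Create("c1")
}

This is the refactor that was ultimately deferred to a follow-up (see fuweid's comments below), so the merged patch stays small enough to backport to 1.6/1.7.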

@cyyzero (Contributor, Author) commented Oct 1, 2023

@corhere Thank you for your detailed and thorough analysis. s.mu should only be used to protect s.containers, so I have updated the fix again.
cc @fuweid @laurazard @thaJeztah

@fuweid (Member) commented Oct 2, 2023

I would like to say that I prefer the first version because it focuses on the bug fix.
I don't want to pull the refactoring into this fix because the patch will be backported to v1.6/v1.7.

@fuweid fuweid dismissed their stale review October 2, 2023 15:13

prefer to apply the refactor in the follow-up.

@fuweid (Member) commented Oct 2, 2023

The CI issue will be fixed by #9178

cyyzero and others added 3 commits October 2, 2023 17:44
The previous code had already called `getContainer()`; just pass the result into
`s.getContainerPids` to avoid an unnecessary lock and map lookup.

Signed-off-by: Chen Yiyang <cyyzero@qq.com>
After PR containerd#8617, the create handler of containerd-shim-runc-v2
calls handleStarted() to record the init process and handle its exit.
The init process would not normally exit that early, but if this
scenario occurs, handleStarted() calls handleProcessExit(), which
deadlocks because create() has already acquired s.mu and
handleProcessExit() tries to lock it again.

So I added a muLocked parameter to handleStarted to indicate whether
s.mu is currently held, which determines whether to lock it when
calling handleProcessExit.

Fix: containerd#9103
Signed-off-by: Chen Yiyang <cyyzero@qq.com>
Signed-off-by: Wei Fu <fuweid89@gmail.com>
@cyyzero (Contributor, Author) commented Oct 2, 2023

@fuweid Thanks for the suggestion; this PR is for the bug fix only. The refactoring can be done in a follow-up change. I've rolled back the changes.
cc @corhere @laurazard @thaJeztah

@fuweid (Member) left a comment

LGTM

@cyyzero cyyzero requested a review from corhere October 3, 2023 04:14
@cyyzero changed the title from "containerd-shim-runc-v2: avoid potential deadlock in create handler and remove unnecessary s.getContainer()" to "containerd-shim-runc-v2: avoid potential deadlock in create handler" on Oct 7, 2023
@cyyzero (Contributor, Author) commented Oct 7, 2023

Any thoughts? @corhere :)

@fuweid (Member) commented Oct 8, 2023

ping @containerd/reviewers

@fuweid fuweid merged commit 8db0d39 into containerd:main Oct 9, 2023
@thaJeztah added the cherry-picked/1.6.x (PR commits are cherry-picked into release/1.6 branch) and cherry-picked/1.7.x (PR commits are cherry-picked into release/1.7 branch) labels, and removed the cherry-pick/1.6.x and cherry-pick/1.7.x labels, on Oct 10, 2023

Labels

cherry-picked/1.6.x (PR commits are cherry-picked into release/1.6 branch), cherry-picked/1.7.x (PR commits are cherry-picked into release/1.7 branch), needs-ok-to-test

Development

Successfully merging this pull request may close these issues.

containerd-shim may deadlock when processing create requests

8 participants