
Conversation

@cyyzero (Contributor) commented Sep 16, 2023

After PR #8617, the create handler of containerd-shim-runc-v2 calls
handleStarted() to record the init process and handle its exit.
The init process would not normally exit that early, but if this
scenario does occur, handleStarted() calls handleProcessExit(),
which deadlocks: create() has already acquired s.mu, and
handleProcessExit() tries to lock it again.

So I added a handleProcessExitNoLock() function that does not lock s.mu.
It can safely be called from the create handler without deadlocking.
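
For illustration, here is a minimal, self-contained Go sketch of the deadlock pattern, using simplified stand-in types and signatures rather than the actual shim code; running it aborts with Go's built-in deadlock detector.

package main

import "sync"

type container struct{ id string }

type service struct {
	mu         sync.Mutex
	containers map[string]*container
}

// create mirrors the shim's create handler: it holds s.mu for its whole body.
func (s *service) create(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()

	c := &container{id: id}
	s.containers[id] = c

	// handleStarted runs while s.mu is still held; here the init process is
	// assumed to have exited already.
	s.handleStarted(c, true)
}

func (s *service) handleStarted(c *container, initExited bool) {
	if initExited {
		s.handleProcessExit(c)
	}
}

func (s *service) handleProcessExit(c *container) {
	s.mu.Lock() // self-deadlock: create() already holds s.mu on this goroutine
	defer s.mu.Unlock()
	delete(s.containers, c.id)
}

func main() {
	s := &service{containers: map[string]*container{}}
	s.create("init") // never returns; Go aborts with "all goroutines are asleep - deadlock!"
}

The fix breaks this cycle by making sure the exit path does not try to take s.mu a second time: either via a no-lock variant as described here, or via a muLocked flag as in the version that was eventually merged (see below).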

Fix: #9103

@k8s-ci-robot commented:

Hi @cyyzero. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cyyzero force-pushed the fix-deadlock branch 2 times, most recently from acd1e73 to cfae7c2 on September 17, 2023 04:41
@cyyzero force-pushed the fix-deadlock branch 4 times, most recently from 6328de8 to bb0b529 on September 17, 2023 11:07
@laurazard (Member) commented:

Firstly, pre-apologies if I'm misunderstanding the issue, since I'm on leave and might be taking too light a look at this, but it strikes me that even though

handleProcessExit() won't access s.containers

that doesn't necessarily mean it's safe to remove that lock. Another place that locks s.mu is

func (s *service) Shutdown(ctx context.Context, r *taskAPI.ShutdownRequest) (*ptypes.Empty, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	// return out if the shim is still servicing containers
	if len(s.containers) > 0 {
		return empty, nil
	}
	// please make sure that temporary resource has been cleanup or registered
	// for cleanup before calling shutdown
	s.shutdown.Shutdown()
	return empty, nil
}

which I'm not sure we want happening concurrently with handleProcessExit since we want to ensure all children are killed and events are sent before then. (There might be other places, I haven't checked).

@corhere you wanna TAL at this when you get a chance?

@cyyzero (Contributor, Author) commented Sep 17, 2023

@laurazard Thanks for your comment!

which I'm not sure we want happening concurrently with handleProcessExit since we want to ensure all children are killed and events are sent before then. (There might be other places, I haven't checked).

In my opinion, a mutex cannot guarantee the ordering that all children are killed and events are sent before shutdown. Other synchronization is needed to ensure that, such as waiting on a channel or a condition variable.

If handleProcessExit() must acquire s.mu, we could provide two functions, handleProcessExit() and handleProcessExitLockless(), to be used in the start handler and the create handler respectively.
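
A rough sketch of what that split could look like, using a simplified stand-in service type rather than the real shim types and signatures:

package main

import "sync"

type service struct {
	mu         sync.Mutex
	containers map[string]bool
}

// handleProcessExit locks s.mu itself; safe to call from code paths that do
// not already hold the lock, such as the start handler.
func (s *service) handleProcessExit(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.handleProcessExitLockless(id)
}

// handleProcessExitLockless assumes the caller already holds s.mu, as the
// create handler does, so it must not lock it again.
func (s *service) handleProcessExitLockless(id string) {
	delete(s.containers, id)
	// ... publish the exit event, clean up the process, etc. ...
}

func main() {
	s := &service{containers: map[string]bool{"c1": true}}

	// Inside a create-style handler that already holds s.mu:
	s.mu.Lock()
	s.handleProcessExitLockless("c1")
	s.mu.Unlock()

	// Anywhere the lock is not already held:
	s.handleProcessExit("c1")
}

The first version of this PR took essentially this shape (handleProcessExitNoLock) before being replaced by the muLocked parameter discussed later in the thread.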

@fuweid (Member) commented Sep 20, 2023

@cyyzero would you mind using https://github.com/containerd/containerd/pull/9121/files to verify your change? Thanks

$ # run the following command in containerd repo
$ DISABLE_CRI_SANDBOXES=1 EXTRA_TESTFLAGS="-test.count=1 -test.run=TestIssue9103" BUILDTAGS=no_btrfs CONTAINERD_RUNTIME=io.containerd.runc.v2 make integration

cc @laurazard

@fuweid added the cherry-pick/1.6.x and cherry-pick/1.7.x (change to be cherry-picked to the release/1.7 branch) labels on Sep 21, 2023
@cyyzero force-pushed the fix-deadlock branch 2 times, most recently from e88a0cd to 9b099f2 on September 24, 2023 16:06
@cyyzero (Contributor, Author) commented Sep 24, 2023

@fuweid Thanks for the test. I have verified and the output is as follows:

+ integration
INFO[0000] Using the following image list: {Alpine:ghcr.io/containerd/alpine:3.14.0 BusyBox:ghcr.io/containerd/busybox:1.36 Pause:registry.k8s.io/pause:3.9 ResourceConsumer:registry.k8s.io/e2e-test-images/resource-consumer:1.10 VolumeCopyUp:ghcr.io/containerd/volume-copy-up:2.2 VolumeOwnership:ghcr.io/containerd/volume-ownership:2.1 ArgsEscaped:cplatpublic.azurecr.io/args-escaped-test-image-ns:1.0} 
INFO[0000] running tests against containerd              revision=9b099f2dca75a2c225167323ea587f7fbf64c159 runtime= snapshotter= version=9b099f2dc
INFO[0000] start to pull seed image                      image="ghcr.io/containerd/busybox:1.36"
=== RUN   TestIssue9103
    log_hook.go:47: time="2023-09-24T23:18:08.876206853+08:00" level=debug msg="remote introspection plugin filters" func="introspection.(*introspectionRemote).Plugins" file="/home/cyyzero/workspace/software/containerd-test/services/introspection/introspection.go:46" filters="[type==io.containerd.snapshotter.v1, id==overlayfs]" testcase=TestIssue9103
--- PASS: TestIssue9103 (3.26s)
PASS
containerd github.com/containerd/containerd 9b099f2dc 9b099f2dca75a2c225167323ea587f7fbf64c159
time="2023-09-24T23:18:06.483387185+08:00" level=info msg="starting containerd" revision=9b099f2dca75a2c225167323ea587f7fbf64c159 version=9b099f2dc
time="2023-09-24T23:18:06.485907253+08:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.btrfs\"..." type=io.containerd.snapshotter.v1
time="2023-09-24T23:18:06.486107268+08:00" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.btrfs\"..." error="path /var/lib/containerd/io.containerd.snapshotter.v1.btrfs (ext4) must be a btrfs filesystem to be used with the btrfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1
time="2023-09-24T23:18:06.486142244+08:00" level=info msg="loading plugin \"io.containerd.content.v1.content\"..." type=io.containerd.content.v1
time="2023-09-24T23:18:06.486159606+08:00" level=info msg="loading plugin \"io.containerd.event.v1.exchange\"..." type=io.containerd.event.v1
time="2023-09-24T23:18:06.486198730+08:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.blockfile\"..." type=io.containerd.snapshotter.v1
time="2023-09-24T23:18:06.486234767+08:00" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.blockfile\"..." error="no scratch file generator: skip plugin" type=io.containerd.snapshotter.v1
time="2023-09-24T23:18:06.486261698+08:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.native\"..." type=io.containerd.snapshotter.v1
time="2023-09-24T23:18:06.486613998+08:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.overlayfs\"..." type=io.containerd.snapshotter.v1
time="2023-09-24T23:18:06.488034631+08:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.devmapper\"..." type=io.containerd.snapshotter.v1
time="2023-09-24T23:18:06.488076370+08:00" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.devmapper\"..." error="devmapper not configured: skip plugin" type=io.containerd.snapshotter.v1
time="2023-09-24T23:18:06.488083373+08:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.zfs\"..." type=io.containerd.snapshotter.v1
time="2023-09-24T23:18:06.488101887+08:00" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.zfs\"..." error="failed to resolve symlink for \"/var/lib/containerd/io.containerd.snapshotter.v1.zfs\": lstat /var/lib/containerd/io.containerd.snapshotter.v1.zfs: no such file or directory: skip plugin" type=io.containerd.snapshotter.v1
time="2023-09-24T23:18:06.488115182+08:00" level=info msg="loading plugin \"io.containerd.metadata.v1.bolt\"..." type=io.containerd.metadata.v1
time="2023-09-24T23:18:06.488135100+08:00" level=info msg="metadata content store policy set" policy=shared
time="2023-09-24T23:18:06.488376432+08:00" level=info msg="loading plugin \"io.containerd.differ.v1.walking\"..." type=io.containerd.differ.v1
time="2023-09-24T23:18:06.488409053+08:00" level=info msg="loading plugin \"io.containerd.gc.v1.scheduler\"..." type=io.containerd.gc.v1
time="2023-09-24T23:18:06.488431325+08:00" level=info msg="loading plugin \"io.containerd.lease.v1.manager\"..." type=io.containerd.lease.v1
time="2023-09-24T23:18:06.488458306+08:00" level=info msg="loading plugin \"io.containerd.nri.v1.nri\"..." type=io.containerd.nri.v1
time="2023-09-24T23:18:06.488472713+08:00" level=info msg="NRI interface is disabled by configuration."
time="2023-09-24T23:18:06.488477962+08:00" level=info msg="loading plugin \"io.containerd.image-verifier.v1.bindir\"..." type=io.containerd.image-verifier.v1
time="2023-09-24T23:18:06.488489534+08:00" level=info msg="loading plugin \"io.containerd.runtime.v2.task\"..." type=io.containerd.runtime.v2
time="2023-09-24T23:18:06.488541472+08:00" level=info msg="loading plugin \"io.containerd.runtime.v2.shim\"..." type=io.containerd.runtime.v2
time="2023-09-24T23:18:06.488570706+08:00" level=info msg="loading plugin \"io.containerd.sandbox.store.v1.local\"..." type=io.containerd.sandbox.store.v1
time="2023-09-24T23:18:06.488619498+08:00" level=info msg="loading plugin \"io.containerd.sandbox.controller.v1.local\"..." type=io.containerd.sandbox.controller.v1
time="2023-09-24T23:18:06.488656968+08:00" level=info msg="loading plugin \"io.containerd.streaming.v1.manager\"..." type=io.containerd.streaming.v1
time="2023-09-24T23:18:06.488680222+08:00" level=info msg="loading plugin \"io.containerd.service.v1.introspection-service\"..." type=io.containerd.service.v1
time="2023-09-24T23:18:06.488717812+08:00" level=info msg="loading plugin \"io.containerd.service.v1.containers-service\"..." type=io.containerd.service.v1
time="2023-09-24T23:18:06.488728672+08:00" level=info msg="loading plugin \"io.containerd.service.v1.content-service\"..." type=io.containerd.service.v1
time="2023-09-24T23:18:06.488734433+08:00" level=info msg="loading plugin \"io.containerd.service.v1.diff-service\"..." type=io.containerd.service.v1
time="2023-09-24T23:18:06.488740465+08:00" level=info msg="loading plugin \"io.containerd.service.v1.images-service\"..." type=io.containerd.service.v1
time="2023-09-24T23:18:06.488745694+08:00" level=info msg="loading plugin \"io.containerd.service.v1.namespaces-service\"..." type=io.containerd.service.v1
time="2023-09-24T23:18:06.488750664+08:00" level=info msg="loading plugin \"io.containerd.service.v1.snapshots-service\"..." type=io.containerd.service.v1
time="2023-09-24T23:18:06.488763598+08:00" level=info msg="loading plugin \"io.containerd.monitor.v1.cgroups\"..." type=io.containerd.monitor.v1
time="2023-09-24T23:18:06.489340034+08:00" level=info msg="loading plugin \"io.containerd.service.v1.tasks-service\"..." type=io.containerd.service.v1
time="2023-09-24T23:18:06.489387072+08:00" level=debug msg="No blockio config file specified, blockio not configured"
time="2023-09-24T23:18:06.489395067+08:00" level=debug msg="No RDT config file specified, RDT not configured"
time="2023-09-24T23:18:06.489404034+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.introspection\"..." type=io.containerd.grpc.v1
time="2023-09-24T23:18:06.489415555+08:00" level=info msg="loading plugin \"io.containerd.transfer.v1.local\"..." type=io.containerd.transfer.v1
time="2023-09-24T23:18:06.489464206+08:00" level=info msg="loading plugin \"io.containerd.internal.v1.restart\"..." type=io.containerd.internal.v1
time="2023-09-24T23:18:06.489554175+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.containers\"..." type=io.containerd.grpc.v1
time="2023-09-24T23:18:06.489575715+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.content\"..." type=io.containerd.grpc.v1
time="2023-09-24T23:18:06.489600091+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.diff\"..." type=io.containerd.grpc.v1
time="2023-09-24T23:18:06.489623675+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.events\"..." type=io.containerd.grpc.v1
time="2023-09-24T23:18:06.489630258+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.healthcheck\"..." type=io.containerd.grpc.v1
time="2023-09-24T23:18:06.489639465+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.images\"..." type=io.containerd.grpc.v1
time="2023-09-24T23:18:06.489665033+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.leases\"..." type=io.containerd.grpc.v1
time="2023-09-24T23:18:06.489679821+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.namespaces\"..." type=io.containerd.grpc.v1
time="2023-09-24T23:18:06.489723212+08:00" level=info msg="loading plugin \"io.containerd.internal.v1.opt\"..." type=io.containerd.internal.v1
time="2023-09-24T23:18:06.489850821+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.sandbox-controllers\"..." type=io.containerd.grpc.v1
time="2023-09-24T23:18:06.489891648+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.sandboxes\"..." type=io.containerd.grpc.v1
time="2023-09-24T23:18:06.489910293+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.snapshots\"..." type=io.containerd.grpc.v1
time="2023-09-24T23:18:06.489926122+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.streaming\"..." type=io.containerd.grpc.v1
time="2023-09-24T23:18:06.489961879+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.tasks\"..." type=io.containerd.grpc.v1
time="2023-09-24T23:18:06.489997586+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.transfer\"..." type=io.containerd.grpc.v1
time="2023-09-24T23:18:06.490009328+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.version\"..." type=io.containerd.grpc.v1
time="2023-09-24T23:18:06.490021170+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.cri\"..." type=io.containerd.grpc.v1
time="2023-09-24T23:18:06.490117651+08:00" level=info msg="Start cri plugin with config {PluginConfig:{ContainerdConfig:{Snapshotter:overlayfs DefaultRuntimeName:runc Runtimes:map[runc:{Type:io.containerd.runc.v2 Path: PodAnnotations:[] ContainerAnnotations:[] Options:map[BinaryName: CriuImagePath: CriuWorkPath: IoGid:0 IoUid:0 NoNewKeyring:false Root: ShimCgroup:] PrivilegedWithoutHostDevices:false PrivilegedWithoutHostDevicesAllDevicesAllowed:false BaseRuntimeSpec: NetworkPluginConfDir: NetworkPluginMaxConfNum:0 Snapshotter: SandboxMode:podsandbox}] DisableSnapshotAnnotations:true DiscardUnpackedLayers:false IgnoreBlockIONotEnabledErrors:false IgnoreRdtNotEnabledErrors:false} CniConfig:{NetworkPluginBinDir:/opt/cni/bin NetworkPluginConfDir:/etc/cni/net.d NetworkPluginMaxConfNum:1 NetworkPluginSetupSerially:false NetworkPluginConfTemplate: IPPreference:} Registry:{ConfigPath: Mirrors:map[] Configs:map[] Auths:map[] Headers:map[]} ImageDecryption:{KeyModel:node} DisableTCPService:true StreamServerAddress:127.0.0.1 StreamServerPort:0 StreamIdleTimeout:4h0m0s EnableSelinux:false SelinuxCategoryRange:1024 SandboxImage:registry.k8s.io/pause:3.9 StatsCollectPeriod:10 EnableTLSStreaming:false X509KeyPairStreaming:{TLSCertFile: TLSKeyFile:} MaxContainerLogLineSize:16384 DisableCgroup:false DisableApparmor:false RestrictOOMScoreAdj:false MaxConcurrentDownloads:3 DisableProcMount:false UnsetSeccompProfile: TolerateMissingHugetlbController:true DisableHugetlbController:true DeviceOwnershipFromSecurityContext:false IgnoreImageDefinedVolumes:false NetNSMountsUnderStateDir:false EnableUnprivilegedPorts:false EnableUnprivilegedICMP:false EnableCDI:false CDISpecDirs:[/etc/cdi /var/run/cdi] ImagePullProgressTimeout:1m0s DrainExecSyncIOTimeout:0s} ContainerdRootDir:/var/lib/containerd ContainerdEndpoint:/run/containerd/containerd.sock RootDir:/var/lib/containerd/io.containerd.grpc.v1.cri StateDir:/run/containerd-test/io.containerd.grpc.v1.cri}"
time="2023-09-24T23:18:06.490177083+08:00" level=info msg="Connect containerd service"
time="2023-09-24T23:18:06.490211167+08:00" level=info msg="using legacy CRI server"
time="2023-09-24T23:18:06.490233669+08:00" level=info msg="using experimental NRI integration - disable nri plugin to prevent this"
time="2023-09-24T23:18:06.490351830+08:00" level=info msg="Get image filesystem path \"/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs\""
time="2023-09-24T23:18:06.491110930+08:00" level=error msg="failed to load cni during init, please check CRI plugin status before setting up network for pods" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
time="2023-09-24T23:18:06.491173958+08:00" level=info msg="loading plugin \"io.containerd.tracing.processor.v1.otlp\"..." type=io.containerd.tracing.processor.v1
time="2023-09-24T23:18:06.491258035+08:00" level=info msg="loading plugin \"io.containerd.internal.v1.tracing\"..." type=io.containerd.internal.v1
time="2023-09-24T23:18:06.491276404+08:00" level=info msg="Start subscribing containerd event"
time="2023-09-24T23:18:06.491576476+08:00" level=info msg="Start recovering state"
time="2023-09-24T23:18:06.491630714+08:00" level=info msg=serving... address=/run/containerd/containerd.sock.ttrpc
time="2023-09-24T23:18:06.491653511+08:00" level=info msg="Start event monitor"
time="2023-09-24T23:18:06.491685867+08:00" level=info msg=serving... address=/run/containerd/containerd.sock
time="2023-09-24T23:18:06.491687715+08:00" level=info msg="Start snapshots syncer"
time="2023-09-24T23:18:06.491730601+08:00" level=info msg="Start cni network conf syncer for default"
time="2023-09-24T23:18:06.491734879+08:00" level=info msg="Start streaming server"
time="2023-09-24T23:18:06.491756590+08:00" level=debug msg="sd notification" notified=false state="READY=1"
time="2023-09-24T23:18:06.491779854+08:00" level=info msg="containerd successfully booted in 0.008757s"
time="2023-09-24T23:18:06.593333740+08:00" level=debug msg="garbage collected" d=4.593622ms
time="2023-09-24T23:18:06.980385290+08:00" level=debug msg="event published" ns=testing topic=/namespaces/update type=containerd.events.NamespaceUpdate
time="2023-09-24T23:18:08.803267678+08:00" level=debug msg="(*service).Write started" expected="sha256:7b3ccabffc97de872a30dfd234fd972a66d247c8cfc69b0550f276481852627c" ref="index-sha256:7b3ccabffc97de872a30dfd234fd972a66d247c8cfc69b0550f276481852627c" total=2295
time="2023-09-24T23:18:08.821995286+08:00" level=debug msg="(*service).Write started" expected="sha256:907ca53d7e2947e849b839b1cd258c98fd3916c60f2e6e70c30edbf741ab6754" ref="manifest-sha256:907ca53d7e2947e849b839b1cd258c98fd3916c60f2e6e70c30edbf741ab6754" total=528
time="2023-09-24T23:18:08.837254340+08:00" level=debug msg="(*service).Write started" expected="sha256:66ba00ad3de8677a3fa4bc4ea0fc46ebca0f14db46ca365e7f60833068dd0148" ref="config-sha256:66ba00ad3de8677a3fa4bc4ea0fc46ebca0f14db46ca365e7f60833068dd0148" total=1457
time="2023-09-24T23:18:08.848513412+08:00" level=debug msg="stat snapshot" key="sha256:b64792c17e4ad443d16b218afb3a8f5d03ca0f4ec49b11c1a7aebe17f6c3c1d2"
time="2023-09-24T23:18:08.854213584+08:00" level=debug msg="create image" name="ghcr.io/containerd/busybox:1.36" target="sha256:7b3ccabffc97de872a30dfd234fd972a66d247c8cfc69b0550f276481852627c"
time="2023-09-24T23:18:08.858224642+08:00" level=debug msg="event published" ns=testing topic=/images/update type=containerd.services.images.v1.ImageUpdate
time="2023-09-24T23:18:08.858361870+08:00" level=debug msg="Received containerd event timestamp - 2023-09-24 15:18:08.858219653 +0000 UTC, namespace - \"testing\", topic - \"/images/update\""
time="2023-09-24T23:18:08.858376447+08:00" level=debug msg="Ignoring events in namespace - \"testing\""
time="2023-09-24T23:18:08.877886822+08:00" level=debug msg="prepare snapshot" key=TestIssue9103 parent="sha256:b64792c17e4ad443d16b218afb3a8f5d03ca0f4ec49b11c1a7aebe17f6c3c1d2"
time="2023-09-24T23:18:08.888991414+08:00" level=debug msg="event published" ns=testing topic=/snapshot/prepare type=containerd.events.SnapshotPrepare
time="2023-09-24T23:18:08.893618664+08:00" level=debug msg="get snapshot mounts" key=TestIssue9103
time="2023-09-24T23:18:08.902182818+08:00" level=debug msg="event published" ns=testing topic=/containers/create type=containerd.events.ContainerCreate
time="2023-09-24T23:18:08.907228321+08:00" level=debug msg="get snapshot mounts" key=TestIssue9103
time="2023-09-24T23:18:08.914998242+08:00" level=debug msg="shim bootstrap parameters" address="unix:///run/containerd/s/b60b5f01d31b4d44c5871f43149981a7412434bdbc265d76acfbc41f8b739fcb" namespace=testing protocol=ttrpc
time="2023-09-24T23:18:08.917076983+08:00" level=info msg="loading plugin \"io.containerd.internal.v1.shutdown\"..." runtime=io.containerd.runc.v2 type=io.containerd.internal.v1
time="2023-09-24T23:18:08.917405806+08:00" level=info msg="loading plugin \"io.containerd.ttrpc.v1.pause\"..." runtime=io.containerd.runc.v2 type=io.containerd.ttrpc.v1
time="2023-09-24T23:18:08.917417958+08:00" level=debug msg="registering ttrpc service" id=io.containerd.ttrpc.v1.pause runtime=io.containerd.runc.v2
time="2023-09-24T23:18:08.917440180+08:00" level=info msg="loading plugin \"io.containerd.event.v1.publisher\"..." runtime=io.containerd.runc.v2 type=io.containerd.event.v1
time="2023-09-24T23:18:08.917463895+08:00" level=info msg="loading plugin \"io.containerd.ttrpc.v1.task\"..." runtime=io.containerd.runc.v2 type=io.containerd.ttrpc.v1
time="2023-09-24T23:18:08.917528135+08:00" level=debug msg="registering ttrpc service" id=io.containerd.ttrpc.v1.task runtime=io.containerd.runc.v2
time="2023-09-24T23:18:08.917621971+08:00" level=debug msg="serving api on socket" socket="[inherited from parent]"
time="2023-09-24T23:18:08.917651647+08:00" level=debug msg="starting signal loop" namespace=testing path=/run/containerd-test/io.containerd.runtime.v2.task/testing/TestIssue9103 pid=31963 runtime=io.containerd.runc.v2
time="2023-09-24T23:18:09.077660531+08:00" level=debug msg="garbage collected" d=2.589245ms
2023/09/24 23:18:11 traces export: Post "https://localhost:4318/v1/traces": dial tcp 127.0.0.1:4318: connect: connection refused
time="2023-09-24T23:18:12.031835788+08:00" level=error msg="loading cgroup for 31988" error="cgroups: cgroup deleted" runtime=io.containerd.runc.v2
time="2023-09-24T23:18:12.032953924+08:00" level=debug msg="event forwarded" ns=testing topic=/tasks/create type=containerd.events.TaskCreate
time="2023-09-24T23:18:12.033610816+08:00" level=debug msg="event forwarded" ns=testing topic=/tasks/exit type=containerd.events.TaskExit
time="2023-09-24T23:18:12.117792109+08:00" level=debug msg="failed to delete task" error="rpc error: code = Unknown desc = failed to delete task: runc-fp did not terminate successfully: exit status 1: " id=TestIssue9103
time="2023-09-24T23:18:12.119721816+08:00" level=debug msg="received signal" signal=terminated
time="2023-09-24T23:18:12.119757543+08:00" level=debug msg="sd notification" notified=false state="STOPPING=1"
time="2023-09-24T23:18:12.119818658+08:00" level=error msg="failed to close plugin" error="context canceled" id=io.containerd.internal.v1.tracing
time="2023-09-24T23:18:12.119838245+08:00" level=info msg="Stop CRI service"
2023/09/24 23:18:12 traces export: Post "https://localhost:4318/v1/traces": dial tcp 127.0.0.1:4318: connect: connection refused
2023/09/24 23:18:12 context canceled
time="2023-09-24T23:18:12.147360250+08:00" level=debug msg="cni watcher channel is closed"
time="2023-09-24T23:18:12.147424911+08:00" level=info msg="Stop CRI service"
time="2023-09-24T23:18:12.147450228+08:00" level=info msg="Event monitor stopped"
time="2023-09-24T23:18:12.147454096+08:00" level=info msg="Stream server stopped"

ok      github.com/containerd/containerd/integration/client     5.727s

@cyyzero (Contributor, Author) commented Sep 24, 2023

@laurazard @fuweid Thanks for your suggestions. I've modified the fix, adding a handleProcessExitNoLock() function that is called in the create handler to handle the child process exit.

@fuweid (Member) commented Sep 25, 2023

@cyyzero the change looks good to me. Would you please add the extra commit from #9121 into your pull request, so it's easier for reviewers to review the change? Thanks!

@cyyzero (Contributor, Author) commented Sep 26, 2023

@fuweid I've added that commit here.

@corhere (Member) left a comment

You've been led up the garden path, unfortunately. The locking of s.mu inside handleProcessExit looks to be nothing more than a vestige from when it was called checkProcesses and iterated over s.containers. As far as I can tell, @cyyzero is correct: s.mu is only used to synchronize accesses to the s.containers map. Therefore resolving the deadlock really is as simple as removing the locking of s.mu from handleProcessExit. (Also, the concurrency of Create() could be improved by only holding the s.mu lock while modifying s.containers.)

If I'm wrong and s.mu actually does need to be held while calling handleProcessExit, there is a much cleaner way to deal with it than adding a NoLock variant of the function with the same visibility: document a precondition.

// s.mu must be locked when calling this function.

(the diff hunk this review comment is attached to:)

// It must be called before the caller of preStart returns, otherwise severe
// memory leaks will occur.
-func (s *service) preStart(c *runc.Container) (handleStarted func(*runc.Container, process.Process), cleanup func()) {
+func (s *service) preStart(c *runc.Container, muLocked bool) (handleStarted func(*runc.Container, process.Process), cleanup func()) {
@corhere (Member) commented:

preStart does not need to be aware of whether s.mu is locked. Furthermore, the lock state of s.mu could change between the preStart call and its handleStarted callback.

Suggested change:
-func (s *service) preStart(c *runc.Container, muLocked bool) (handleStarted func(*runc.Container, process.Process), cleanup func()) {
+func (s *service) preStart(c *runc.Container) (handleStarted func(*runc.Container, process.Process, muLocked bool), cleanup func()) {
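
To make the suggestion concrete, here is a small self-contained sketch of the muLocked-on-the-callback approach, with simplified placeholder types rather than the real preStart/handleStarted signatures; only the locking decision matters here.

package main

import "sync"

type container struct{ id string }

type proc struct{ exited bool }

type service struct {
	mu         sync.Mutex
	containers map[string]*container
}

// preStart itself does not care whether s.mu is held; only the returned
// handleStarted callback needs to know, via its muLocked argument.
func (s *service) preStart() (handleStarted func(c *container, p *proc, muLocked bool), cleanup func()) {
	handleStarted = func(c *container, p *proc, muLocked bool) {
		if p.exited {
			s.handleProcessExit(c, muLocked)
		}
	}
	cleanup = func() {}
	return handleStarted, cleanup
}

func (s *service) handleProcessExit(c *container, muLocked bool) {
	if !muLocked {
		s.mu.Lock()
		defer s.mu.Unlock()
	}
	delete(s.containers, c.id)
}

func main() {
	s := &service{containers: map[string]*container{"c1": {id: "c1"}}}
	handleStarted, cleanup := s.preStart()
	defer cleanup()

	// A create-style caller that already holds s.mu passes muLocked=true,
	// so handleProcessExit skips the lock and cannot self-deadlock.
	s.mu.Lock()
	handleStarted(s.containers["c1"], &proc{exited: true}, true)
	s.mu.Unlock()
}

This is essentially the shape the merged fix takes, per the commit message further down the thread.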

@cyyzero (Contributor, Author) replied:

Nice advice. I'll fix it this way, if we really need to keep s.mu locked.

@cyyzero (Contributor, Author) commented Sep 27, 2023

You've been led up the garden path, unfortunately. The locking of s.mu inside handleProcessExit looks to be nothing more than a vestige from when it was called checkProcesses and iterated over s.containers. As far as I can tell, @cyyzero is correct: s.mu is only used to synchronize accesses to the s.containers map. Therefore resolving the deadlock really is as simple as removing the locking of s.mu from handleProcessExit. (Also, the concurrency of Create() could be improved by only holding the s.mu lock while modifying s.containers.)

@corhere I agree with you. There is some confusion about the usage of s.mu: sometimes it is used to protect s.containers only, but in the Create() and Shutdown() functions it is used to mutually exclude the whole function body. I wonder why that is.

@corhere (Member) commented Sep 27, 2023

There is some confusion about the usage of s.mu: sometimes it is used to protect s.containers only, but in the Create() and Shutdown() functions it is used to mutually exclude the whole function body. I wonder why that is.

I've traced the mutual exclusion in Create() to #1452. It was a solution for the missed early-exits problem: Create() holds the mutex to block checkProcesses() from handling exit events until after Create() has added the container process to the s.processes map. This locking logic was copied into the v2 shim implementation. #8617 solves the same problem using a strategy that does not rely on mutual exclusion between container create and exit-event handling.

As for Shutdown(), the implementation was added in #3004 with no explanation, or precedent I could find. I can confidently say that in this initial implementation, the mutex is locked to safely access s.containers, but I can only speculate that it is not unlocked because nothing matters after os.Exit(0). Then ae87730 changed Shutdown() to return instead of exiting, followed by #4988 to unlock upon return. If the intention is to block containers from being created while the shim's task service is shutting down, locking a mutex is a poor solution since it makes the create requests hang instead of returning an error, and is ineffective after Shutdown() returns as the mutex is unlocked.

Shutdown() does fairly little work while holding the mutex. The only registered shutdown callbacks I could find in the whole containerd codebase are

sd.RegisterCallback(func(context.Context) error {
	close(s.events)
	return nil
})
if address, err := shim.ReadAddress("address"); err == nil {
	sd.RegisterCallback(func(context.Context) error {
		return shim.RemoveSocket(address)
	})
}

s.shutdown.RegisterCallback(func(context.Context) error { return s.platform.Close() })

That is, close a channel, delete a file from the filesystem, and close a file descriptor, respectively. None of these take a particularly long time, and there isn't really anything that could race with those operations. I don't see any way for new problems to arise if Create() was no longer mutually exclusive with Shutdown(). The worst that could happen would be slightly widening the window for already-existing race conditions.

Now I am fully convinced that the only sound use for s.mu is to synchronize access to the s.containers map. All other uses are either vestigial or unsound.

  • handleProcessExit() does not need mutual exclusion with Create() as early exits are no longer handled using a mutex.
  • handleProcessExit() does not need mutual exclusion with Shutdown() as Shutdown() is a no-op while len(s.containers) > 0. Furthermore, after a call to Shutdown() (which isn't a no-op), any call to handleProcessExit() would panic trying to send to the channel that Shutdown() had just closed. Therefore removing mutual exclusion would not make things any buggier than the status quo. QED.
  • Aside from synchronizing accesses to s.containers, mutual exclusion between Create() and Shutdown() is not fit for purpose. It's an identical story to handleProcessExit(): after Shutdown(), any Create() calls would panic trying to send to a closed channel.
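
As an illustration of the narrowing mentioned above (holding s.mu only while modifying s.containers), a simplified sketch with placeholder types follows; the real Create handler does far more work outside the lock.

package main

import "sync"

type container struct{ id string }

type service struct {
	mu         sync.Mutex
	containers map[string]*container
}

func (s *service) Create(id string) *container {
	// Expensive work (mount setup, starting the init process, ...) happens
	// without holding s.mu.
	c := &container{id: id}

	// Only the shared map update needs the mutex.
	s.mu.Lock()
	s.containers[id] = c
	s.mu.Unlock()

	// Exit handling that runs after this point can take s.mu freely
	// without risking a self-deadlock.
	return c
}

func main() {
	s := &service{containers: map[string]*container{}}
	_ = s.Create("c1")
}

This is the refactor that was ultimately deferred to a follow-up (see fuweid's comments below), so the merged patch stays small enough to backport to 1.6/1.7.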

@cyyzero (Contributor, Author) commented Oct 1, 2023

@corhere Thank you for your detailed and thorough analysis. s.mu should only be used to protect s.containers, so I have updated the fix again.
cc @fuweid @laurazard @thaJeztah

@fuweid (Member) commented Oct 2, 2023

I would like to say that I prefer the first version because it focuses on the bug fix.
I don't want to pull the refactoring into this fix because the patch will be backported to v1.6/v1.7.

@fuweid fuweid dismissed their stale review October 2, 2023 15:13

prefer to apply the refactor in the follow-up.

@fuweid (Member) commented Oct 2, 2023

The CI issue will be fixed by #9178

cyyzero and others added 3 commits October 2, 2023 17:44
The previous code had already called `getContainer()`; just pass the result into
`s.getContainerPids` to avoid an unnecessary lock and map lookup.

Signed-off-by: Chen Yiyang <cyyzero@qq.com>
After PR containerd#8617, the create handler of containerd-shim-runc-v2
calls handleStarted() to record the init process and handle its exit.
The init process would not normally exit that early, but if this
scenario occurs, handleStarted() calls handleProcessExit(), which
deadlocks because create() has already acquired s.mu and
handleProcessExit() tries to lock it again.

So I added a muLocked parameter to handleStarted to indicate whether
s.mu is currently held, which determines whether to lock it when
calling handleProcessExit.

Fix: containerd#9103
Signed-off-by: Chen Yiyang <cyyzero@qq.com>
Signed-off-by: Wei Fu <fuweid89@gmail.com>
@cyyzero (Contributor, Author) commented Oct 2, 2023

@fuweid Thanks for the suggestion; this PR is for the bug fix only. The refactoring can be done in a follow-up change. I've rolled back the changes.
cc @corhere @laurazard @thaJeztah

@fuweid (Member) left a comment

LGTM

@cyyzero cyyzero requested a review from corhere October 3, 2023 04:14
@cyyzero changed the title from "containerd-shim-runc-v2: avoid potential deadlock in create handler and remove unnecessary s.getContainer()" to "containerd-shim-runc-v2: avoid potential deadlock in create handler" on Oct 7, 2023
@cyyzero (Contributor, Author) commented Oct 7, 2023

Any thoughts? @corhere :)

@fuweid (Member) commented Oct 8, 2023

ping @containerd/reviewers

@fuweid fuweid merged commit 8db0d39 into containerd:main Oct 9, 2023
@thaJeztah added the cherry-picked/1.6.x (PR commits are cherry-picked into release/1.6 branch) and cherry-picked/1.7.x (PR commits are cherry-picked into release/1.7 branch) labels, and removed the cherry-pick/1.6.x and cherry-pick/1.7.x labels, on Oct 10, 2023

Labels

cherry-picked/1.6.x (PR commits are cherry-picked into release/1.6 branch), cherry-picked/1.7.x (PR commits are cherry-picked into release/1.7 branch), needs-ok-to-test

Development

Successfully merging this pull request may close these issues.

containerd-shim may deadlock when processing create requests

8 participants