[release/2.0] Disable event subscriber during task cleanup #12406

fuweid · 2025-10-24T20:20:41Z

We have an individual goroutine for each sandbox container. If the handler returns an error, that goroutine puts the event into the backoff queue, so we don't need an event subscriber for podsandbox. Otherwise, two goroutines would try to clean up the same sandbox container.
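The per-sandbox goroutine pattern described above can be sketched roughly as follows. This is a minimal illustration, not the containerd implementation: `exitEvent`, `backoffQueue`, `waitSandboxExitSketch`, and `errCleanup` are hypothetical names invented for the example, and the real backoff queue also handles retry timing, which is omitted here.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errCleanup is a hypothetical error used to simulate a failing handler.
var errCleanup = errors.New("cleanup failed")

// exitEvent is a hypothetical stand-in for a task exit event.
type exitEvent struct{ sandboxID string }

// backoffQueue is a hypothetical, simplified retry queue keyed by sandbox ID.
// Retry scheduling is left out; it only records what must be retried.
type backoffQueue struct {
	mu     sync.Mutex
	events map[string][]exitEvent
}

func newBackoffQueue() *backoffQueue {
	return &backoffQueue{events: map[string][]exitEvent{}}
}

func (q *backoffQueue) enqueue(e exitEvent) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.events[e.sandboxID] = append(q.events[e.sandboxID], e)
}

func (q *backoffQueue) pending(id string) int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return len(q.events[id])
}

// waitSandboxExitSketch models the single goroutine that owns cleanup for one
// sandbox: it waits for the exit signal, runs the handler once, and hands the
// event to the backoff queue only if the handler fails.
func waitSandboxExitSketch(id string, exited <-chan struct{}, handle func(exitEvent) error, q *backoffQueue) {
	<-exited
	ev := exitEvent{sandboxID: id}
	if err := handle(ev); err != nil {
		q.enqueue(ev)
	}
}

func main() {
	q := newBackoffQueue()
	exited := make(chan struct{})
	done := make(chan struct{})

	go func() {
		defer close(done)
		waitSandboxExitSketch("sb-1", exited, func(exitEvent) error { return errCleanup }, q)
	}()

	close(exited) // simulate the sandbox task exiting
	<-done
	fmt.Println("pending retries:", q.pending("sb-1"))
}
```

Because this goroutine already covers both the first cleanup attempt and the retry path, a second subscriber for the same exit event adds no coverage, only a second concurrent caller.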

>>>> From EventMonitor
  time="2025-10-23T19:30:59.626254404Z" level=debug msg="Received containerd event timestamp - 2025-10-23 19:30:59.624494674 +0000 UTC, namespace - \"k8s.io\", topic - \"/tasks/exit\""
  time="2025-10-23T19:30:59.626301912Z" level=debug msg="TaskExit event in podsandbox handler container_id:\"22e15114133e4d461ab380654fb76f3e73d3e0323989c422fa17882762979ccf\" id:\"22e15114133e4d461ab380654fb76f3e73d3e0323989c422fa17882762979ccf\" pid:203121 exit_status:137 exited_at:{seconds:1761247859 nanos:624467824}"

>>> If EventMonitor handles task exit well, it will close ttrpc
connection and then waitSandboxExit could encounter ttrpc-closed error

  time="2025-10-23T19:30:59.688031150Z" level=error msg="failed to delete task" error="ttrpc: closed" id=22e15114133e4d461ab380654fb76f3e73d3e0323989c422fa17882762979ccf

If both task.Delete calls fail but the shim has already been shut down, it could trigger a new task.Exit event sent by cleanupAfterDeadShim. This would result in three events in the EventMonitor's backoff queue, which is unnecessary and could cause confusion due to duplicate events.

The worst-case scenario caused by two concurrent task.Delete calls is a shim leak. The timeline for this scenario is as follows:

| Timestamp | Component | Action | Result |
| --- | --- | --- | --- |
| T1 | EventMonitor | Sends `task.Delete` | Marked as Req-1 |
| T2 | waitSandboxExit | Sends `task.Delete` | Marked as Req-2 |
| T3 | containerd-shim | Handles Req-2 | Container transitions from stopped to deleted |
| T4 | containerd-shim | Handles Req-1 | Fails - container already deleted<br>Returns error: `cannot delete a deleted process: not found` |
| T5 | EventMonitor | Receives `not found` error | - |
| T6 | EventMonitor | Sends `shim.Shutdown` request | No-op (active container record still exists) |
| T7 | EventMonitor | Closes ttrpc connection | Clean container state dir |
| T8 | containerd-shim | Handles Req-2 | Removes container record from memory |
| T9 | waitSandboxExit | Receives error | Error: `ttrpc: closed` |
| T10 | waitSandboxExit | Sends `shim.Shutdown` request | Fails (connection already closed) |
| T11 | waitSandboxExit | Closes ttrpc connection | No-op (already closed) |

The containerd-shim keeps running because shim.Shutdown was sent at T6, before the shim removed the container record at T8, so the request was a no-op. Because the container's state dir was deleted at T7, containerd is unable to clean up the leaked shim after it restarts.

We should avoid concurrent task.Delete calls here.
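To make the ordering problem concrete, the timeline can be replayed against a toy model of the shim. This is a sketch under stated assumptions, not containerd or shim code: `fakeShim`, `conn`, `replayConcurrent`, and `replaySingleOwner` are hypothetical names, and the model keeps only the three pieces of state the timeline depends on (deleted flag, in-memory record, shared connection).

```go
package main

import (
	"errors"
	"fmt"
)

// Error values mirroring the messages quoted in the timeline.
var (
	errNotFound = errors.New("cannot delete a deleted process: not found")
	errClosed   = errors.New("ttrpc: closed")
)

// fakeShim is a hypothetical model of the shim state used by the timeline.
type fakeShim struct {
	deleted   bool // container moved from stopped to deleted (T3)
	hasRecord bool // in-memory container record still present (until T8)
	running   bool // shim process is alive
}

// beginDelete models the first half of task.Delete: the state transition.
func (s *fakeShim) beginDelete() error {
	if s.deleted {
		return errNotFound
	}
	s.deleted = true
	return nil
}

// finishDelete models the second half: dropping the in-memory record.
func (s *fakeShim) finishDelete() { s.hasRecord = false }

// shutdown exits the shim only when no container record remains.
func (s *fakeShim) shutdown() {
	if !s.hasRecord {
		s.running = false
	}
}

// conn models the ttrpc connection shared by both callers.
type conn struct{ closed bool }

func (c *conn) call(f func() error) error {
	if c.closed {
		return errClosed
	}
	return f()
}

// replayConcurrent replays T1..T11 in order and reports whether the shim leaked.
func replayConcurrent() (leaked bool, monitorErr, waiterErr error) {
	shim := &fakeShim{hasRecord: true, running: true}
	c := &conn{}

	_ = shim.beginDelete()                                    // T3: Req-2 starts, stopped -> deleted
	monitorErr = c.call(shim.beginDelete)                     // T4-T5: Req-1 is rejected
	_ = c.call(func() error { shim.shutdown(); return nil }) // T6: no-op, record still exists
	c.closed = true                                           // T7: EventMonitor closes the connection
	shim.finishDelete()                                       // T8: Req-2 drops the record
	waiterErr = c.call(func() error { return nil })           // T9: reply can no longer arrive
	_ = c.call(func() error { shim.shutdown(); return nil }) // T10: fails, connection closed
	return shim.running, monitorErr, waiterErr
}

// replaySingleOwner runs the same cleanup with exactly one caller.
func replaySingleOwner() (leaked bool) {
	shim := &fakeShim{hasRecord: true, running: true}
	c := &conn{}
	if err := c.call(shim.beginDelete); err == nil {
		shim.finishDelete()
	}
	_ = c.call(func() error { shim.shutdown(); return nil })
	c.closed = true
	return shim.running
}

func main() {
	leaked, mErr, wErr := replayConcurrent()
	fmt.Println("concurrent: leaked =", leaked, "| EventMonitor:", mErr, "| waitSandboxExit:", wErr)
	fmt.Println("single owner: leaked =", replaySingleOwner())
}
```

In this model the leak comes purely from ordering: the only shutdown request that reaches the shim arrives while the record still exists, and the connection is gone by the time the record is removed. With a single caller, delete, shutdown, and close happen in sequence and the shim exits.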

I also added a `shutdown` subcommand to `ctr shim` for debugging.

Fixed: #12344
Cherry-picked: #12400

(cherry picked from commit 2042e80)

henry118 · 2025-10-27T21:49:29Z

#12420 should fix the fuzzing issue


containerd 2.0.7

Welcome to the v2.0.7 release of containerd!

The seventh patch release for containerd 2.0 includes various bug fixes and updates.

* **containerd**
  * [**GHSA-pwhc-rpq9-4c8w**](GHSA-pwhc-rpq9-4c8w)
  * [**GHSA-m6hq-p25p-ffr2**](GHSA-m6hq-p25p-ffr2)
* **runc**
  * [**GHSA-qw9x-cqr3-wc7r**](GHSA-qw9x-cqr3-wc7r)
  * [**GHSA-cgrx-mc8f-2prm**](GHSA-cgrx-mc8f-2prm)
  * [**GHSA-9493-h29p-rfm2**](GHSA-9493-h29p-rfm2)

* **Disable event subscriber during task cleanup** ([containerd#12406](containerd#12406))
* **Add SystemdCgroup to default runtime options** ([containerd#12254](containerd#12254))
* **Fix userns with container image VOLUME mounts that need copy** ([containerd#12241](containerd#12241))
* **Add dial timeout field to hosts toml configuration** ([containerd#12136](containerd#12136))
* **Update runc binary to v1.3.3** ([containerd#12479](containerd#12479))
* **Fix lost container logs from quickly closing io** ([containerd#12376](containerd#12376))
* **Create bootstrap.json with 0644 permission** ([containerd#12184](containerd#12184))
* **Fix pidfd leak in UnshareAfterEnterUserns** ([containerd#12178](containerd#12178))

Please try out the release binaries and report any issues at https://github.com/containerd/containerd/issues.

* Austin Vazquez
* Phil Estes
* Rodrigo Campos
* Wei Fu
* Akihiro Suda
* Derek McGowan
* Maksym Pavlenko
* ningmingxiao
* Kirtana Ashok
* Akhil Mohan
* Andrew Halaney
* Jin Dong
* Jose Fernandez
* Mike Baynton
* Philip Laine
* Swagat Bora
* wheat2018

<details><summary>56 commits</summary>
<p>

* Prepare release notes for v2.0.7 ([containerd#12482](containerd#12482))
* [`4931e24f1`](containerd@4931e24) Prepare release notes for v2.0.7
* [`205bc4f2d`](containerd@205bc4f) Update mailmap
* [`5f708b76a`](containerd@5f708b7) Merge commit from fork
* [`8cd112d82`](containerd@8cd112d) Fix directory permissions
* [`05290b5bc`](containerd@05290b5) Merge commit from fork
* [`4d1edf4ad`](containerd@4d1edf4) fix goroutine leak of container Attach
* Update runc binary to v1.3.3 ([containerd#12479](containerd#12479))
* [`b46dc6a67`](containerd@b46dc6a) runc: Update runc binary to v1.3.3
* ci: bump Go 1.24.9; 1.25.3 ([containerd#12361](containerd#12361))
* [`5e9c82178`](containerd@5e9c821) Update GHA runners to use latest images for basic binaries build
* [`7f59248dc`](containerd@7f59248) Update GHA runners to use latest image for most jobs
* [`e1373e8a8`](containerd@e1373e8) ci: bump Go 1.24.9, 1.25.3
* [`e1a910a6a`](containerd@e1a910a) ci: bump Go 1.24.8; 1.25.2
* [`fd04b7f17`](containerd@fd04b7f) move exclude-dirs to issues.exclude-dirs
* [`b49377975`](containerd@b493779) update golangci-lint to v1.64.2
* [`6e45022a1`](containerd@6e45022) build(deps): bump golangci/golangci-lint-action from 6.3.2 to 6.5.0
* [`09ce0f2a1`](containerd@09ce0f2) build(deps): bump golangci/golangci-lint-action from 6.2.0 to 6.3.2
* [`de63a740b`](containerd@de63a74) build(deps): bump golangci/golangci-lint-action from 6.1.1 to 6.2.0
* Fix lost container logs from quickly closing io ([containerd#12376](containerd#12376))
* [`f953ee8a3`](containerd@f953ee8) bugfix:fix container logs lost because io close too quickly
* CI: update Fedora to 43 ([containerd#12448](containerd#12448))
* [`f6f15f513`](containerd@f6f15f5) CI: update Fedora to 43
* Disable event subscriber during task cleanup ([containerd#12406](containerd#12406))
* [`2a2329cbd`](containerd@2a2329c) cri/server/podsandbox: disable event subscriber
* CI: skip ubuntu-24.04-arm on private repos ([containerd#12428](containerd#12428))
* [`dfb954743`](containerd@dfb9547) CI: skip ubuntu-24.04-arm on private repos
* Remove additional fuzzers from instrumentation repo ([containerd#12420](containerd#12420))
* [`f6b02f6bb`](containerd@f6b02f6) Remove additional fuzzers from CI
* runc:Update runc binary to v1.3.1 ([containerd#12275](containerd#12275))
* [`75c13ee3f`](containerd@75c13ee) runc:Update runc binary to v1.3.1
* Add SystemdCgroup to default runtime options ([containerd#12254](containerd#12254))
* [`427cdd06c`](containerd@427cdd0) add SystemdCgroup to default runtime options
* install-runhcs-shim: fetch target commit instead of tags ([containerd#12255](containerd#12255))
* [`0b35e19fb`](containerd@0b35e19) install-runhcs-shim: fetch target commit instead of tags
* Fix userns with container image VOLUME mounts that need copy ([containerd#12241](containerd#12241))
* [`3212afc2f`](containerd@3212afc) integration: Add test for directives with userns
* [`b855c6e10`](containerd@b855c6e) cri: Fix userns with Dockerfile VOLUME mounts that need copy
* Fix overlayfs issues related to user namespace ([containerd#12223](containerd#12223))
* [`05c0c99f4`](containerd@05c0c99) core/mount: Retry unmounting idmapped directories
* [`afdede4ce`](containerd@afdede4) core/mount: Test cleanup of DoPrepareIDMappedOverlay()
* [`47205f814`](containerd@47205f8) core/mount: Properly cleanup on doPrepareIDMappedOverlay errors
* [`6f4abd970`](containerd@6f4abd9) core/mount: Don't call nil function on errors
* [`a2f0d65d7`](containerd@a2f0d65) core/mount: Only idmap once per overlayfs, not per layer
* [`1c32accd7`](containerd@1c32acc) Make ovl idmap mounts read-only
* ci: bump Go 1.23.12, 1.24.6 ([containerd#12187](containerd#12187))
* [`9e72e91e6`](containerd@9e72e91) ci: bump Go 1.23.12, 1.24.6
* Create bootstrap.json with 0644 permission ([containerd#12184](containerd#12184))
* [`009622e04`](containerd@009622e) fix: create bootstrap.json with 0644 permission
* Fix pidfd leak in UnshareAfterEnterUserns ([containerd#12178](containerd#12178))
* [`5bec0a332`](containerd@5bec0a3) sys: fix pidfd leak in UnshareAfterEnterUserns
* Fix windows test failures ([containerd#12120](containerd#12120))
* [`2a2488131`](containerd@2a24881) Fix intermittent test failures on Windows CIs
* [`018470948`](containerd@0184709) Remove WS2025 from CIs due to regression
* Add dial timeout field to hosts toml configuration ([containerd#12136](containerd#12136))
* [`b50cbbc98`](containerd@b50cbbc) Add dial timeout field to hosts toml configuration

</p>
</details>

This release has no dependency changes

Previous release can be found at [v2.0.6](https://github.com/containerd/containerd/releases/tag/v2.0.6)

* `containerd-<VERSION>-<OS>-<ARCH>.tar.gz`: ✅Recommended. Dynamically linked with glibc 2.31 (Ubuntu 20.04).
* `containerd-static-<VERSION>-<OS>-<ARCH>.tar.gz`: Statically linked. Expected to be used on non-glibc Linux distributions. Not position-independent.

In addition to containerd, typically you will have to install [runc](https://github.com/opencontainers/runc/releases) and [CNI plugins](https://github.com/containernetworking/plugins/releases) from their official sites too.

See also the [Getting Started](https://github.com/containerd/containerd/blob/main/docs/getting-started.md) documentation.

github-project-automation bot added this to Pull Request Review Oct 24, 2025

github-project-automation bot moved this to Needs Triage in Pull Request Review Oct 24, 2025

k8s-ci-robot added the size/L label Oct 24, 2025

dosubot bot added area/cri Container Runtime Interface (CRI) kind/bug labels Oct 24, 2025

fuweid force-pushed the weifu/backport-12400 branch from 1d3eb58 to 33d9934 Compare October 24, 2025 21:16

henry118 approved these changes Oct 27, 2025

View reviewed changes

fuweid force-pushed the weifu/backport-12400 branch from 33d9934 to 2a2329c Compare October 28, 2025 00:57

estesp approved these changes Oct 28, 2025

View reviewed changes

github-project-automation bot moved this from Needs Triage to Review In Progress in Pull Request Review Oct 28, 2025

estesp merged commit 7a3fa52 into containerd:release/2.0 Oct 28, 2025
96 of 110 checks passed

github-project-automation bot moved this from Review In Progress to Done in Pull Request Review Oct 28, 2025

austinvazquez mentioned this pull request Nov 5, 2025

[release/2.0] Prepare release notes for v2.0.7 #12482

Merged

dmcgowan added the impact/changelog label Nov 5, 2025

dmcgowan changed the title ~~[release/2.0] cri/server/podsandbox: disable event subscriber~~ [release/2.0] Disable event subscriber during task cleanup Nov 5, 2025

KCSesh mentioned this pull request Nov 7, 2025

Update containerd [1.7/2.0/2.1] verisons to the latest bottlerocket-os/bottlerocket-core-kit#724

Merged

5 participants
