
Conversation

@fuweid
Member

@fuweid fuweid commented Oct 23, 2025

We have an individual goroutine for each sandbox container. If there is any
error in the handler, that goroutine puts the event into the backoff queue.
So we don't need the event subscriber for podsandbox; otherwise, there would
be two goroutines cleaning up the sandbox container.
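
To illustrate the pattern, here is a minimal sketch; the names (`exitEvent`, `handleSandboxExit`, the backoff channel) are hypothetical and this is not the actual CRI plugin code. The point is that each sandbox's own goroutine already retries failed cleanup through the backoff queue, so a second subscriber only duplicates the work.

```go
// Minimal sketch (not containerd's actual code): each sandbox gets its own
// goroutine waiting for the task exit; on a handler error the event goes to
// a backoff queue for retry, so a second EventMonitor subscriber is redundant.
package main

import (
	"context"
	"fmt"
	"time"
)

// exitEvent is a stand-in for the TaskExit event handled per sandbox.
type exitEvent struct {
	sandboxID  string
	exitStatus uint32
}

// handleSandboxExit stands in for deleting the task and shutting down the shim.
func handleSandboxExit(ctx context.Context, ev exitEvent) error {
	fmt.Printf("cleanup sandbox %s (exit status %d)\n", ev.sandboxID, ev.exitStatus)
	return nil
}

// waitSandboxExit is the per-sandbox goroutine body: any handler error is
// pushed onto the backoff queue instead of relying on a second cleanup path.
func waitSandboxExit(ctx context.Context, ev exitEvent, backoff chan<- exitEvent) {
	if err := handleSandboxExit(ctx, ev); err != nil {
		select {
		case backoff <- ev: // retried later with backoff
		case <-ctx.Done():
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	backoff := make(chan exitEvent, 16)
	go waitSandboxExit(ctx, exitEvent{sandboxID: "22e15114", exitStatus: 137}, backoff)
	time.Sleep(100 * time.Millisecond) // let the goroutine finish in this demo
}
```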

```
>>>> From EventMonitor
  time="2025-10-23T19:30:59.626254404Z" level=debug msg="Received containerd event timestamp - 2025-10-23 19:30:59.624494674 +0000 UTC, namespace - \"k8s.io\", topic - \"/tasks/exit\""
  time="2025-10-23T19:30:59.626301912Z" level=debug msg="TaskExit event in podsandbox handler container_id:\"22e15114133e4d461ab380654fb76f3e73d3e0323989c422fa17882762979ccf\" id:\"22e15114133e4d461ab380654fb76f3e73d3e0323989c422fa17882762979ccf\" pid:203121 exit_status:137 exited_at:{seconds:1761247859 nanos:624467824}"

>>> If EventMonitor handles task exit well, it will close ttrpc
connection and then waitSandboxExit could encounter ttrpc-closed error

  time="2025-10-23T19:30:59.688031150Z" level=error msg="failed to delete task" error="ttrpc: closed" id=22e15114133e4d461ab380654fb76f3e73d3e0323989c422fa17882762979ccf
```

If both task.Delete calls fail but the shim has already been shut down, it
could trigger a new task.Exit event sent by cleanupAfterDeadShim. This would
result in three events in the EventMonitor's backoff queue, which is unnecessary
and could cause confusion due to duplicate events.

The worst-case scenario caused by two concurrent task.Delete calls is a shim
leak. The timeline for this scenario is as follows:

| Timestamp | Component       | Action                        | Result                                                                                           |
| --------- | --------------- | ----------------------------- | ------------------------------------------------------------------------------------------------ |
| T1        | EventMonitor    | Sends `task.Delete`           | Marked as Req-1                                                                                  |
| T2        | waitSandboxExit | Sends `task.Delete`           | Marked as Req-2                                                                                  |
| T3        | containerd-shim | Handles Req-2                 | Container transitions from stopped to deleted                                                    |
| T4        | containerd-shim | Handles Req-1                 | Fails - container already deleted<br>Returns error: `cannot delete a deleted process: not found` |
| T5        | EventMonitor    | Receives `not found` error    | -                                                                                                |
| T6        | EventMonitor    | Sends `shim.Shutdown` request | No-op (active container record still exists)                                                     |
| T7        | EventMonitor    | Closes ttrpc connection       | Clean container state dir                                                                        |
| T8        | containerd-shim | Handles Req-2                 | Removes container record from memory                                                             |
| T9        | waitSandboxExit | Receives error                | Error: `ttrpc: closed`                                                                           |
| T10       | waitSandboxExit | Sends `shim.Shutdown` request | Fails (connection already closed)                                                                |
| T11       | waitSandboxExit | Closes ttrpc connection       | No-op (already closed)                                                                           |

The containerd-shim is still running because shim.Shutdown was sent at T6,
before the container record was removed at T8. And because the container's
state dir was already deleted at T7, containerd is unable to clean the shim
up after it restarts.

We should avoid concurrent task.Delete calls here.
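
For illustration only, here is a sketch of the "single owner" idea behind that statement; the `sandboxCleaner` type and the `deleteTaskAndShutdownShim` helper are hypothetical, and the actual fix in this PR is simply to drop the duplicate EventMonitor subscription:

```go
// Sketch only: a "single owner" for sandbox cleanup so concurrent callers
// (EventMonitor, waitSandboxExit, ...) cannot both issue task.Delete.
// The sandboxCleaner type is hypothetical; this PR's actual fix removes the
// duplicate EventMonitor subscription instead.
package main

import (
	"context"
	"fmt"
	"sync"
)

type sandboxCleaner struct {
	once sync.Once
	err  error
}

// cleanup runs the delete/shutdown sequence at most once; later callers
// simply observe the first caller's result.
func (c *sandboxCleaner) cleanup(ctx context.Context, id string) error {
	c.once.Do(func() {
		fmt.Println("task.Delete + shim.Shutdown for", id)
		// c.err = deleteTaskAndShutdownShim(ctx, id) // hypothetical helper
	})
	return c.err
}

func main() {
	var (
		wg      sync.WaitGroup
		cleaner sandboxCleaner
	)
	for i := 0; i < 2; i++ { // two racing callers, only one real cleanup
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = cleaner.cleanup(context.Background(), "22e15114")
		}()
	}
	wg.Wait()
}
```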

I also add a `shutdown` subcommand to `ctr shim` for debugging.

Fixes: #12344
Fixes: #12404

Signed-off-by: Wei Fu <fuweid89@gmail.com>

@fuweid
Member Author

fuweid commented Oct 23, 2025

cc @yylt @ningmingxiao

@dosubot dosubot bot added the area/cri Container Runtime Interface (CRI) label Oct 23, 2025
@fuweid fuweid force-pushed the weifu/disable-subscriber-on-podsandbox branch from 50d29e1 to 8ad3229 on October 23, 2025 22:56
@fuweid fuweid force-pushed the weifu/disable-subscriber-on-podsandbox branch from 8ad3229 to b2bcbf3 on October 23, 2025 22:57
@fuweid fuweid added cherry-pick/2.0.x Change to be cherry picked to release/2.0 branch cherry-pick/2.1.x Change to be cherry picked to release/2.1 branch labels Oct 23, 2025
@fuweid fuweid force-pushed the weifu/disable-subscriber-on-podsandbox branch from b2bcbf3 to c84eeac on October 23, 2025 23:33
```go
eventMonitor.Start()
c.eventMonitor = eventMonitor

// no need to subscribe exit event for pause container
```
Member


Suggested change:
```diff
-// no need to subscribe exit event for pause container
+// no need to subscribe exit event for pause container, because ___
```

Member Author


Thanks! Updated.

@henry118
Member

@fuweid I have questions wrt the "ttrpc: closed" error.

  1. Does this error indicate the shim process should have exited? If so, why would we get shim leaks?
  2. If the shim process can persist even after we get this error, how would we request "shutdown" if the ttrpc connection is closed?

Thanks!

@yylt
Contributor

yylt commented Oct 24, 2025

The original code doesn't set the ID value. Why not set the ID correctly instead?

In issue #12344 there was no leak in version 1.7, so I don't think this is the root cause; it had run well for a long time.

In version 2.x, the cause of the error may have been the lack of info when deleting the task and shim, which resulted in the RPC not being sent. That's also the reason I submitted PR 12397.

@fuweid
Member Author

fuweid commented Oct 24, 2025

@henry118 @yylt I need to run that test tonight to confirm that the change is valid.

The root cause is two concurrent task deletions, as @ningmingxiao mentioned before.
We have two goroutines handling the sandbox task exit event, and that is what triggers
the leaky shim. So I removed that subscription from the event monitor.

Why do two concurrent task deletions cause the leak?

Updated (2025-10-24): please check my PR's description.

> there was no leak in version 1.7

It was introduced by the podsandbox change in 2.0.

@henry118
Member

> T5: Goroutine 1 received Response 1 and then containerd invokes shutdown and closes that connection.

This is the part I don't quite understand. Did the shutdown succeed?

  • Yes -> we shouldn't have a leaky shim;
  • No -> why does Goroutine 2 get "ttrpc: closed"?

Does the shutdown ever close the client side ttrpc connection? I don't see that in the following code:

```go
func (s *shimTask) Shutdown(ctx context.Context) error {
	_, err := s.task.Shutdown(ctx, &task.ShutdownRequest{
		ID: s.ID(),
	})
	if err != nil && !errors.Is(err, ttrpc.ErrClosed) {
		return errgrpc.ToNative(err)
	}
	return nil
}
```

@henry118
Member

> > T5: Goroutine 1 received Response 1 and then containerd invokes shutdown and closes that connection.
>
> This is the part I don't quite understand. Did the shutdown succeed?
>
> • Yes -> we shouldn't have a leaky shim;
> • No -> why does Goroutine 2 get "ttrpc: closed"?
>
> Does the shutdown ever close the client side ttrpc connection? I don't see that in the following code:

I think I finally get it. The connection was closed by the following shim instance deletion:

```go
if !sandboxed {
	if err := s.waitShutdown(ctx); err != nil {
		// FIXME(fuweid):
		//
		// If the error is context canceled, should we use context.TODO()
		// to wait for it?
		log.G(ctx).WithField("id", s.ID()).WithError(err).Error("failed to shutdown shim task and the shim might be leaked")
	}
}
if err := s.ShimInstance.Delete(ctx); err != nil {
	log.G(ctx).WithField("id", s.ID()).WithError(err).Error("failed to delete shim")
}
```

@ningmingxiao
Contributor

ningmingxiao commented Oct 24, 2025

> The connection was closed by the following shim instance deletion:

This shutdown is never sent to the shim. The "ttrpc: closed" error may mean the shim dir has already been deleted @henry118.
We should make sure the ttrpc server is really closed.

@fuweid fuweid force-pushed the weifu/disable-subscriber-on-podsandbox branch from c84eeac to 2042e80 on October 24, 2025 16:20
@fuweid
Member Author

fuweid commented Oct 24, 2025

@henry118 @yylt @ningmingxiao I've updated the description with a timeline table. I hope this addresses your questions.

@henry118
Member

Thanks @fuweid. Good job on the investigation and explanation of the problem. It all makes sense now.

Also agreed that the task struct is not designed to be reentrant, so avoiding concurrent access to the object is a better approach.

LGTM.

@github-project-automation github-project-automation bot moved this from Needs Triage to Review In Progress in Pull Request Review Oct 24, 2025
@fuweid fuweid added this pull request to the merge queue Oct 24, 2025
Merged via the queue into containerd:main with commit f039ce9 Oct 24, 2025
54 checks passed
@github-project-automation github-project-automation bot moved this from Review In Progress to Done in Pull Request Review Oct 24, 2025
Member

@mikebrow mikebrow left a comment


LGTM.. like the new shutdown command..

@fuweid fuweid added cherry-picked/2.0.x PR commits are cherry picked into the release/2.0 branch cherry-picked/2.1.x PR commits are cherry picked into the release/2.1 branch and removed cherry-pick/2.0.x Change to be cherry picked to release/2.0 branch cherry-pick/2.1.x Change to be cherry picked to release/2.1 branch labels Nov 5, 2025
