stop closed plugins that will be removed by Iceber · Pull Request #89 · containerd/nri

Iceber · 2024-06-07T09:24:01Z

Most of Adaptation's methods call removeClosedPlugins to remove closed plugins, which are moved out of the r.plugins list and never operated on again.

Such a plugin may linger as a zombie process on the system, and some tools make it look as if the plugin is still running normally.

# pstree
...
        ├─containerd─┬─03-logger
        │            └─15*[{containerd}]

This PR makes removeClosedPlugins call plugin.stop when it removes closed plugins, which avoids these zombie processes.

Signed-off-by: Iceber Gu <caiwei95@hotmail.com>
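
For illustration only, here is a minimal sketch of the idea described above: filter out closed plugins and stop them in a background goroutine. The `plugin` type, its fields, and the `removeClosed` helper are hypothetical stand-ins, not the actual code in pkg/adaptation/adaptation.go.

```go
package main

import "fmt"

// plugin is a hypothetical stand-in for the adaptation's plugin type.
type plugin struct {
	name    string
	closed  bool
	stopped bool
}

// stop only records that it was called; a real implementation would
// terminate the plugin process and reap it.
func (p *plugin) stop() { p.stopped = true }

// removeClosed returns the plugins that are still open and stops the closed
// ones asynchronously. The returned channel is closed once every closed
// plugin has been stopped (assumption: added here so callers can observe it).
func removeClosed(plugins []*plugin) ([]*plugin, <-chan struct{}) {
	var active, closed []*plugin
	for _, p := range plugins {
		if p.closed {
			closed = append(closed, p)
		} else {
			active = append(active, p)
		}
	}

	done := make(chan struct{})
	if len(closed) == 0 {
		close(done)
		return active, done
	}
	go func() {
		defer close(done)
		for _, p := range closed {
			p.stop()
		}
	}()
	return active, done
}

func main() {
	plugins := []*plugin{{name: "00-keeper"}, {name: "03-logger", closed: true}}
	active, done := removeClosed(plugins)
	<-done
	fmt.Println("active:", len(active), "stopped:", plugins[1].stopped)
}
```

Stopping in a goroutine keeps the caller from blocking on a slow plugin shutdown, at the cost of the caller not knowing when the stop has completed unless it waits on the channel.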

klihub · 2024-06-07T14:15:09Z

pkg/adaptation/adaptation.go

+	if len(closed) != 0 {
+		go func() {
+			for _, plugin := range closed {
+				plugin.stop()


This slightly changes the semantics of setting an on-close handler on the plugin side with stub.WithOnClose(), although I am not sure whether, for pre-launched plugins, the current behavior should be considered a feature or a bug.

Anyway, the current semantics for both external and pre-launched plugins are that if the connection is closed, the stub internally calls os.Exit(0) if the plugin has not set an OnClose handler of its own. But if the plugin has set a handler, the stub calls it and does not exit, (perhaps naively) trusting the plugin to do the right thing, IOW to do any extra stuff it wants/needs and then always exit if it is a pre-launched plugin.
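
A rough model of the semantics described above, as a sketch only. The `stub` type here is hypothetical and heavily simplified; it is not the real stub from the NRI repository, and the exit function is injectable purely so the behavior can be shown without terminating the process.

```go
package main

import "fmt"

// stub is a hypothetical, simplified model of the plugin-side stub.
type stub struct {
	onClose func()         // optional handler, as a plugin would set with stub.WithOnClose()
	exit    func(code int) // os.Exit in a real stub; injectable here for demonstration
}

// connectionClosed models what happens when the connection to the runtime
// is closed: without a handler the stub exits, with a handler it calls the
// handler and trusts the plugin to exit on its own.
func (s *stub) connectionClosed() {
	if s.onClose != nil {
		s.onClose()
		return
	}
	s.exit(0)
}

func main() {
	fakeExit := func(code int) { fmt.Println("stub would exit with code", code) }

	noHandler := &stub{exit: fakeExit}
	noHandler.connectionClosed()

	withHandler := &stub{
		exit:    fakeExit,
		onClose: func() { fmt.Println("plugin handler ran; exiting is now up to the plugin") },
	}
	withHandler.connectionClosed()
}
```

The second case is the one that can leave a pre-launched plugin behind: if the handler returns without exiting, nothing else in this model terminates the process.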

With this change in place in its current form, we don't do a single thing to give the stub a chance to call any potential OnClose handler of the plugin when it is a pre-launched one. If we had to choose the lesser evil, IOW accumulating zombies vs. butchering the plugin, I think this is the right thing to do.

But I was thinking that maybe we could give the plugin some slack here before we stop/kill it, since we're running that asynchronously anyway. That still would not guarantee that the plugin's handler gets run, but it would give it a better chance. Unfortunately, I think if we want to do any better than that, we'd need to make this explicitly controllable through the API (define a default timeout for pre-launched plugins to let their OnClose handler run, and make it configurable with a new stub.Option)... but it would also require extra communication (probably in one of the plugin registration or configuration messages), since the option is set on the stub side but should take effect in the adaptation on the runtime side. So I would not go there now, unless we know for a fact that there are plugins used as pre-launched ones which heavily rely on the current (implicit) semantics...
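
One way the "some slack before we stop/kill it" idea could look on the runtime side, as a sketch under assumptions: `stopAfterGrace`, the `exited` channel, and the `kill` callback are all hypothetical names, and nothing like this is part of this PR.

```go
package main

import (
	"fmt"
	"time"
)

// stopAfterGrace waits up to grace for exited to be closed (the plugin went
// away on its own) and calls kill only if that did not happen in time.
// It reports whether kill was called.
func stopAfterGrace(exited <-chan struct{}, grace time.Duration, kill func()) bool {
	timer := time.NewTimer(grace)
	defer timer.Stop()

	select {
	case <-exited:
		return false
	case <-timer.C:
		kill()
		return true
	}
}

func main() {
	exited := make(chan struct{})
	close(exited) // the plugin exited on its own before the grace period ended

	killed := stopAfterGrace(exited, 2*time.Second, func() {
		fmt.Println("killing plugin")
	})
	fmt.Println("killed:", killed)
}
```

As noted above, this only improves the odds that a plugin's handler gets to run; it guarantees nothing about handlers that need longer than the grace period.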

Now I assume that you came upon this zombie behavior because you have a plugin which sets an OnClose handler but then does not exit, right?

Suggest adding test case(s) for the example plugins that validate a plugin's on-close func ...

Agree with your point, @klihub, that this should be configurable for each plugin... Maybe add a ctx with a timeout based on config (or, for now, a default of, say, two seconds of grace). You'd create that timeout ctx in the WithOnClose func and pass it into the onClose func the plugin registered.
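
A sketch of what passing a timeout context into the on-close handler might look like. Everything here is hypothetical: `runOnClose` and the two sample handlers are illustration only, and the existing WithOnClose handler in the stub does not take a context.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// runOnClose calls handler with a context that expires after grace and
// reports whether the handler returned before the deadline. A handler that
// ignores the context keeps running in its goroutine after the deadline.
func runOnClose(handler func(ctx context.Context), grace time.Duration) bool {
	ctx, cancel := context.WithTimeout(context.Background(), grace)
	defer cancel()

	done := make(chan struct{})
	go func() {
		defer close(done)
		handler(ctx)
	}()

	select {
	case <-done:
		return ctx.Err() == nil
	case <-ctx.Done():
		return false
	}
}

// quickHandler returns immediately, well within any reasonable grace period.
func quickHandler(ctx context.Context) {}

// slowHandler blocks until the grace period is over.
func slowHandler(ctx context.Context) { <-ctx.Done() }

func main() {
	fmt.Println("quick handler finished in time:", runOnClose(quickHandler, 2*time.Second))
	fmt.Println("slow handler finished in time:", runOnClose(slowHandler, 10*time.Millisecond))
}
```

After `runOnClose` returns, a stub following this idea would presumably exit regardless of the result, which is what a variant like the WithOnCloseExit suggestion would make explicit.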

Maybe add WithOnCloseExit, deprecate the older one, and add a notice about the issue...

I'm sorry, I overlooked WithOnClose. We do need to tolerate the plugin's behavior when it is closed.

This PR may need to be updated, but even with WithOnClose the zombie process still exists, because the adaptation doesn't reap the process (process.Wait) after the plugin exits.
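
To illustrate the reaping point: on Unix-like systems, a child that has exited stays in the process table as a zombie until its parent waits on it. A minimal sketch with os/exec follows; `startAndReap` is a hypothetical helper, not the adaptation's code, and the demo assumes a Unix-like system with a `true` binary on the PATH.

```go
package main

import (
	"fmt"
	"os/exec"
)

// startAndReap starts the named command and waits on it in the background,
// so the child is reaped as soon as it exits. The returned channel receives
// the child's exit code once Wait has returned.
func startAndReap(name string, args ...string) (<-chan int, error) {
	cmd := exec.Command(name, args...)
	if err := cmd.Start(); err != nil {
		return nil, err
	}

	exited := make(chan int, 1)
	go func() {
		// Wait collects the exit status; without this call the exited
		// child would remain a zombie for as long as the parent runs.
		_ = cmd.Wait()
		exited <- cmd.ProcessState.ExitCode()
	}()
	return exited, nil
}

func main() {
	exited, err := startAndReap("true")
	if err != nil {
		fmt.Println("start failed:", err)
		return
	}
	fmt.Println("child reaped, exit code:", <-exited)
}
```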

True, we can take care of giving the plugin some slack in another PR.

klihub

LGTM. Let's take care of killing-after-some-slack in another PR.

stop closed plugins that will be removed

ba398fa

Signed-off-by: Iceber Gu <caiwei95@hotmail.com>

klihub reviewed Jun 7, 2024

klihub self-requested a review July 8, 2024 08:07

klihub approved these changes Jul 8, 2024

klihub requested review from fuweid and mikebrow July 8, 2024 08:59

fuweid approved these changes Jul 11, 2024

fuweid merged commit 5a4c86a into containerd:main Jul 11, 2024

austinvazquez mentioned this pull request Feb 13, 2025

[release/1.7] Prepare release notes for v1.7.26 containerd/containerd#11356

Merged

fuweid merged 1 commit into containerd:main from Iceber:stop_closed_plugins
