Skip to content

Conversation

@klihub
Copy link
Member

@klihub klihub commented Apr 8, 2025

Let the stub determine which events the logger plugin subscribes to, to fix a recently introduced startup failure.

Ever since support for the new pod update events has been merged, unless the event subscription mask is overidden with the --events command line option, the sample logger plugin fails to start up with the following error message:

plugin exited with error internal error: unhandled events UpdatePodSandbox,PostUpdatePodSandbox (0x1800)

This patch should fix that.

@klihub klihub requested review from fuweid and mikebrow April 8, 2025 14:52
@klihub klihub force-pushed the fixes/logger-default-event-mask branch from 54cf552 to 0f9463d Compare April 8, 2025 14:54
Copy link
Member

@mikebrow mikebrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all is still problematic because it filters down to all previously valid/known

maybe add a known option to replace all where that filters out by version / event types we've decided to register for and monitor...

then can make all mean any pod/container ... perhaps later on pod/container/image?

@mikebrow
Copy link
Member

mikebrow commented Apr 8, 2025

should probably add the new pod events

@klihub
Copy link
Member Author

klihub commented Apr 9, 2025

should probably add the new pod events

My plan is to add them once the runtimes start supporting them. Otherwise even with this change in place, we would still fail to register to the runtime, by default, it just would fail on the other end of the ttrpc transport, because those new events are unknown to the runtime yet.

@klihub
Copy link
Member Author

klihub commented Apr 9, 2025

all is still problematic because it filters down to all previously valid/known

maybe add a known option to replace all where that filters out by version / event types we've decided to register for and monitor...

But, if you refer to "all" in

case "all":
, that is a helper function in the API/protocol description, and the idea is that it represents the event mask of all known subscribable NRI events.

Sorry, I don't understand what you mean by filtering out by version. The plugin is compiled against the package, so it is talking/using the same version as the version of the NRI package itself. So the notion of "all" should be the same for the plugin and the NRI package. On the stub side, by convention, using a 0 event mask will result in the stub registering with an event mask for all events the plugin has a handler for. That's what should have been used in the logger plugin by default all along, instead of the default being "all". And that's what this PR aims to fix.

@klihub klihub requested a review from mikebrow April 9, 2025 14:10
@mikebrow
Copy link
Member

mikebrow commented Apr 9, 2025

all is still problematic because it filters down to all previously valid/known
maybe add a known option to replace all where that filters out by version / event types we've decided to register for and monitor...

But, if you refer to "all" in

case "all":

, that is a helper function in the API/protocol description, and the idea is that it represents the event mask of all known subscribable NRI events.
Sorry, I don't understand what you mean by filtering out by version. The plugin is compiled against the package, so it is talking/using the same version as the version of the NRI package itself. So the notion of "all" should be the same for the plugin and the NRI package. On the stub side, by convention, using a 0 event mask will result in the stub registering with an event mask for all events the plugin has a handler for. That's what should have been used in the logger plugin by default all along, instead of the default being "all". And that's what this PR aims to fix.

interesting... so nil is actually all need to read through it better... I caught that all was a mask on known and figured that would need to be versioned... do we block/force NRI plugins to the specific version of containerd/crio vendor import for nri?

I was thinking there was value in the all "known" mask and that each version of NRI will have a set of known

@klihub
Copy link
Member Author

klihub commented Apr 10, 2025

all is still problematic because it filters down to all previously valid/known
maybe add a known option to replace all where that filters out by version / event types we've decided to register for and monitor...

But, if you refer to "all" in

case "all":

, that is a helper function in the API/protocol description, and the idea is that it represents the event mask of all known subscribable NRI events.
Sorry, I don't understand what you mean by filtering out by version. The plugin is compiled against the package, so it is talking/using the same version as the version of the NRI package itself. So the notion of "all" should be the same for the plugin and the NRI package. On the stub side, by convention, using a 0 event mask will result in the stub registering with an event mask for all events the plugin has a handler for. That's what should have been used in the logger plugin by default all along, instead of the default being "all". And that's what this PR aims to fix.

interesting... so nil is actually all need to read through it better... I caught that all was a mask on known and figured that would need to be versioned... do we block/force NRI plugins to the specific version of containerd/crio vendor import for nri?

No, we don't check or block it at the moment. But we reject the plugin if it tries to subscribe to events the runtime does not know about.

I was thinking there was value in the all "known" mask and that each version of NRI will have a set of known

Copy link
Member

@mikebrow mikebrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Let the stub determine which events the logger plugin subscribes
to. This should fix startup errors about trying to subscribe to
{Post,}UpdatePodSandbox events without handlers. Once the runtimes
start delivering those events we can add handlers for them and
that will get us subscribed to them automatically.

Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
@klihub klihub force-pushed the fixes/logger-default-event-mask branch from 0f9463d to 33b1db1 Compare June 11, 2025 15:45
Copy link
Contributor

@marquiz marquiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@marquiz
Copy link
Contributor

marquiz commented Aug 8, 2025

ping @mikebrow @saschagrunert

@marquiz
Copy link
Contributor

marquiz commented Aug 11, 2025

We need this now that logger tries to register to "all" that contains events that no runtime supports, yet

@klihub
Copy link
Member Author

klihub commented Aug 12, 2025

We need this now that logger tries to register to "all" that contains events that no runtime supports, yet

This won't fix that now. Meanwhile handlers for the new events have been added to the logger. And this patch simply changes the behavior so that the logger tries to register for all events which it implements a handler for.

@marquiz
Copy link
Contributor

marquiz commented Aug 12, 2025

This won't fix that now.

@klihub's correct. Sorry for the noise

@klihub
Copy link
Member Author

klihub commented Aug 12, 2025

This won't fix that now.

@klihub's correct. Sorry for the noise

No prob. I already forgot so it was not that obvious to me either.

@mikebrow mikebrow removed the request for review from fuweid August 13, 2025 18:25
@klihub klihub requested a review from chrishenzie August 14, 2025 08:03
Copy link
Contributor

@chrishenzie chrishenzie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still want to leave "all" around in the event mask parser?
https://github.com/containerd/nri/blob/v0.9.0/pkg/api/event.go#L80

I'm assuming yes since this is exported and we wouldn't want to change this behavior, but just checking

@klihub
Copy link
Member Author

klihub commented Aug 14, 2025

Do we still want to leave "all" around in the event mask parser? https://github.com/containerd/nri/blob/v0.9.0/pkg/api/event.go#L80

I'm assuming yes since this is exported and we wouldn't want to change this behavior, but just checking

Yes, I think it still makes sense. Semantically it means 'register me for all known events (by this version of NRI)' and fail if I fail to implement a handler for any of them. Whereas 0 (IOW omitting to set it), what this PR switches the logger to use, is semantically 'let the stub check which NRI events I implement a handler for and use that for registration'. In a normal plugin, 0/not setting it is what I normally go with.

@marquiz
Copy link
Contributor

marquiz commented Aug 15, 2025

Yeah, all is still useful IMO

Copy link
Contributor

@marquiz marquiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@klihub klihub merged commit b64a64d into containerd:main Sep 30, 2025
9 checks passed
@klihub klihub deleted the fixes/logger-default-event-mask branch September 30, 2025 11:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants