server: tolerate failed gRPC plugins when starting listeners #13363
The grpc, grpc-tcp, and ttrpc server plugins enumerated their services through ic.GetByType, which short-circuits on the first plugin whose Instance() returned an error. A single failed gRPC plugin (e.g. CRI under rootless, which cannot watch /etc/cni/net.d) therefore prevented the server plugins from initialising, leaving /run/containerd/containerd.sock uncreated.
Iterate the plugin set directly and skip plugins that failed to initialise, restoring the pre-c15ec2485 behaviour where the listener is still created and only the failed services are missing.
Fixes: c15ec24 ("Add server plugins for grpc and ttrpc")
Fixes: containerd#13362
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Pull request overview
This PR restores tolerant listener startup behavior for the gRPC and TTRPC server plugins by avoiding InitContext.GetByType(...), which aborts service enumeration on the first plugin that failed initialization. As a result, the main gRPC listener (e.g. /run/containerd/containerd.sock) can still be created even when some gRPC service plugins (like CRI in certain rootless setups) fail to initialize.
Changes:
- Update gRPC server plugin to iterate the full plugin set and skip plugins whose `Instance()` returned an error, instead of failing server initialization.
- Apply the same tolerant iteration approach to the gRPC TCP server plugin and the TTRPC server plugin.
- Remove now-unused `errors` imports from the affected files.
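The difference between the two enumeration strategies can be sketched in isolation. The following is a simplified model, not the code from this PR: `pluginResult`, `strictServices`, `tolerantServices`, and `samplePlugins` are hypothetical names, and the sample plugin IDs and error text are illustrative. Only the shape of `Instance()` (a value plus an error) is taken from the description above.

```go
package main

import (
	"errors"
	"fmt"
)

// pluginResult is a hypothetical stand-in for an initialised plugin.
// It is not a containerd type.
type pluginResult struct {
	id       string
	instance any
	err      error
}

// Instance returns the plugin instance or the error recorded at init time.
func (p pluginResult) Instance() (any, error) { return p.instance, p.err }

// strictServices models the short-circuiting behaviour: the first failed
// plugin aborts the whole enumeration, so no listener could be started.
func strictServices(set []pluginResult) (map[string]any, error) {
	out := make(map[string]any)
	for _, p := range set {
		inst, err := p.Instance()
		if err != nil {
			return nil, fmt.Errorf("plugin %s: %w", p.id, err)
		}
		out[p.id] = inst
	}
	return out, nil
}

// tolerantServices models the behaviour described in this PR: failed
// plugins are skipped and reported, and the healthy ones are still returned.
func tolerantServices(set []pluginResult) (services map[string]any, skipped []string) {
	services = make(map[string]any)
	for _, p := range set {
		inst, err := p.Instance()
		if err != nil {
			skipped = append(skipped, p.id)
			continue
		}
		services[p.id] = inst
	}
	return services, skipped
}

// samplePlugins returns illustrative data: one failed plugin, two healthy ones.
func samplePlugins() []pluginResult {
	return []pluginResult{
		{id: "cri", err: errors.New("cannot watch /etc/cni/net.d")},
		{id: "content", instance: struct{}{}},
		{id: "images", instance: struct{}{}},
	}
}
```

With the sample data, `strictServices` returns an error because of the failed `cri` entry, while `tolerantServices` returns the two healthy services and lists `cri` as skipped.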
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| plugins/server/grpc/plugin.go | Enumerates gRPC service plugins without short-circuiting on initialization errors, allowing the gRPC/grpc-tcp listeners to still start. |
| plugins/server/ttrpc/plugin.go | Enumerates TTRPC/GRPC plugins directly and skips failed instances so the TTRPC server can still start when some services fail. |
cpuguy83 left a comment
This seems fine to fix the problem, but I'm wondering if we can add a new getter that is an iterator where the caller can choose what to do.
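One possible shape for such a getter is a callback-style iterator that hands each plugin's instance and initialisation error to the caller. This is only a sketch of the idea raised in the comment, not an existing containerd API: `entry`, `eachByType`, `collectTolerant`, `firstError`, and `sampleEntries` are all hypothetical names, and the type strings are illustrative.

```go
package main

import "errors"

// entry is a hypothetical stand-in for a registered plugin and its init result.
type entry struct {
	typ      string
	id       string
	instance any
	err      error
}

// eachByType visits every entry of the given type and passes its instance
// and init error to yield. Iteration stops when yield returns false, so the
// caller decides whether a failed plugin is fatal.
func eachByType(set []entry, typ string, yield func(id string, instance any, err error) bool) {
	for _, e := range set {
		if e.typ != typ {
			continue
		}
		if !yield(e.id, e.instance, e.err) {
			return
		}
	}
}

// collectTolerant applies a "skip failures" policy on top of eachByType.
func collectTolerant(set []entry, typ string) (good, failed []string) {
	eachByType(set, typ, func(id string, _ any, err error) bool {
		if err != nil {
			failed = append(failed, id)
			return true // keep going past the failed plugin
		}
		good = append(good, id)
		return true
	})
	return good, failed
}

// firstError applies a "fail fast" policy on top of the same getter.
func firstError(set []entry, typ string) error {
	var first error
	eachByType(set, typ, func(_ string, _ any, err error) bool {
		if err != nil {
			first = err
			return false // stop at the first failure
		}
		return true
	})
	return first
}

// sampleEntries returns illustrative data only.
func sampleEntries() []entry {
	return []entry{
		{typ: "grpc", id: "cri", err: errors.New("cannot watch /etc/cni/net.d")},
		{typ: "grpc", id: "content", instance: struct{}{}},
		{typ: "ttrpc", id: "task", instance: struct{}{}},
	}
}
```

Both policies share one enumeration path, so a server plugin could skip failures while another caller still treats them as fatal.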
Yeah, we should have a way to handle that. The expected way to enforce plugin availability is the required plugins field, not erroring in places like this.
/cherry-pick release/2.3
@AkihiroSuda: once the present PR merges, I will cherry-pick it on top of
@AkihiroSuda: new pull request created: #13390
Note: used Claude Code