loadbalancer: Fix GetInstancesOfService to avoid removing endpoint from Service A cause requests to Service B fail if the name of Service A is the prefix of Service B #43620
The Could you look into extending the tests so we have a regression test for this? Perhaps as unit test in https://github.com/cilium/cilium/blob/main/pkg/loadbalancer/writer/writer_test.go ( |
|
/test |
joamaki
left a comment
Let's add a regression test for this.
|
I've changed Since only the |
The current `GetInstancesOfService` function returns all Services whose names begin with the given name. As a result, if the name of Service A is a prefix of the name of Service B, removing an endpoint from Service A's EndpointSlice causes all requests to Service B to fail. This patch fixes the matching logic of `GetInstancesOfService` so that it matches the service exactly.

Fixes: cilium#43619
Signed-off-by: roc <roc@imroc.cc>
|
/test |
|
FYI: I am an engineer on the TKE (Tencent Kubernetes Engine) team. The original problem was that, after installing Cilium on TKE, adjusting the cluster specifications automatically or manually (which triggers an apiserver rolling update) left almost all Cilium components in the cluster unresponsive and unable to reconnect to the apiserver. The root cause was eventually traced to this issue. The detailed troubleshooting process can be found here (translated by AI): https://imroc.cc/tke/en/networking/cilium/troubleshooting/connect-apiserver-operation-not-permitted
|
/test |
The current `GetInstancesOfService` function returns all Services whose names start with the specified name. As a result, if the name of Service A is a prefix of the name of Service B, removing an endpoint from Service A causes requests to Service B to fail (#43619).

This patch fixes the matching logic of `GetInstancesOfService` so that it matches the service exactly.
When an endpoint is removed from an EndpointSlice, processing proceeds to the `backendRelease` function to release the corresponding backend.

Now consider `GetInstancesOfService`. The `[]byte` returned by `BackendInstanceKey.Key` always starts with the service name. This means that `GetInstancesOfService(Service A)` also matches Service B if Service B's name starts with Service A's name.

This is also the root cause described in #43619.
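The difference between prefix matching and exact matching on a key that starts with the service name can be illustrated with a small standalone sketch. This is not Cilium's code: the `instance` type, its `key` layout, and the `byPrefix` and `byExactName` helpers are hypothetical names chosen for illustration, and the real key encoding may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// instance is a hypothetical record: one backend instance owned by a service.
type instance struct {
	Service string
	Backend string
}

// key mimics a key that always starts with the service name.
// The "<service> <backend>" layout is an assumption for this sketch.
func (i instance) key() string { return i.Service + " " + i.Backend }

// byPrefix returns every instance whose key starts with name.
// This reproduces the over-matching behaviour described above.
func byPrefix(all []instance, name string) []instance {
	var out []instance
	for _, i := range all {
		if strings.HasPrefix(i.key(), name) {
			out = append(out, i)
		}
	}
	return out
}

// byExactName returns only instances whose service name equals name.
func byExactName(all []instance, name string) []instance {
	var out []instance
	for _, i := range all {
		if i.Service == name {
			out = append(out, i)
		}
	}
	return out
}

func main() {
	all := []instance{
		{Service: "default/web", Backend: "10.0.0.1:80"},
		{Service: "default/web-admin", Backend: "10.0.0.2:80"},
	}
	// Prefix matching also picks up "default/web-admin".
	fmt.Println(len(byPrefix(all, "default/web"))) // prints 2
	// Exact matching returns only the instance of "default/web".
	fmt.Println(len(byExactName(all, "default/web"))) // prints 1
}
```

In this sketch, releasing the backends found by `byPrefix` for `default/web` would also touch `default/web-admin`, which is the kind of cross-service side effect the PR describes.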
This PR fixes `GetInstancesOfService` to use an exact match on the service instead of a prefix match.

Fixes: #43619