service: Fix wrong localEndpoints count in HealthCheckNodePort#11863
Merged
service: Fix wrong localEndpoints count in HealthCheckNodePort#11863
Conversation
This fixes an issue with the `HealthCheckNodePort` server where it would non-deterministically sometimes return a non-zero `localEndpoints` count on nodes which do not have local endpoints. Because Cilium internally creates a service object per frontend IP, we end up with multiple services sharing the same name. In the case where a `LoadBalancer` service has `externalTrafficPolicy=Local` with no local backends, Cilium will still create a `ClusterIP` sibling service which retains the non-local backends. In that case, we must take care to not incooperate the `ClusterIP` backends into the `localEndpoints` count intended for external traffic. The final count is dependent on the order in which services are added to the service manager, which explains why the occurence of this bug was non-deterministic. This commit fixes this issue by checking that the service may only contain local backends before its count is added to the `HealthCheckNodePort` server. The unit tests are adapated as well and try to emulate the way the K8s watcher upserts services in the service manager. Fixes: #11043 Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Member
Author
|
test-me-please |
aanm
approved these changes
Jun 3, 2020
Member
|
I attempted to backport this PR, but ran into non-trivial conflicts. It seems like it depends on #11085. Even though this PR was cherry-picked cleanly (except for the Edit: To clarify, the conflicts are in |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This fixes an issue with the
HealthCheckNodePortserver where itwould non-deterministically sometimes return a non-zero
localEndpointscount on nodes which do not have local endpoints.Because Cilium internally creates a service object per frontend IP, we
end up with multiple services sharing the same name. In the case where
a
LoadBalancerservice hasexternalTrafficPolicy=Localwith nolocal backends, Cilium will still create a
ClusterIPsibling servicewhich retains the non-local backends. In that case, we must take care
to not incooperate the
ClusterIPbackends into thelocalEndpointscount intended for external traffic. The final count is dependent on
the order in which services are added to the service manager, which
explains why the occurence of this bug was non-deterministic.
This commit fixes this issue by checking that the service may only
contain local backends before its count is added to the
HealthCheckNodePortserver. The unit tests are adapated as well andtry to emulate the way the K8s watcher upserts services in the service
manager.
Fixes: #11043
Signed-off-by: Sebastian Wicki sebastian@isovalent.com