Skip to content

Fix bug causing deleted endpoints to not be updated#9805

Merged
mandarjog merged 2 commits intoistio:release-1.0from
costinm:104-eds
Nov 9, 2018
Merged

Fix bug causing deleted endpoints to not be updated#9805
mandarjog merged 2 commits intoistio:release-1.0from
costinm:104-eds

Conversation

@costinm
Copy link
Copy Markdown
Contributor

@costinm costinm commented Nov 8, 2018

The problem was that the 'shard' ID was not matching between incremental updates and full updates.

The 'shard' represents a registry (or in future a subset, for large registries). The registries are stored
in the aggreage map, keyed by the registry name. The bug was that the name for the default registry
was "" when constructing the registry, and "kubernetes" when adding. Adding was ok, and delete after a full push was ok too - but on incremental push, deletes were only deleting the "" shard.

Additional cleanups on 1.1/master, as we convert the other 2 registries to incremental.

@hzxuzhonghu
Copy link
Copy Markdown
Member

if cluster == "" {

Should change this too?

@costinm
Copy link
Copy Markdown
Contributor Author

costinm commented Nov 8, 2018

@hzxuzhonghu great catch, I'm impressed !
But it is not needed - the index of hostname to port map is built and maintained on
initPushContext -> initServiceRegistry, and SvcUpdate is currently followed by a full push request,
since we can't yet handle Service changes incrementally.

I'll add more comments on that method and fix it - it is intended as a start on supporting
on-demand LDS/CDS. A new Service showing up currently requires a full LDS recalculation,
but with on-demand we can use SvcUpdate to figure out if any workload is importing that
name.

@mandarjog
Copy link
Copy Markdown
Contributor

Good sleuthing. can we add some lower level tests in this area? Specifically why eds unit tests did not catch this?

@nmittler
Copy link
Copy Markdown
Contributor

nmittler commented Nov 8, 2018

@mandarjog I'm working on reproducing it in an integration test. Unit tests didn't catch it since the bug was in the pilot bootstrap logic.

func (s *Server) createK8sServiceControllers(serviceControllers *aggregate.Controller, args *PilotArgs) (err error) {
clusterID := string(serviceregistry.KubernetesRegistry)
log.Infof("Primary Cluster name: %s", clusterID)
args.Config.ControllerOptions.ClusterID = string(serviceregistry.KubernetesRegistry)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: just use clusterID

@codecov
Copy link
Copy Markdown

codecov bot commented Nov 8, 2018

Codecov Report

Merging #9805 into release-1.0 will decrease coverage by 1%.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff              @@
##           release-1.0   #9805    +/-   ##
============================================
- Coverage           72%     72%   -<1%     
============================================
  Files              366     366            
  Lines            32404   32136   -268     
============================================
- Hits             23223   22973   -250     
+ Misses            8194    8189     -5     
+ Partials           987     974    -13
Impacted Files Coverage Δ
...olarwinds/internal/papertrail/papertrail_logger.go 53% <0%> (-27%) ⬇️
pilot/pkg/config/memory/monitor.go 82% <0%> (-9%) ⬇️
mixer/adapter/signalfx/cumulative.go 67% <0%> (-4%) ⬇️
mixer/adapter/redisquota/redisquota.go 87% <0%> (-3%) ⬇️
mixer/adapter/servicecontrol/quotaprocessor.go 80% <0%> (-2%) ⬇️
mixer/adapter/fluentd/fluentd.go 74% <0%> (-1%) ⬇️
mixer/adapter/servicecontrol/servicecontrol.go 62% <0%> (ø) ⬇️
mixer/adapter/opa/opa.go 80% <0%> (ø) ⬇️
mixer/adapter/stdio/zap.go 100% <0%> (ø) ⬆️
mixer/adapter/list/regexList.go 100% <0%> (ø) ⬆️
... and 11 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 25c36c4...7bd9660. Read the comment docs.

@nmittler
Copy link
Copy Markdown
Contributor

nmittler commented Nov 8, 2018

/lgtm
/approve

@istio-testing
Copy link
Copy Markdown
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: costinm, nmittler

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@istio-testing
Copy link
Copy Markdown
Collaborator

@costinm: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
prow/istio-unit-tests.sh 7bd9660 link /test istio-unit-tests
Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Copy link
Copy Markdown
Contributor

@mandarjog mandarjog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@mandarjog mandarjog merged commit aa4ea66 into istio:release-1.0 Nov 9, 2018
@nmittler
Copy link
Copy Markdown
Contributor

nmittler commented Nov 9, 2018

@costinm just to be clear, you were able to reproduce the problem on k8s?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants