Skip to content

Incremental MCP for SyntheticServiceEntries#12276

Merged
istio-testing merged 1 commit intoistio:masterfrom
Nino-K:incremental-eds
Oct 16, 2019
Merged

Incremental MCP for SyntheticServiceEntries#12276
istio-testing merged 1 commit intoistio:masterfrom
Nino-K:incremental-eds

Conversation

@Nino-K
Copy link
Copy Markdown
Member

@Nino-K Nino-K commented Mar 6, 2019

This PR introduces the following:

  • New EDS or full config update behaviour depending on changes in service or endpoints
  • Enables incremental MCP in coredatamodel (only for syntheticServiceEntries that received via Galley).
  • Supports annotations (serviceVersion & endpointsVersion) received via MCP (only Galley) . These annotations are used to determine partial update or full config update to envoy.
  • Handles NotReadyEndpoints annotations.
  • Ability to expose NotReadyEndpoints to aggregated controller

@istio-testing istio-testing added the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Mar 6, 2019
@istio-testing istio-testing requested review from ayj and costinm March 6, 2019 01:03
@Nino-K Nino-K changed the title [WIP] Incremental EDS updates Incremental EDS updates Mar 13, 2019
@istio-testing istio-testing removed the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Mar 13, 2019
@istio-testing istio-testing added the needs-rebase Indicates a PR needs to be rebased before being merged label Mar 29, 2019
@stale
Copy link
Copy Markdown

stale bot commented Apr 12, 2019

This pull request has been automatically marked as stale because it has not had activity in the last 2 weeks. It will be closed in 30 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale label Apr 12, 2019
@stale stale bot removed the stale label Apr 17, 2019
@istio-testing istio-testing removed the needs-rebase Indicates a PR needs to be rebased before being merged label Apr 17, 2019
@hzxuzhonghu
Copy link
Copy Markdown
Member

hzxuzhonghu commented Oct 16, 2019

There are other two questions left:

  1. Why notreadyEndpoints Discovery interface, what it is for?

  2. The Istio service is not well populated, at least lack of several important fields, so i think it should cause some regression.

@Nino-K
Copy link
Copy Markdown
Member Author

Nino-K commented Oct 16, 2019

@Nino-K Please donot squash the commits, it is hard to review.

@hzxuzhonghu I don't usually squash my PRs, but this one has been in flight for a very long time and there is always a conflict when I make a commit since it touches many files, having to squash them makes rebasing easier.

for _, h := range se.Hosts {
for _, svcPort := range se.Ports {
// add not ready endpoint
out = append(out, d.notReadyServiceInstance(se, svcPort, conf, h, notReadyIP, notReadyPort))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thisis not right, for example, if 192.168.1.1:8080 and 192.168.1.1:8081 are not ready. What's the output

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate a bit more? I have a test case for this case specifically. The expected out put would be

 // NetworkEndpoint(endpoint{Address:192.168.1.1, Port:8080, servicePort: &{http-port 80 http}} service:{Hostname: svc.example2.com})
 // NetworkEndpoint(endpoint{Address:192.168.1.1, Port:8081, servicePort: &{http-port 80 http}} service:{Hostname: svc.example2.com})

is that not correct?

@Nino-K
Copy link
Copy Markdown
Member Author

Nino-K commented Oct 16, 2019

There are other two questions left:

  1. Why notreadyEndpoints Discovery interface, what it is for?
  2. The Istio service is not well populated, at least lack of several important fields, so i think it should cause some regression.

The NotReadyEndpoint discovery handles notReady annotations that are received via galley and it exposes them to EDS so the corresponding updates can be sent out to envoy. This is a case that we were not handling previously in Pilot over MCP.
As for the critical properties in service that are not populated, this feature is turned off by default so there should not be any regression in the normal code path. I will be making follow up PRs to address some of the concerns.

@hzxuzhonghu
Copy link
Copy Markdown
Member

The NotReadyEndpoint discovery handles notReady annotations that are received via galley and it exposes them to EDS so the corresponding updates can be sent out to envoy. This is a case that we were not handling previously in Pilot over MCP.

I could not find its use.

@nmittler
Copy link
Copy Markdown
Contributor

It merged ... Wooh-hoo! :)

Thanks @Nino-K!

@seflerZ
Copy link
Copy Markdown
Contributor

seflerZ commented Oct 23, 2019

Two questions:

  1. Why implemented only on SytheticServiceEntry via Galley? How about other schemas of MCP?
  2. We've got a Nacos registry connected to Pilot via MCP Server but suffering from low performance of updating ServiceEntry. To benifit from incremental updates, is that means we should update data through Galley or we shall implemt incremental ServiceEntry updates on Pilot?

@nmittler
Copy link
Copy Markdown
Contributor

@seflerZ

Why implemented only on SytheticServiceEntry via Galley? How about other schemas of MCP?

what schemas are you referring to?

We've got a Nacos registry connected to Pilot via MCP Server but suffering from low performance of updating ServiceEntry. To benifit from incremental updates, is that means we should update data through Galley or we shall implemt incremental ServiceEntry updates on Pilot?

You'd either have to implement incremental MCP on your server or use Galley.

@seflerZ
Copy link
Copy Markdown
Contributor

seflerZ commented Oct 25, 2019

@nmittler

  1. Some collections such as ServiceEntry, DestinationRule and etc. As far as I know, the MCP is designed to support incremental transfer for all collections listed in Istio documentation
  2. Yes, we can implement incremental MCP for our server. However the Pilot only supports recieve incremental MCP data from Galley as the PR describes and I've checked the code, the SytheticServiceEntry controller supports collection named 'SyntheticServiceEntry' only.

@nmittler
Copy link
Copy Markdown
Contributor

@seflerZ right, Pilot is only currently using incremental for the synthetic service entries. That could change, but may be obviated by the move to istiod.

@nkorange
Copy link
Copy Markdown

Is this feature released? I don't see its introduction in recent releases.

@nkorange
Copy link
Copy Markdown

I have read the code of this PR. I found in this incremental push of synthetic service entries mode Pilot replaces all endpoints of a service, not just adds/removes endpoints in the service.

Say Pilot has data: {{svc1:2.2.2.2}, {svc2:3.3.3.3}}. Then if data {collection='synthetic service entries', {svc1: 1.1.1.1}, incremental=true} is sent to Pilot, the data in pilot change to {{svc1:1.1.1.1}, {svc2:3.3.3.3}}.

Am I understanding it right?

@nmittler
Copy link
Copy Markdown
Contributor

@nkorange this will be in the 1.4 release.

Say Pilot has data: {{svc1:2.2.2.2}, {svc2:3.3.3.3}}. Then if data {collection='synthetic service entries', {svc1: 1.1.1.1}, incremental=true} is sent to Pilot, the data in pilot change to {{svc1:1.1.1.1}, {svc2:3.3.3.3}}.

That wouldn't be the desired result, although I'm not sure our test currently verifies how pilot handles endpoints from different service registries for the same service.

@Nino-K have you tested that by any chance? We're going to want to make sure that works properly.

@nkorange
Copy link
Copy Markdown

nkorange commented Oct 30, 2019

@nmittler Does 1.4.0-beta contain this feature?

@Nino-K
Copy link
Copy Markdown
Member Author

Nino-K commented Oct 30, 2019

@nkorange yes, I believe this feature will be available in 1.4, however it is turned off by default.

@Nino-K
Copy link
Copy Markdown
Member Author

Nino-K commented Oct 30, 2019

@nmittler the implementation of incremental MCP in syntheticServicController currently only focuses on the removal of the configs that received while incremental flag is set and sending updates incrementally to envoy (EDSUpdate vs ConfigUpdate). The reason I have not invested so much time in incremental updates from the source server is currently galley never forwards incremental payload to Pilot, although the snapshots are configured to support incremental which prevented me from testing this in the integration test. I will go ahead and create an issue to investigate this further.

@nkorange
Copy link
Copy Markdown

nkorange commented Oct 31, 2019

I have read the code of this PR. I found in this incremental push of synthetic service entries mode Pilot replaces all endpoints of a service, not just adds/removes endpoints in the service.

Say Pilot has data: {{svc1:2.2.2.2}, {svc2:3.3.3.3}}. Then if data {collection='synthetic service entries', {svc1: 1.1.1.1}, incremental=true} is sent to Pilot, the data in pilot change to {{svc1:1.1.1.1}, {svc2:3.3.3.3}}.

Am I understanding it right?

@Nino-K So this is expected? And Can I download the 1.4.0-beta.1 to test this incremental feature? The master branch seems not stable.

@Nino-K
Copy link
Copy Markdown
Member Author

Nino-K commented Oct 31, 2019

I have read the code of this PR. I found in this incremental push of synthetic service entries mode Pilot replaces all endpoints of a service, not just adds/removes endpoints in the service.
Say Pilot has data: {{svc1:2.2.2.2}, {svc2:3.3.3.3}}. Then if data {collection='synthetic service entries', {svc1: 1.1.1.1}, incremental=true} is sent to Pilot, the data in pilot change to {{svc1:1.1.1.1}, {svc2:3.3.3.3}}.
Am I understanding it right?

@Nino-K So this is expected? And Can I download the 1.4.0-beta.1 to test this incremental feature? The master branch seems not stable.

@nkorange the behaviour that you are expecting is not part of incremental MCP specification. Currently incremental MCP operates at the resource level and not the sub-resources.
e.g in non incremental all MCP resource in a given collection are delivered in a single update. However in the incremental case only MCP resources that were added/changed (recently) are included in the update. Either way in both cases the resources are delivered in full state. Hope that helps.

@nkorange
Copy link
Copy Markdown

nkorange commented Nov 1, 2019

I have read the code of this PR. I found in this incremental push of synthetic service entries mode Pilot replaces all endpoints of a service, not just adds/removes endpoints in the service.
Say Pilot has data: {{svc1:2.2.2.2}, {svc2:3.3.3.3}}. Then if data {collection='synthetic service entries', {svc1: 1.1.1.1}, incremental=true} is sent to Pilot, the data in pilot change to {{svc1:1.1.1.1}, {svc2:3.3.3.3}}.
Am I understanding it right?

@Nino-K So this is expected? And Can I download the 1.4.0-beta.1 to test this incremental feature? The master branch seems not stable.

@nkorange the behaviour that you are expecting is not part of incremental MCP specification. Currently incremental MCP operates at the resource level and not the sub-resources.
e.g in non incremental all MCP resource in a given collection are delivered in a single update. However in the incremental case only MCP resources that were added/changed (recently) are included in the update. Either way in both cases the resources are delivered in full state. Hope that helps.

OK. I see, thanks.

@seflerZ
Copy link
Copy Markdown
Contributor

seflerZ commented Nov 1, 2019

@Nino-K I don't think it's much practical if it doesn't support incremental on sub resources and may lead to misusing because the there is not comment or flag to tell that.

How do you think about implementing the incremental updating on sub-resources? We have very large scale clusters and endpoints. We appreciate incremental updating on every resource and sub resources. We can join in and contribute.

@Nino-K
Copy link
Copy Markdown
Member Author

Nino-K commented Nov 1, 2019

@Nino-K I don't think it's much practical if it doesn't support incremental on sub resources and may lead to misusing because the there is not comment or flag to tell that.

How do you think about implementing the incremental updating on sub-resources? We have very large scale clusters and endpoints. We appreciate incremental updating on every resource and sub resources. We can join in and contribute.

@seflerZ for large clusters with lots of endpoints you are better off to send the full state of the resource as oppose to just updated sub-resources (delta). Here is why incremental updates with full state might be more efficient: 1) pilot would ultimately send the full resource and not only updated part to envoy via EDSUpdate, so what benefit is there to just send an incremental update with only changes in the sub-resource? 2) having pilot to apply the delta on the sub-resources and computing the diff on the resources might be more expensive (in terms of operation) on a large scale cluster with lots of endpoints, I think it would be better to defer that to the config source server to compile the new resource with updated sub-resource (something like what galley does today).

Having said all that, it might be worth to bring this up in the community meeting to see what others think, also I think @ayj might have more insights on this too.

@ayj
Copy link
Copy Markdown
Contributor

ayj commented Nov 1, 2019

ServiceEntry is roughly analogous with the k8s endpoint API. Both contain a list of endpoints for a specific host/service. k8s introduced EndpointSlice as more scalable and extensible alternative. We might consider something similar if/when performance becomes a problem. Taken to the extreme one could imagine an API which represents individual endpoint/hosts per resource.

@howardjohn
Copy link
Copy Markdown
Member

We will be adding support for EndpointSlice as well

Anyways, can't you shard your ServiceEntries already? If you have multiple STATIC service entries they will be merged

@seflerZ
Copy link
Copy Markdown
Contributor

seflerZ commented Nov 4, 2019

@Nino-K
I don't agree with you on incremental updates with full state might be more efficient.

For point 1: Pilot would ultimately send the full resource through EDSUpdate()
The efficiency of EDSUpdate is a problem of implementation not architecture. We generally have a application comsuing 150+ services and every service might contain 1000+ endpoints (The number is large because we don't use a load balancer on service endpoints and have direct connections), so even tricky network turbulence would cause tremendous pushing data at present. So this is the point where Pilot should change, I think, to 'EdsIncremental' or something which introduce incremental update to Envoy. Of cause, Envoy will also change to support that.

For point 2: Having pilot to apply/compute the delta might be expensive
So this is a trade off. Either pushing tremendous data thrasing the network or computing to reduce the traffic. I think computing is better becasue it only consumes resources on Pilot itself, not every Envoy. As we know, with full state update, every Envoy receives and rebuilds all data in memory which is also expensive. It is consuming the resources of application which is very limited. Moreover, full state updates on Pilot causes high memory usage.

BTW: Should I open a new issue to discuss the incremental updates on sub-resources?

@ayj
I've checked the code of Pilot on release-1.4 and noticed that 'edsIncremental' has been introduced. Is that means we're ready to incrementally update EDS to Envoy of version 1.12?

@howardjohn
Is the idea 'EndpointSlice' also supported by MCP? I don't understand 'multiple STATIC service entries' you've said. Our architecture is like this:

We've a triditional microservice registry named Nacos which holds 10 GiB of services and endpoints. Usally a service may have 1000+ endpoints and 100+ service dependencies. Now we want to move to and join in Istio ecosystem but most of our services are not K8s ready yet. So we sync our services and endpoints to Pilot through MCP or Pilot registry plugin. Currently we're suffering from low performance on both Nacos-Pilot synchronization and Pilot-Envoy data pushing.

@ayj
Copy link
Copy Markdown
Contributor

ayj commented Nov 5, 2019

I'm not familiar with Pilot's incremental EDS support. @hzxuzhonghu may know more.

EndpointSlice should be compatible with MCP though its not plumbed through yet. The config server (e.g. galley, Nacos) is responsible for managing the grouping of endpoints over time.

@Nino-K
Copy link
Copy Markdown
Member Author

Nino-K commented Nov 6, 2019

BTW: Should I open a new issue to discuss the incremental updates on sub-resources?

@seflerZ I think it would make more sense to create an issue to discuss this further, thanks!

@hzxuzhonghu
Copy link
Copy Markdown
Member

Pilot's incremental EDS now supports when one endpoints update, we only build and push eds associated with the updated service. Can not build and send the new added/deleted subset of the endpoints, because envoy hasn't support EDS patch yet.

@seflerZ
Copy link
Copy Markdown
Contributor

seflerZ commented Nov 7, 2019

Thanks, everyone. I've opened a proposal to discuss this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.