use EndpointBuilder in CDS#45733
Conversation
|
Next, I'd like to consolidate on shards vs service instances but that has implications for cds cache. |
howardjohn
left a comment
There was a problem hiding this comment.
didn't read the code yet, but initial reaction: there is a lot of effort to handle the special case of EDS that it doesn't use snapshoting mechanism (push context) and allows state to change at any time. Spreading that property made be tricky
|
/test benchmark_istio |
There was a problem hiding this comment.
generally agree that removing this code duplication is probably worth it, unless spreading the mutability of EDS that @howardjohn was talking about is both too difficult and completely necessary. because off the cuff id rather remove the mutability property for EDS and use snapshotting or something else since that's easier to reason about.
why is EDS handled differently? performance reasons? EDS being more dynamic?
39c9e36 to
db333e7
Compare
d71d2d6 to
4360e46
Compare
4360e46 to
5983803
Compare
5983803 to
8b2cb10
Compare
b5a9bba to
cdc8548
Compare
|
I feel a little concerned about such big refactor, since it touches too many files. With CI, it has failed multi times, so need much more look |
cdc8548 to
108a3c4
Compare
FWIW, it wasn't really done yet. The failures from earlier were all due to |
|
/test all |
2a27c06 to
24d1643
Compare
| return locEps | ||
| } | ||
|
|
||
| func (b *EndpointBuilder) filterIstioEndpoint(ep *model.IstioEndpoint, svcPort *model.Port) bool { |
There was a problem hiding this comment.
This needs to be looked at closely. Must cover what we have in the original CDS code.
| } | ||
| h.Write(Separator) | ||
|
|
||
| t.endpointBuilder.WriteHash(h) |
There was a problem hiding this comment.
why do we need to add more factor for cache key calculation when we just do a cleanup
There was a problem hiding this comment.
My concern is mostly that EndpointBuilder takes a model.Proxy and PushContext. It can decide to use whatever parts of that it needs. The cache key for EndpointBuilder will be updated to include those, but now that affects CDS.
I want to make it hard(er) to write caching bugs.
Right now endpoint builder has these things that CDS dosn't:
- isClusterLocal
- node type
- failover priority labels
we just do a cleanup
I'm not sure what you mean.
There was a problem hiding this comment.
My concern is mostly that EndpointBuilder takes a model.Proxy and PushContext. It can decide to use whatever parts of that it needs. The cache key for EndpointBuilder will be updated to include those, but now that affects CDS.
I want to make it hard(er) to write caching bugs.
Right now endpoint builder has these things that CDS dosn't:
- isClusterLocal
- node type
- failover priority labels
we just do a cleanup
I'm not sure what you mean.
| b.subsetName = strings.TrimPrefix(b.subsetName, "http/") | ||
| b.subsetName = strings.TrimPrefix(b.subsetName, "tcp/") | ||
| } | ||
| b.mtlsChecker = newMtlsChecker(b.push, b.port, b.destinationRule.GetRule(), b.subsetName) |
There was a problem hiding this comment.
nit: donot couple mtlsChecker with subsetInfo
There was a problem hiding this comment.
mtlsChecker depends on the subset name. I could make it so we build it in the constructor, and the build it again if the subset changes?
| // If we don't know the address we must eventually use a gateway address | ||
| if ep.Address == "" && ep.Network == b.network { | ||
| return false |
There was a problem hiding this comment.
This was already in endpoint_builder
| if noCache { | ||
| ep.ComputeEnvoyEndpoint(eep) |
There was a problem hiding this comment.
confused why set it for cds endpoint, previously it is for eds
There was a problem hiding this comment.
Should be !noCache.
| } | ||
|
|
||
| // Apply the Split Horizon EDS filter, if applicable. | ||
| locEps = b.EndpointsByNetworkFilter(locEps) |
There was a problem hiding this comment.
Suggest not apply to cds endpoint, just keep as before. Not quite sure whether we set the network label correct
There was a problem hiding this comment.
I'm not sure what you mean. The original motivation (for me) was to get multi-network working for non EDS. It's possible I have a ServiceEntry that has mixed IPs/Hostnames so it must be a DNS cluster. I want the IPs to get mapped.
There was a problem hiding this comment.
I mean for cds, we donot apply filter now. This pr is already big, if we want to apply network filter for cds endpoint, we can do it later
| if shardKey.Cluster != b.clusterID { | ||
| // If the downstream service is configured as cluster-local, only include endpoints that | ||
| // reside in the same cluster. | ||
| if isClusterLocal || b.service.Attributes.NodeLocal { |
There was a problem hiding this comment.
duplicate with the filter later
There was a problem hiding this comment.
This is a very slight optimization allowing us to not loop over endpoints[] because every one of them will be a miss.
| metadataCerts: cb.metadataCerts, | ||
| peerAuthVersion: cb.req.Push.AuthnPolicies.GetVersion(), | ||
| serviceAccounts: cb.req.Push.ServiceAccounts(service.Hostname, service.Attributes.Namespace, port.Port), | ||
| endpointBuilder: endpoints.NewCDSEndpointBuilder( |
There was a problem hiding this comment.
I think some of the slowdown might be that we do this unconditionally. None of our benchmarks are DNS resolution. This patch should have almost no impact.
howardjohn
left a comment
There was a problem hiding this comment.
General idea is fine but need to fix the performance
|
|
||
| // We have a cache miss, so we will re-generate the cluster and later store it in the cache. | ||
| lbEndpoints := cb.buildLocalityLbEndpoints(clusterKey.proxyView, service, port.Port, nil) | ||
| lbEndpoints := clusterKey.endpointBuilder.FromServiceEndpoints() |
There was a problem hiding this comment.
| lbEndpoints := clusterKey.endpointBuilder.FromServiceEndpoints() | |
| var lbEndpoints []*endpoint.LocalityLbEndpoints | |
| // We have a cache miss, so we will re-generate the cluster and later store it in the cache. | |
| if (service.Resolution == model.DNSLB || service.Resolution == model.DNSRoundRobinLB) { | |
| lbEndpoints = clusterKey.endpointBuilder.FromServiceEndpoints() | |
| } |
I think this fixes the performance regression, which is due to always computing endpoints even if its an EDS cluster.
There was a problem hiding this comment.
Right.. I did this on the original draft and un-did it. Thanks!
There was a problem hiding this comment.
Still +5% mean. The ones that have a higher regression are on the order of us.
There was a problem hiding this comment.
Turns out it was passing around/copying the EndpointBuilder struct. Made that a pointer and everything looks good.
There was a problem hiding this comment.
Everything is equal now.
Co-authored-by: John Howard <howardjohn@google.com>
|
/retest |
Changes:
buildLocalityLbEndpointsin cluster builder. That cluster builder method is a partial duplication of a lot of the same logic we have for EDS.Benchmarks are no longer affected.
cc @kdorosh @howardjohn