shared control plane multicluster fixes #21912
Conversation
* Rename the remote istiod service and endpoint to `istiod-remote` to avoid conflicts with the real local istiod service.
* Use the `istiod-remote.<namespace>.svc` hostname for the sidecar and ingress proxies' discoveryAddress. This address needs to match the SAN in istiod's cert. The `istiod-remote` headless service will resolve the hostname to the remote IP address.
* Add the `istiod-remote` hostname to istiod's SANs. Also use istiod's namespace to construct the legacy service names instead of hardcoding them to `istio-system`.
* Simplify the remote profile by removing redundant and unused values.
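For context, the mechanism this relies on is a selector-less headless Service whose Endpoints object is supplied manually, so DNS resolves the hostname to the main cluster's address. A minimal sketch of what the rendered manifests might look like; the namespace, port name, and the 192.0.2.10 address are illustrative placeholders, not values taken from this PR:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: istiod-remote
  namespace: istio-system
spec:
  clusterIP: None       # headless: DNS resolves directly to the endpoint IPs below
  ports:
  - port: 15012
    name: tcp-istiod    # illustrative port name
---
apiVersion: v1
kind: Endpoints
metadata:
  name: istiod-remote   # must match the Service name; no selector, so managed manually
  namespace: istio-system
subsets:
- addresses:
  - ip: 192.0.2.10      # placeholder: the main cluster's istiod address
  ports:
  - port: 15012
    name: tcp-istiod
```

With this shape, `istiod-remote.istio-system.svc` resolves in-cluster to the remote IP while still matching a DNS SAN in istiod's certificate.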
Keeping this as a draft while I verify the cleaned-up changes against the revised docs (PR pending). There is additional cleanup that I intentionally skipped in this PR to minimize the changes when they're cherry-picked into 1.5.
```yaml
apiVersion: v1
kind: Endpoints
metadata:
  name: istio-pilot
```
is istio-pilot endpoint really needed?
It would only be needed if we need to support multicluster with the pre-istiod deployment model.
BTW - I think the istiod & istio-pilot endpoints will also be created as the result of pilot.enabled=true. so do we still need to create this again?
istio-pilot shouldn't be needed in 1.6+. I kept it because it would make backporting to 1.5 a bit easier. If someone was using the legacy charts and separate services in 1.5 they wouldn't be enabling pilot in the remote cluster.
isn't enabling pilot the only way to enable istiod? That is what the remote profile has today.
istiod runs in each cluster, and we need the istiod-remote service to avoid conflicting with the "real" istiod service in the cluster.
When istiod=false (separate services) pilot doesn't need to run in the remote cluster. The istio-pilot service in the remote cluster points to the pilot in the main cluster. This is how things worked in <= 1.4.
We probably don't need to support the istiod=false case for multicluster
I agree for not needing this in master. What about 1.5?
@ayj I'd recommend sorting out what this should look like for 1.5 in master. Then follow up with another PR after we release 1.5.1.
Master doesn't have the legacy helm charts. We'll need to manually cherry-pick this PR anyway.
@ayj thank you for the PR! Looks reasonable; I definitely ran into some of the challenges you fixed in this PR as well. Having it changed to istiod-remote for xDS serving removes a lot of confusion.
Split-horizon is working now. Requests from each cluster are balanced roughly evenly across all clusters. I've tested with HTTP echo tests so far. I still need to run more extensive traffic and load tests. Cluster topology: I'm still seeing ACK errors in the istiod and proxy logs.
It looks like this might be the cause, which is similar to #18671 and #20087.
Full logs here
@howardjohn @rshriram @costinm, any ideas on debugging the ACK error with zero lbEndpoints? I added extra logging to see where we might be setting weights to zero (or, more likely, not setting them at all). I didn't find anything. This is easily reproducible.
Shriram ran into this as well, not sure we found the root cause?
That
The LB weights were always greater than zero immediately prior to sending. The issue didn't occur when the endpoints proto was cloned before calling stream.Send(). I instrumented pilot to run in-cluster as a unit test with the race detector on. It found a data race in the eds path where the endpoint LB weights are updated/read. It looks like endpoints from the cached endpoint shards are reused across connections/pushes. I'm not sure how this is supposed to work for multiple networks, where weights for the same service/endpoints could differ.
istio/pilot/pkg/proxy/envoy/v2/eds.go Line 876 in ee52425
Full race backtrace: https://gist.github.com/ayj/0bf0d1b8a349f90aaaeafd81b06863a5
Not sure how much work this involved but you can do
```yaml
{{- end }}
- --controlPlaneAuthPolicy
- NONE
- --discoveryAddress
```
This doesn't do anything in master; it is reading from mesh config directly. In 1.5 it's needed though.
can you remove after this merges?
Removed the hold. The bad endpoint LB weight doesn't seem to be introduced by this change, though it does increase the likelihood with multiple networks.
Didn't know that existed - thanks.
```yaml
apiVersion: v1
kind: Endpoints
metadata:
  name: istio-pilot
```
We probably don't need to support the istiod=false case for multicluster
```yaml
{{- $defPilotHostname := printf "istiod.%s.svc" .Release.Namespace }}
{{- $defPilotHostname := printf "istiod.%s.svc" .Release.Namespace }}
{{- end }}
{{- $defPilotHostname := printf "istiod%s.%s.svc" .Values.revision .Release.Namespace }}
```
Defining variables like this doesn't actually work inside if statements. This is fixed on 1.5 but somehow broken on master; I have a fix in https://github.com/istio/istio/pull/21788/files
Yup. Looks like your PR and my change below are similar except for the use of the DNS name instead of the IP address.
operator/data/profiles/default.yaml
Outdated
```yaml
  name: tcp-pilot-grpc-tls
- port: 15012
  targetPort: 15012
  name: http-istiod
```
should this be grpc-istiod? Or tcp
I assume tcp, it is used everywhere for port 15012
Any idea why this is closed, @howardjohn @ayj?
@ayj I pulled this PR locally and tested with master. I'm getting this error during install. This relates to a point I tried to make earlier: we are creating the istio-pilot service as part of istiod being enabled in the remote cluster, which will indirectly create the istio-pilot endpoint via the selector, so when we specifically ask to create the istio-pilot endpoint it will fail.
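The conflict described here is standard Kubernetes behavior: when a Service has a selector, the endpoints controller owns the same-named Endpoints object, so a chart that also ships that Endpoints manually fights the controller. A sketch contrasting the two shapes (labels, ports, and the address are illustrative, not taken from the charts):

```yaml
# Shape 1: Service WITH a selector. The endpoints controller creates and
# continuously reconciles an Endpoints object named "istio-pilot" itself.
apiVersion: v1
kind: Service
metadata:
  name: istio-pilot
spec:
  selector:
    app: istiod        # illustrative label
  ports:
  - port: 15012
---
# Shape 2: a manually supplied Endpoints object of the same name. This only
# works if the Service has NO selector; otherwise the two owners conflict.
apiVersion: v1
kind: Endpoints
metadata:
  name: istio-pilot
subsets:
- addresses:
  - ip: 192.0.2.10     # placeholder remote address
  ports:
  - port: 15012
```

Mixing the two shapes for the same name is what produces the install error reported above.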
howardjohn left a comment
Everything lgtm except the minor error comment
```yaml
{{- end }}
- --controlPlaneAuthPolicy
- NONE
- --discoveryAddress
```
can you remove after this merges?
```go
out, err = c.getProxyServiceInstancesFromMetadata(proxy)
if err != nil {
	log.Warnf("getProxyServiceInstancesFromMetadata for %v failed: %v", proxy.ID, err)
	err = fmt.Errorf("getProxyServiceInstancesFromMetadata for %v (ip=%v) failed: %v", proxy.ID, proxyIP, err)
```
nit: I think this should be a warning. It's not an internal error; it's just sometimes not possible from the metadata provided by the client.
Actually, I didn't notice you aren't logging this. We should not return an error here, I think. Was there a reason you think we should?
I went back and forth on changing this. I was seeing a lot of these messages in the logs. It didn't seem like an error/warning when using multiple control planes. The check on line 541 will fail because Cluster A's registry won't have any pods for Cluster B, and getProxyServiceInstancesFromMetadata will fail because the remote proxy's cluster_id doesn't match the cluster's ID. It should still be printed, but as a debug statement by the aggregate.
Some other alternatives:
- modify getProxyServiceInstancesFromMetadata to not return an error when the cluster IDs don't match.
- modify the aggregate controller to only call GetProxyServiceInstances() on the controller whose cluster_id matches the target proxy.
I am fine with making changes so this logs nothing in the multicluster case; that makes complete sense. But if we return an error here, won't the aggregate controller bubble that error up and ultimately we will return an error? So it will always be an error for the MC case? I am surprised this works at all; am I missing some step?
AFAICT the aggregate controller will log the error but not bubble it up if another controller returned one or more instances (see
istio/pilot/pkg/serviceregistry/aggregate/controller.go
Lines 266 to 270 in f8464b5).
How about the following? cc @nmittler

```diff
diff --git a/pilot/pkg/serviceregistry/aggregate/controller.go b/pilot/pkg/serviceregistry/aggregate/controller.go
index 1741a762e..6964364a4 100644
--- a/pilot/pkg/serviceregistry/aggregate/controller.go
+++ b/pilot/pkg/serviceregistry/aggregate/controller.go
@@ -228,6 +228,9 @@ func (c *Controller) GetProxyServiceInstances(node *model.Proxy) ([]*model.ServiceInstance, error) {
 	// It doesn't make sense for a single proxy to be found in more than one registry.
 	// TODO: if otherwise, warning or else what to do about it.
 	for _, r := range c.GetRegistries() {
+		if r.Cluster() != "" && node.Metadata != nil && node.Metadata.ClusterID != "" && node.Metadata.ClusterID != r.Cluster() {
+			continue
+		}
 		instances, err := r.GetProxyServiceInstances(node)
 		if err != nil {
 			errs = multierror.Append(errs, err)
```

Ah thanks... I looked right at that code and missed it. I think that makes sense; probably cleans things up a bit in general.
Updated to filter in the aggregate controller. We need to special-case the default cluster, which is always named Kubernetes. I started to clean things up so the default registry uses the provided ClusterName, but decided to defer that for another PR. This is large enough as-is.
```go
node.Metadata.ClusterID != r.Cluster() {
	log.Debugf("GetProxyServiceInstances(): not checking cluster %v: proxy %v is associated with cluster %v",
		r.Cluster(), node.ID, node.Metadata.ClusterID)
	continue
```
So if r.Cluster == Kubernetes and node.Cluster == MyOtherCluster, this will NOT be skipped? Isn't that what we were trying to avoid?
Yes, that is one of the cases we were trying to avoid. ClusterID is overloaded to mean both the registry type (k8s, consul) and the name (which cluster). I didn't want to fix that issue holistically in this PR (see #22093). On further thought, though, we could filter 100% of the cases if we had some special handling above that is less intrusive (see skipSearchingRegistryForProxy above).
```go
	errs = multierror.Append(errs, err)
} else if len(instances) > 0 {
	out = append(out, instances...)
	out = instances
```
Why this change? This means we can only get endpoints from one registry? Which will break all the other registries like ServiceEntry?
See the for loop comment - "It doesn't make sense for a single proxy to be found in more than one registry". out only ever has at most instances from one registry. The code breaks out of the loop two lines below on line 251.
I don't think that comment is accurate? I am surprised all tests pass, I think you should be able to have ServiceEntry + Kube.
I've removed the change from this PR. We can address it in a follow-up.
@ayj any plan to backport this to 1.5?
Yes, a PR is in progress. I'm testing the upgrade/downgrade cases. 1.4<=>1.5 with istioctl, for both the istiod and non-istiod variants, is passing. I'm checking the legacy helm chart case and then I'll send out the PR for review.