Skip to content

Fixes #11818. Select Sidecars by Workload/Pod labels instead of Servi…#12620

Closed
robertpanzer wants to merge 3 commits intoistio:release-1.1from
robertpanzer:fix-11818-sidecar-selector
Closed

Fixes #11818. Select Sidecars by Workload/Pod labels instead of Servi…#12620
robertpanzer wants to merge 3 commits intoistio:release-1.1from
robertpanzer:fix-11818-sidecar-selector

Conversation

@robertpanzer
Copy link
Copy Markdown
Contributor

…ce labels

This PR is an attempt to fix #11818.

I don't know if I missed some parts, but however I am trying:)

The problem reported in the issue is that the workload selectors of the Sidecar instances match against the labels of the Services which in turn lead to the proxies, but that doesn't work for consumer only services.

This PR changes this by getting the workload labels from the DiscoveryServer for the current proxy and uses these labels as the basis for the selection of the Sidecar instance.

There's just a few open questions:

  • The member holding the workload labels is called WorkloadsByID, suggesting that its keys would be the IDs of the proxies. Instead the only place I could find where this map is filled is the kube serviceregistry which feeds the pod ip as the id:
    pc.c.XDSUpdater.WorkloadUpdate(ip, pod.ObjectMeta.Labels, pod.ObjectMeta.Annotations)
  • I didn't find any specific tests that would cover Sidecars right now, and most of what I ran locally succeeded despite the change. Running the tests locally isn't easy for me, so let's see what the CI says. Local manual testing worked for me and also picked up dynamic changes to Sidecars and Pod labels (I know that doesn't count much without automated tests :P)
  • This is a breaking change for existing users, is it still possible to fix this issue on 1.1.x assuming that Istio somehow follows semver?

@googlebot
Copy link
Copy Markdown
Collaborator

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and have the pull request author add another comment and the bot will run again. If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

@googlebot googlebot added the cla: no Set by the Google CLA bot to indicate the author of a PR has not signed the Google CLA. label Mar 20, 2019
@istio-testing
Copy link
Copy Markdown
Collaborator

Hi @robertpanzer. Thanks for your PR.

I'm waiting for a istio member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File is not goimports-ed (from goimports)

@robertpanzer robertpanzer force-pushed the fix-11818-sidecar-selector branch from 76edc92 to 63374d5 Compare March 20, 2019 10:36
@robertpanzer robertpanzer force-pushed the fix-11818-sidecar-selector branch from 63374d5 to 7187d8f Compare March 20, 2019 10:39
@googlebot
Copy link
Copy Markdown
Collaborator

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

@googlebot googlebot added cla: yes and removed cla: no Set by the Google CLA bot to indicate the author of a PR has not signed the Google CLA. labels Mar 20, 2019
@robertpanzer robertpanzer force-pushed the fix-11818-sidecar-selector branch from 8eb7a1a to 7187d8f Compare March 20, 2019 11:29
@robertpanzer
Copy link
Copy Markdown
Contributor Author

I also checked using Proxy.Metadata as the source for the workload labels.
Unfortunately these don't seem to be dynamic and react on label updates made with kubectl.

@hzxuzhonghu
Copy link
Copy Markdown
Member

/ok-to-test

@istio-testing istio-testing added ok-to-test Set this label allow normal testing to take place for a PR not submitted by an Istio org member. and removed needs-ok-to-test labels Mar 20, 2019
@robertpanzer
Copy link
Copy Markdown
Contributor Author

I wonder though how sidecars that don't run inside a K8s cluster could be supported.
My guess is that such applications have to be registered in another Service Registry like Consul.
However the Consul Registry doesn't seem to feed the WorkloadsById.
Maybe that needs to be added as well.

@howardjohn
Copy link
Copy Markdown
Member

Thanks for picking this up, looks pretty good. I believe the failing test are known flakes.

I'll pull this down and test it out a bit as well


func (s *DiscoveryServer) workloadLabels(proxy *model.Proxy) model.Labels {
var workloadLabels model.Labels
if workload := s.WorkloadsByID[proxy.IPAddresses[0]]; workload != nil {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think getting IPAdresses[0] is safe but not 100% sure if someone could verify

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point.
I found code that also simply accesses Proxy.IPAddresses[0], but now found that this in the kube registry, where it makes sense to make this assumption.
I am going to change this, even if this removes the precious lgtm label 😢

@howardjohn
Copy link
Copy Markdown
Member

/lgtm

I think we can get this in for 1.1.x but up to @duderino / others

func (s *DiscoveryServer) workloadLabels(proxy *model.Proxy) model.Labels {
var workloadLabels model.Labels
ipAddresses := proxy.IPAddresses
if len(ipAddresses) == 0 {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thoughts were more if we have multiple ipAddress rather than none - I haven't had a chance to dig into how the addresses get set though so I am not sure what happens outside of k8s. @rshriram any ideas? I don't think its a major issue but I am only familiar with kubernetes

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.
Currently afaict only the kube registry feeds workloadLabels with the podIP.
However there could probably be more than 1 address in Proxy.IPAdresses and assuming that this PodIP is always in the first slot might be brittle.
Maybe it would make sense then to make workloadLabels return a LabelsCollection and require the workloadSelector to be a superset of every item in that collection.

What do you think?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think looking at all of them makes sense, the question I would have is do we need the workloadSelector to match all or any of the labels we get. I suspect we would want to do all. I think this is basically how it worked before right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's how it worked before, just using the services labels.
I think it also makes sense to match against all Labels, with the current code I would assume that even with different IPAddresses only one set should be non-empty.
Thanks for the discussion! 👍

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just upgraded the PR to get the labels associated with all IP addresses.
The selector will match if it is a subset of any set of labels associated with a workloadId.
This is basically the same behavior as before, just that it tests against the labels of the Pods instead of the Services.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @zhaohuabing any thoughts since you added the multi-ip support for each proxy, for consul environments

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started working on an alternative fix today that would add a method ServiceDiscovery.GetProxyWorkloadLabels() which returns the workload labels for a proxy.
Maybe this works better for proxies not running in Kubernetes.
The change set will certainly be much, much larger.

@howardjohn
Copy link
Copy Markdown
Member

/lgtm thanks for working on this!

@istio-testing
Copy link
Copy Markdown
Collaborator

@robertpanzer: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
prow/istio-integ-k8s-tests.sh 416b7fd link /test istio-integ-k8s-tests
prow/istio-pilot-multicluster-e2e.sh 416b7fd link /test istio-pilot-multicluster-e2e
Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@hzxuzhonghu
Copy link
Copy Markdown
Member

/lgtm

@rshriram WDYT

@hzxuzhonghu
Copy link
Copy Markdown
Member

Sorry after another thought, this is not right.

/lgtm cancel

@istio-testing
Copy link
Copy Markdown
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: howardjohn, robertpanzer
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: hzxuzhonghu

If they are not already assigned, you can assign the PR to them by writing /assign @hzxuzhonghu in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ipAddresses := proxy.IPAddresses
labels := make([]model.Labels, 0, len(ipAddresses))
for _, ip := range ipAddresses {
if workload := s.WorkloadsByID[ip]; workload != nil {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WorkloadsByID only set in k8s env

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was one of my questions: How would proxies identify for non-K8s applications?
Is it required to register MESH_INTERNAL ServiceEntries for these?
Or use Consul?

Proxy.Metadata and the current approach using Proxy.ServiceInstances don't work, as the metadata is fix and doesn't follow Pod labels, and ServiceInstances doesn't exist for consumer only applications.
Consumer only non-K8s applications should suffer from the same problem.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WorkloadsByID is only used by incremental EDS feature.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which only supported in k8s

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that.
Maybe it would be possible to store the labels in another location at the DiscoveryServer where other Registries can store and update the labels too.
That shouldn't be too hard actually and could probably also use the IPAddress as key, the Consul Registry already uses this for matching a Proxy to a ServiceInstance.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe another approach could be to add a method GetProxyWorkloadLabels to ServiceRegistry.
On Kubernetes this could still get dynamic updates.
I can have a try for this approach today.

@rshriram
Copy link
Copy Markdown
Member

I am reluctant to do a partial fix like this.. while you might be able to get the Sidecar logic flowing, other plugins might fail if they encounter a workload that has no service instances (notably authN and such). A simple litmus test is to try to add a test under lds_test.go (there are tests for sidecar, follow the same logic, but create a pseudo workload without any service). If my hunch is right, you will find that a couple of plugins will fail and seg fault.

Alternative fix is to create a dummy internal service for workloads that do not have a backing service, in the getProxyServiceInstances call. that might solve your problem. Once again, I would validate with the tests in lds_test.go at the very least to see what else might fail.

@robertpanzer
Copy link
Copy Markdown
Contributor Author

Thanks for the pointer where to add tests, I was already searching for that!
I only wonder why these segfaults should occur there given that my manual tests with this PR already worked for a K8s Deployment without a K8s Service.
Actually, wouldn't that mean that even right now with 1.1.0 such applications should make Pilot trap?
I don't really see the difference in taking the reference labels just from another location.

I see the point though that it's problematic for proxies corresponding to services from other ServiceRegistries, e.g. Consul, that they won't be able to find any workload labels anymore.
This is additionally made worse by the fact that all services from Consul result in ServiceInstances belonging to the default namespace, so that it's not even possible to apply different Sidecars by namespace.

Adding a fake ServiceEntry doesn't sound like a good idea to me.
There's probably a lot of places where these have to be sorted out again like the health checks that were mentioned above.

@robertpanzer
Copy link
Copy Markdown
Contributor Author

Created an alternative PR to fix the problem at #12666.
Maybe this finds more acceptance.

@robertpanzer
Copy link
Copy Markdown
Contributor Author

Closing this in favor of #12666

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ok-to-test Set this label allow normal testing to take place for a PR not submitted by an Istio org member.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants