Skip to content

Apply peer authentication policy#20829

Merged
istio-testing merged 19 commits intoistio:masterfrom
diemtvu:mtls-beta-p3
Feb 7, 2020
Merged

Apply peer authentication policy#20829
istio-testing merged 19 commits intoistio:masterfrom
diemtvu:mtls-beta-p3

Conversation

@diemtvu
Copy link
Copy Markdown
Contributor

@diemtvu diemtvu commented Feb 4, 2020

  • Implement logic to select peer authentication policy for workload: most specific scope, then most secure mode (i.e STRICT > PERMISSIVE > DISABLE).

  • Implement logic to apply peer authentication policy at workload level and above (namespace, mesh), and fallback to (alpha) authentication policy.

  • Port-level support and better auto mTLS will be taken care in the following up PRs.

Issue: #20746

@diemtvu diemtvu requested a review from incfly February 4, 2020 00:58
@diemtvu diemtvu requested a review from a team as a code owner February 4, 2020 00:58
@istio-testing istio-testing added the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Feb 4, 2020
@googlebot googlebot added the cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. label Feb 4, 2020
@istio-testing istio-testing added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Feb 4, 2020
Copy link
Copy Markdown

@incfly incfly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have two high level questions.

what's the rough idea of supporting auto mTLS? wire up the policy resolution with EDS code? every policy change is a full push to ensure the correctness?

when no port mtls is specified, what's the behavior, secure everything including pass through listeners? when port specified, only per port listener is secured? what if the port defined in the policy is not part of the existing workload's inbound listener, aka, :? is that considered as mis-configuration?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a valid use case? label selector && defined in root ns? just want to confirm.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any prior usage of this pattern? req authn || authz?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the first question, yes (for now), as we don't have the way to prevent selector used in root namespace yet (need some change in validation webhook to know what is the root namespace, and it may not be useful given root namespace can be changed).

For the second, both request authn and authz are appendable, so they don't need this logic to disambiguation.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a valid use case. WE dont respect workload specific config in the root namespace for sidecar or envoy filter. So lets follow the pattern for consistency and not create exceptions

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean we can ignore them? (but not reject at validation time)

@diemtvu diemtvu requested a review from a team as a code owner February 4, 2020 20:01
@diemtvu
Copy link
Copy Markdown
Contributor Author

diemtvu commented Feb 4, 2020

i have two high level questions.

what's the rough idea of supporting auto mTLS? wire up the policy resolution with EDS code? every policy change is a full push to ensure the correctness?

Right. May be we can change BuildLbEndpointMetadata to take into account the policy for that given workload.

when no port mtls is specified, what's the behavior, secure everything including pass through listeners? when port specified, only per port listener is secured? what if the port defined in the policy is not part of the existing workload's inbound listener, aka, :? is that considered as mis-configuration?

This is 3 questions :), so yes, yes (assuming the parent level is set to disable) and depends.

For the last one, I think we can either ignore it (base on assumption traffic to that port will get 503 anyway), or still create an inbound listener for that port with those setting. Not sure which one is the right answer yet, though I'm a bit leaning forward the later.

@diemtvu diemtvu changed the title [WIP] Apply peer authentication policy Apply peer authentication policy Feb 4, 2020
@istio-testing istio-testing removed the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Feb 4, 2020
@diemtvu
Copy link
Copy Markdown
Contributor Author

diemtvu commented Feb 5, 2020

/test integ-security-k8s-tests_istio
/test integ-distroless-k8s-tests_istio

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would prefer abstracting to a fuction

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets please not use the old logic. This is not only complex but also error prone. All we need is a simple function in the push context PeerAuthenticationPolicyForNamespace(ns). This will return either the global one or the namespace local one. If there is no global one, we assume permissive, just like the way we assume a default sidecar. We can reuse logic similar to the sidecar/envoy filter etc.

And we should NOT check the workload specific authN policy - the cluster can belong to a subset or to an entire service. As we agreed previously, if the user is sophisticated enough to use a per workload policy to disable peer authn, then she should also define destination rules, etc. to create the exception. This model will keep the implementation simple and robust, and most importantly scalable! We cannot be doing label based search for every possible service, for every sidecar in the system.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I implemented it this way so that we have something in case we don't have time to work on the better option. Anyway, I revert this now, and we will have a separate PR tackle the auto-mtls alone.

Copy link
Copy Markdown
Member

@rshriram rshriram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change the logic in cluster.go. We cannot be querying by service labels and we cannot be computing the appropriate peer authn for every workload. All of this needs to be pre-computed per namespace and we should only be doing lookups. The first fix should be in push context, where you initialize peer authN. Also, we need to have a built in default peer authN if there is no global peer authN. This will ensure that all namespaces will always inherit the global one.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets please not use the old logic. This is not only complex but also error prone. All we need is a simple function in the push context PeerAuthenticationPolicyForNamespace(ns). This will return either the global one or the namespace local one. If there is no global one, we assume permissive, just like the way we assume a default sidecar. We can reuse logic similar to the sidecar/envoy filter etc.

And we should NOT check the workload specific authN policy - the cluster can belong to a subset or to an entire service. As we agreed previously, if the user is sophisticated enough to use a per workload policy to disable peer authn, then she should also define destination rules, etc. to create the exception. This model will keep the implementation simple and robust, and most importantly scalable! We cannot be doing label based search for every possible service, for every sidecar in the system.

return model.MTLSPermissive
}
switch peer.Mtls.Mode {
case v1beta1.PeerAuthentication_MutualTLS_DISABLE:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and what about the unset thing?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Now it will be more complicated :)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is actually simple if you use the same logic as the Sidecar

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a valid use case. WE dont respect workload specific config in the root namespace for sidecar or envoy filter. So lets follow the pattern for consistency and not create exceptions

level += 2
}
if isStrictlyStronger(spec, configByScope[level]) {
configByScope[level] = cfg
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is memory intensive, allocating an array and destroying it for each query for each proxy in the system. It will also undergo a lot of changes when you introduce workload specific sidecars. To me, a better way is to reuse the logic that builds https://github.com/istio/istio/blob/master/pilot/pkg/model/push_context.go#L95 sidecarsByNamespace . And then as commented above, you simply have to query push context for the namespace's peer authN policy. Otherwise, this will become a performance bottleneck once again.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already use that. This is a second round to pick from the short list (the input list is a list of config that match the workload) based on our precedent order rules.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You dont need the list of configs - that is my point. Take a look at the code in push context. It simply scans through a hash map and picks the appropriate config without allocating more memory. If there is no namespace level config, it automatically falls back to the global config which always exists even if the user did not define a global setting.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are few things that the approach with sidecar doesn't work here:

  • We want to take into account the 'strength' of mTLS mode to break the tie.
  • We need to 'inherit' from parent for UNSET mode (thanks to your other comment :) )

I've updated this PR to handle UNSET, but I'm still working on simplify it a bit more, using the assumption that there is at most one namespace & mesh level policy (we already enforce that from validation), and also ignore policies in root namespace but have selector (as you suggested)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the pointer of array, []*model.Config, not the []model.Config, will that still be mem intensive concern?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I re-implement this function to take into account UNSET (which should be inherit from parent), and port-level settings (which need to be aggregated to the final policy, including resolve for UNSET). It seems more complicated now, unfortunately, though I think it has the correct behavior.

PTAL

// Review ports with UNSET mode and update the final workload level mTLS if it is stronger.
for port := range unsetPorts {
existing, exist := finalPolicy.PortLevelMtls[port]
if !exist || getMutualTLSMode(finalPolicy.Mtls) > getMutualTLSMode(existing) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here. might require changing the signature of isStritlyStrong to pass MutualTls rather than policy

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

},
},
{
ConfigFile: "beta-mtls-on.yaml",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it help to have test cases where we have both beta and alpha policy? that can be a typical case during migration.

and what's the intended behavior? to me, it seems like if there's one beta policy instance (ns/mesh/workload) can be applied to a particular workload, all alpha policy is ignored, is that correct?

feel free to do it in follow-up PR and leave a TODO.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's already there.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which one? I didn't see it in the e2e test. all config file only references the beta policy.

finalPolicy.PortLevelMtls[port] = mtls
}
}
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have to break out of the main for loop once you get a workload specific policy. Thsi will be the oldest policy. We cannot pick the strongest as it has the potential to break workloads that dont even belong to the team that wrote this policy.

processedJwtRules: processedJwtRules,
alphaApplier: alpha_applier.NewPolicyApplier(policy),
jwtPolicies: jwtPolicies,
peerPolices: peerPolicies,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need peerpolicies? seems to me you only need to store consolidatePeerPolicy in the applier.

@diemtvu
Copy link
Copy Markdown
Contributor Author

diemtvu commented Feb 6, 2020 via email

@diemtvu
Copy link
Copy Markdown
Contributor Author

diemtvu commented Feb 6, 2020 via email

},
want: nil,
},
{
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a less common case: define port level settings in mesh/namespace policy, and see how that's inherited && combined with workload level policy. shall we add case for that?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no. It can be rejected by validation.

Copy link
Copy Markdown

@incfly incfly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i left one more question on the behavior of port level settings defined in the mesh/ns policy. approving now.

func composePeerAuthentication(rootNamespace string, configs []*model.Config) *v1beta1.PeerAuthentication {
var meshPolicy, namespacePolicy *v1beta1.PeerAuthentication
var meshPolicy, namespacePolicy, workloadPolicy *v1beta1.PeerAuthentication
// Creation time associate with the selected workloadPolicy above. Intiial to max time (not set)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

initial, typo

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

if namespacePolicy != nil && !isMtlsModeUnset(namespacePolicy.Mtls) {
// If namespace policy is defined, update output policy to namespace policy. This means namespace
// policy overwrite mesh policy.
outputPolicy.Mtls = namespacePolicy.Mtls
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait, are we abandoning inherit stronger from namespace/mesh policy as well? i thought we only remove inherit for workload level policy.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's never before. There is at most 1 namespace level, so no conflict. Same as mesh. Narrower scope always win.

@@ -343,111 +344,71 @@ func getMutualTLSMode(mtls *v1beta1.PeerAuthentication_MutualTLS) model.MutualTL
// replaced with config from workload-level, UNSET in workload-level config will be replaced with
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i didn't see these section updates. i expect there' will be some.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this is need to be updated? It is just a conversion from user facing enum to our internal implementation so we can maintain the same semantic between different API version.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually i mean line 341.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if there are more than 1 policy define
// port-level mTLS for the same port, the stronger one is used.

this is no longer true. we use the entire workload policy ordered by timestamp now.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, done.

Copy link
Copy Markdown

@incfly incfly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

istio/api doc shall have some updates as well i assume?

@diemtvu
Copy link
Copy Markdown
Contributor Author

diemtvu commented Feb 6, 2020

istio/api doc shall have some updates as well i assume?

We didn't say much about conflict resolution in the API :).

Copy link
Copy Markdown

@incfly incfly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's ship it!

@diemtvu
Copy link
Copy Markdown
Contributor Author

diemtvu commented Feb 7, 2020

/test e2e-mixer-no_auth_istio
/test /test integ-mixer-k8s-tests_istio

@istio-testing istio-testing merged commit 9053f47 into istio:master Feb 7, 2020
@istio-testing
Copy link
Copy Markdown
Collaborator

In response to a cherrypick label: #20829 failed to apply on top of branch "release-1.5":

error: Failed to merge in the changes.
Using index info to reconstruct a base tree...
M	pilot/pkg/security/authn/policy_applier.go
M	pilot/pkg/security/authn/v1alpha1/policy_applier.go
M	pilot/pkg/security/authn/v1beta1/policy_applier.go
M	pilot/pkg/security/authn/v1beta1/policy_applier_test.go
Falling back to patching base and 3-way merge...
Auto-merging pilot/pkg/security/authn/v1beta1/policy_applier_test.go
CONFLICT (content): Merge conflict in pilot/pkg/security/authn/v1beta1/policy_applier_test.go
Auto-merging pilot/pkg/security/authn/v1beta1/policy_applier.go
CONFLICT (content): Merge conflict in pilot/pkg/security/authn/v1beta1/policy_applier.go
Auto-merging pilot/pkg/security/authn/v1alpha1/policy_applier.go
CONFLICT (content): Merge conflict in pilot/pkg/security/authn/v1alpha1/policy_applier.go
Auto-merging pilot/pkg/security/authn/policy_applier.go
Patch failed at 0001 Apply beta peer authentication policy down to workload level

diemtvu added a commit to diemtvu/istio that referenced this pull request Feb 8, 2020
* Apply beta peer authentication policy down to workload level

* Clean up

* Lint

* Check beta policy for auto mtls. This can be removed when EP metadata take into account the policy

* Use explicit peerauthentication policy for permissive, as we haven't remove old mesh policy during installation

* pilot/pkg/security/authn/v1beta1/policy_applier.go

* Move all test for beta mTLS api to the end

* Change to namespace policy

* Revert cluster.go

* Change peer authn consolidation algorithm for UNSET (inheritant mode)

* Reimplement getMostSpecificConfig (now composePeerAuthentication) which also consolidate port-level policies.

* Fix inheritance: do not inherit if it is weaker than the current mode

* Remove debug logs

* Change test policy to namespace level to make sure they are clean up properly with the existing test setup.

* Address comment

* Lint

* Simplify logic to pick the oldest

* fix typo

* Update function comment
istio-testing pushed a commit that referenced this pull request Feb 8, 2020
* Apply beta peer authentication policy down to workload level

* Clean up

* Lint

* Check beta policy for auto mtls. This can be removed when EP metadata take into account the policy

* Use explicit peerauthentication policy for permissive, as we haven't remove old mesh policy during installation

* pilot/pkg/security/authn/v1beta1/policy_applier.go

* Move all test for beta mTLS api to the end

* Change to namespace policy

* Revert cluster.go

* Change peer authn consolidation algorithm for UNSET (inheritant mode)

* Reimplement getMostSpecificConfig (now composePeerAuthentication) which also consolidate port-level policies.

* Fix inheritance: do not inherit if it is weaker than the current mode

* Remove debug logs

* Change test policy to namespace level to make sure they are clean up properly with the existing test setup.

* Address comment

* Lint

* Simplify logic to pick the oldest

* fix typo

* Update function comment
@diemtvu diemtvu deleted the mtls-beta-p3 branch February 10, 2020 21:48
sdake pushed a commit to sdake/istio that referenced this pull request Feb 21, 2020
* Apply beta peer authentication policy down to workload level

* Clean up

* Lint

* Check beta policy for auto mtls. This can be removed when EP metadata take into account the policy

* Use explicit peerauthentication policy for permissive, as we haven't remove old mesh policy during installation

* pilot/pkg/security/authn/v1beta1/policy_applier.go

* Move all test for beta mTLS api to the end

* Change to namespace policy

* Revert cluster.go

* Change peer authn consolidation algorithm for UNSET (inheritant mode)

* Reimplement getMostSpecificConfig (now composePeerAuthentication) which also consolidate port-level policies.

* Fix inheritance: do not inherit if it is weaker than the current mode

* Remove debug logs

* Change test policy to namespace level to make sure they are clean up properly with the existing test setup.

* Address comment

* Lint

* Simplify logic to pick the oldest

* fix typo

* Update function comment
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/security cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants