Fix X509AuthenticationToken principal by albertzaharovits · Pull Request #43932 · elastic/elasticsearch

albertzaharovits · 2019-07-03T16:05:40Z

Fixes a bug in the PKI authentication where, when multiple PKI realms are configured in the chain, one realm might parse the principal from the subject DN (according to the username_pattern realm setting) and another one might pass the truststore validation (according to truststore.* settings).

This is due to the two passes through the realm chain, first to build the authentication token and second to authenticate it, and that the X509AuthenticationToken sets the principal during construction.

This fix shifts parsing the principal from the token construction to the authorization process.

Relates #43796 #34396

elasticmachine · 2019-07-03T16:05:42Z

Pinging @elastic/es-security

bizybot

LGTM, after addressing the comments. Thank you.

x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/pki/PkiRealm.java

albertzaharovits · 2019-07-03T21:18:21Z

Thanks a lot @bizybot for the fast review here as well!

tvernum · 2019-07-08T08:24:01Z

I understand the rationale for this, but it risks being a breaking change (audit events rely on the principal from the token) and to my knowledge no one has ever complained about this current behaviour.

The behaviour is certainly wrong (in the sense that are some configurations that won't work even though they should), but did you evaluate the impact against the benefit?

albertzaharovits · 2019-07-08T10:12:03Z

I understand the rationale for this, but it risks being a breaking change (audit events rely on the principal from the token) and to my knowledge no one has ever complained about this current behaviour.

The principal in the audit log is wrong, as it could be parsed by any other installed PKIRealm, and not the one doing the actual authentication. This is a manifestation of the bug. I don't think we should wait for someone to complain about it to fix this.

The behaviour is certainly wrong (in the sense that are some configurations that won't work even though they should), but did you evaluate the impact against the benefit?

When making the code changes I have followed where the X509AuthenticationToken.principal() is referred to in the code, and apart from the audit log and response headers there were no other places. Do you have any suggestion how I could otherwise evaluate the impact of this?

tvernum · 2019-07-09T02:20:15Z

The most common scenario for PKI is a single realm.
The current behaviour in such a case is that when PKI is used, the audit records (and audit filters) for authentication-failed will use the same principal as all other records (authc-success, authz-failed, etc).
The new behaviour is that this will change and for reasons that are totally irrelevant (*) to the es-admin, the authc-failed events will be filtered/logged differently to the others.

(*) Because in their case there is only a single username_pattern, so there is only 1 possible way to turn a cert-DN into a principal.

This bug only manifests if you have 2 PKI realms, with different username_pattern settings, where the higher order realm's pattern does not trust certificates issued for the 2nd realm, yet that first realm will successfully match against the DNs for the 2nd realm, but produce the wrong principal.

That's a scenario that we imply will work (because the settings allow it, and the docs don't call it out as unsupported) but I doubt we've ever had a single person run into it, and I can't imagine why anyone would (or at least if they did, why they couldn't fix it by tweaking their 1st pattern to be more strict/precise).

We know that our token extraction process is flawed, and I see this issue as a manifestation of that. But this feels like we're working around that problem (again) rather than fixing it, while at the same time potentially breaking the audit behaviour for existing installs.

I'm not saying No, I'm just saying Why make this trade-off right now?.
Why risk impacting existing users in order to band-aid over the token extraction problem in a minor release?

tvernum · 2019-07-09T05:57:05Z

The most common scenario for PKI is a single realm.
The current behaviour in such a case is that when PKI is used, the audit records (and audit filters) for authentication-failed will use the same principal as all other records (authc-success, authz-failed, etc).

Actually, I just realised that the BWC issue could be worse than that, although it is unlikely to be a problem in practice.
Currently, if the DN does not match the pattern, then no token (null) is returned. Which means another realm (e.g. Kerberos) might have an opportunity to generate a token object, and then authenticate.

The proposed behaviour would cause the PKI realm to create a token, but then fail to authenticate because the principal cannot be extracted from the DN.

One solution would be to continue to parse the principal in the first matching realm, but then re-parse it during authenticate on each realm. That's truly quite ugly, but it would fix the bug in a BWC way.

We clearly need to fix the AuthenticationToken process for 8.0, because we're frequently running into its limitations, but I'd like us to tread carefully about making potentially breaking changes in 7.x

albertzaharovits · 2019-07-09T08:23:01Z

The current behaviour in such a case is that when PKI is used, the audit records (and audit filters) for authentication-failed will use the same principal as all other records (authc-success, authz-failed, etc).
The new behaviour is that this will change and for reasons that are totally irrelevant () to the es-admin, the authc-failed events will be filtered/logged differently to the others.
() Because in their case there is only a single username_pattern, so there is only 1 possible way to turn a cert-DN into a principal.

The authentication_failed would indeed contain the Subject DN instead of the parsed principal. But, in my opinion, the implications of this are exaggerated, or at least not fairly weighted against the status quo. First of, there are no other audit records to correlate with in case of an authn-failure event, because it doesn't go further into authz, etc. The authn-success principal field is unchanged by this and can be correlated with authz events, as usual. Secondly, we have other realms (Kerberos) where the principal cannot be determined before authentication is successful, so there is precedent to this. The argument, that the es-admin might complain about the unparsed Subject DN in authn-failed audits, even though it only has a single PKI realm installed (hence a single way to parse it), is also weak, because I can also imagine es-admins complaining on the converse, that the audit contains parsed DNs and they have to reconstruct the DN to identify the Subject DNs of the certificates being denied.

This bug only manifests if you have 2 PKI realms, with different username_pattern settings, where the higher order realm's pattern does not trust certificates issued for the 2nd realm, yet that first realm will successfully match against the DNs for the 2nd realm, but produce the wrong principal.
That's a scenario that we imply will work (because the settings allow it, and the docs don't call it out as unsupported) but I doubt we've ever had a single person run into it, and I can't imagine why anyone would (or at least if they did, why they couldn't fix it by tweaking their 1st pattern to be more strict/precise).

There are other scenarios. For example, both realms could have the same trust certificates, but different principal parsing rules, where patterns could be for different OU= parts of the DN, so there is not one more general than the other, for example CN=(.*?), OU=engineering and CN=(.*?), OU=marketing. In this case, no matter the realm order and the certificate's Subject DN, you will always get authenticated by the first PKI realm in the chain. This leads to more than bogus audit entries (wrong realm name), because there could be role-mapping rules for the realm name.
You could also argue that this scenario is unlikely. But my point is that the where the higher order realm's pattern does not trust certificates issued for the 2nd realm, yet that first realm will successfully match against the DNs for the 2nd realm is not a fair conditional for the bug to manifest; it is enough to have two PKIRealms with different principal parsing rules.

We know that our token extraction process is flawed, and I see this issue as a manifestation of that. But this feels like we're working around that problem (again) rather than fixing it, while at the same time potentially breaking the audit behaviour for existing installs.
I'm not saying No, I'm just saying Why make this trade-off right now?.
Why risk impacting existing users in order to band-aid over the token extraction problem in a minor release?

A "proper" fix must go in a major release. But this should not preclude a fix in the 7.x releases. As detailed above, in my opinion, we have to weigh the breaking of the authn-failed audit entries (which are not correlat-able to other events) against the broken state of multiple PKIRealms with different principal parsing rules. As another example, to accentuate the severity of the issue, suppose you have two PKIRealms with different truststores and different CN parsing rules. There are circumstances in which a given certificate that SHOULD NOT be allowed access IS authenticated because of the disconnection of the DN parsing to chain validation (if you wish, add to that role mapping rules by realm name). If someone where to report this, they would be using secure@ for sure. Of course this can all be discounted, by saying The most common scenario for PKI is a single realm. but we need to do something about it nonetheless.

This came up, during the implementation of the PKI delegation, because there the first pass of the authentication process of first building the X509AuthenticationToken is skipped as the token is built by the transport action. The transport action could pick the first realm in the chain and use that parsing rule, but this, in my opinion, is a much uglier band-aid that the proposed fix.

albertzaharovits · 2019-07-09T08:57:55Z

Currently, if the DN does not match the pattern, then no token (null) is returned. Which means another realm (e.g. Kerberos) might have an opportunity to generate a token object, and then authenticate.
The proposed behaviour would cause the PKI realm to create a token, but then fail to authenticate because the principal cannot be extracted from the DN.

Good catch! Though this implies that the client uses multiple authn methods for the same request which is not supported so we should not worry about breaking anything there.

tvernum · 2019-07-09T09:29:23Z

Good catch! Though this implies that the client uses multiple authn methods for the same request which is not supported so we should not worry about breaking anything there.

Why is that not supported?
We support using client certificates alongside other authentication schemes.

albertzaharovits · 2019-07-09T09:58:02Z

Why is that not supported?
We support using client certificates alongside other authentication schemes.

Yes, we support client certificates alongside other authentication schemes. I was talking generally, that we don't support two authentication schemes such that if one fails try the other one, apart from the client certificates, which I assumed that it was a coincidence. Judging from your response, it might not have been a coincidence, but it's certainly not a feature we document.

albertzaharovits · 2019-07-09T17:04:59Z

Me and Tim discussed on another channel about the BWC implications, and we concurred that there was an easy way to avoid unnecessary breaking changes.
I have pushed 6f4f11f . As discussed, this last push achieves the following:

when there is a single PKI realm installed there is no reason to break BWC by auditing authn-failures with the unparsed DN as principal. The last push restores the behavior where the parsed principal, not the full DN, is audited.
when there are several PKI realms configured, with different principal parsing patterns, parsing the principal before the cert chain validation is wrong. Therefore the principal field in the audit log's authn-failed is also wrong. However, the said push does NOT correct this wrong behavior, choosing to side with maintaining BWC, not least because correcting it would involve more cruft for little benefit.
when neither of the configured PKI realms can parse the certificate's Subject DN, the said push maintains the behavior of falling through to other authentication methods (eg Kerberos) instead of failing authentication as was originally proposed in the PR.

The PR as a whole fixes the bug where the principal could be parsed by a different realm than the one completing the authentication (doing cert chain validation) leading to a score of bugs in complicated setups. Given that the changes effectively reparse and override the principal, AFTER the PKI authentication completed successfully, any "breaking" behavior, in any realm configuration, experienced by users is a fix for a bug, therefore this is not a breaking change but a bug fix.

@tvernum May you please keep me honest on this and re-review the PR?

tvernum

Thanks @albertzaharovits.
LGTM, but I had a couple of small suggestions.

x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/pki/PkiRealm.java

tvernum · 2019-07-11T07:27:06Z

x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/pki/PkiRealm.java

+                final String principal = getPrincipalFromSubjectDN(principalPattern, token, logger);
+                if (principal == null) {
+                    listener.onResponse(AuthenticationResult.unsuccessful("Could not parse principal from Subject DN " + token.dn(), null));
+                } else {


I wonder if we want some sort of logging if user.principal() and token.principal() are different.

It might get to noisy to have it as INFO/WARN every time we do a full-authenticate (potentially every 20 minutes, per user) but maybe a DEBUG would help resolve any unexpected behaviours.

I have pushed b16f889 which adds debug loggers for when the extracted principal after cert chain validation differs (or is null) from before. These will be useful for us when we need to explain some unexpected behavior, but it is difficult to put in a log line what exactly it's happening so the admin can take actions on its own; so I don't even try, the log lines only state the facts.

…ecurity/authc/pki/PkiRealm.java Co-Authored-By: Tim Vernum <tim@adjective.org>

albertzaharovits · 2019-07-11T16:47:22Z

@elasticmachine run elasticsearch-ci/default-distro

albertzaharovits · 2019-07-11T16:48:47Z

@elasticmachine run elasticsearch-ci/bwc

albertzaharovits · 2019-07-11T17:29:27Z

@elasticmachine run elasticsearch-ci/2

Fixes a bug in the PKI authentication. This manifests when there are multiple PKI realms configured in the chain, with different principal parse patterns. There are a few configuration scenarios where one PKI realm might parse the principal from the Subject DN (according to the `username_pattern` realm setting) but another one might do the truststore validation (according to the truststore.* realm settings). This is caused by the two passes through the realm chain, first to build the authentication token and secondly to authenticate it, and that the X509AuthenticationToken sets the principal during construction.

albertzaharovits added 3 commits July 3, 2019 11:46

main code and test fix

caef646

PkiRealmTests

cd92c33

Done

f6296c5

albertzaharovits added >bug :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc) v8.0.0 v7.3.0 labels Jul 3, 2019

albertzaharovits requested review from bizybot and tvernum July 3, 2019 16:05

albertzaharovits self-assigned this Jul 3, 2019

bizybot approved these changes Jul 3, 2019

View reviewed changes

x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/pki/PkiRealm.java Outdated Show resolved Hide resolved

x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/pki/PkiRealm.java Outdated Show resolved Hide resolved

albertzaharovits added 3 commits July 3, 2019 22:30

Checkstyle

82edac2

Lower case logger

f2a74ec

Merge branch 'master' into fix-x509-token-principal

e780361

tvernum added v7.4.0 and removed v7.3.0 labels Jul 4, 2019

albertzaharovits added 2 commits July 7, 2019 19:14

Remove PkiRealm.token

80c8031

Merge branch 'master' into fix-x509-token-principal

de00ee2

albertzaharovits mentioned this pull request Jul 9, 2019

Add delegation feature to PKI realm #44106

Merged

Parse principal before authentication for BWC

6f4f11f

albertzaharovits force-pushed the security-pki-delegation-fix-x509token-principal branch from e1eac50 to 6f4f11f Compare July 9, 2019 16:39

PkiRealmTests.testCustomUsernamePatternMismatchesAndFails

406f429

tvernum approved these changes Jul 11, 2019

View reviewed changes

albertzaharovits and others added 2 commits July 11, 2019 18:42

Update x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/s…

b49e354

…ecurity/authc/pki/PkiRealm.java Co-Authored-By: Tim Vernum <tim@adjective.org>

Loggers

b16f889

Merge branch 'master' into fix-x509-token-principal

6cbe7c4

albertzaharovits merged commit 59fc77f into master Jul 11, 2019

albertzaharovits mentioned this pull request Jul 11, 2019

BACKPORT Fix X509AuthenticationToken principal (#43932) #44246

Merged

albertzaharovits deleted the security-pki-delegation-fix-x509token-principal branch July 11, 2019 21:00

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021

Conversation

albertzaharovits commented Jul 3, 2019

Uh oh!

elasticmachine commented Jul 3, 2019

Uh oh!

bizybot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

albertzaharovits commented Jul 3, 2019

Uh oh!

tvernum commented Jul 8, 2019

Uh oh!

albertzaharovits commented Jul 8, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tvernum commented Jul 9, 2019

Uh oh!

tvernum commented Jul 9, 2019

Uh oh!

albertzaharovits commented Jul 9, 2019

Uh oh!

albertzaharovits commented Jul 9, 2019

Uh oh!

tvernum commented Jul 9, 2019

Uh oh!

albertzaharovits commented Jul 9, 2019

Uh oh!

albertzaharovits commented Jul 9, 2019

Uh oh!

tvernum left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

tvernum Jul 11, 2019

Choose a reason for hiding this comment

Uh oh!

albertzaharovits Jul 11, 2019

Choose a reason for hiding this comment

Uh oh!

albertzaharovits commented Jul 11, 2019

Uh oh!

albertzaharovits commented Jul 11, 2019

Uh oh!

albertzaharovits commented Jul 11, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

albertzaharovits commented Jul 8, 2019 •

edited

Loading