Skip to content

Fix X509AuthenticationToken principal#43932

Merged
albertzaharovits merged 13 commits intomasterfrom
security-pki-delegation-fix-x509token-principal
Jul 11, 2019
Merged

Fix X509AuthenticationToken principal#43932
albertzaharovits merged 13 commits intomasterfrom
security-pki-delegation-fix-x509token-principal

Conversation

@albertzaharovits
Copy link
Copy Markdown
Contributor

Fixes a bug in the PKI authentication where, when multiple PKI realms are configured in the chain, one realm might parse the principal from the subject DN (according to the username_pattern realm setting) and another one might pass the truststore validation (according to truststore.* settings).

This is due to the two passes through the realm chain, first to build the authentication token and second to authenticate it, and that the X509AuthenticationToken sets the principal during construction.

This fix shifts parsing the principal from the token construction to the authorization process.

Relates #43796 #34396

@albertzaharovits albertzaharovits added >bug :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc) v8.0.0 v7.3.0 labels Jul 3, 2019
@albertzaharovits albertzaharovits self-assigned this Jul 3, 2019
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-security

Copy link
Copy Markdown
Contributor

@bizybot bizybot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, after addressing the comments. Thank you.

@albertzaharovits
Copy link
Copy Markdown
Contributor Author

Thanks a lot @bizybot for the fast review here as well!

@tvernum tvernum added v7.4.0 and removed v7.3.0 labels Jul 4, 2019
@tvernum
Copy link
Copy Markdown
Contributor

tvernum commented Jul 8, 2019

I understand the rationale for this, but it risks being a breaking change (audit events rely on the principal from the token) and to my knowledge no one has ever complained about this current behaviour.

The behaviour is certainly wrong (in the sense that are some configurations that won't work even though they should), but did you evaluate the impact against the benefit?

@albertzaharovits
Copy link
Copy Markdown
Contributor Author

albertzaharovits commented Jul 8, 2019

I understand the rationale for this, but it risks being a breaking change (audit events rely on the principal from the token) and to my knowledge no one has ever complained about this current behaviour.

The principal in the audit log is wrong, as it could be parsed by any other installed PKIRealm, and not the one doing the actual authentication. This is a manifestation of the bug. I don't think we should wait for someone to complain about it to fix this.

The behaviour is certainly wrong (in the sense that are some configurations that won't work even though they should), but did you evaluate the impact against the benefit?

When making the code changes I have followed where the X509AuthenticationToken.principal() is referred to in the code, and apart from the audit log and response headers there were no other places. Do you have any suggestion how I could otherwise evaluate the impact of this?

@tvernum
Copy link
Copy Markdown
Contributor

tvernum commented Jul 9, 2019

The most common scenario for PKI is a single realm.
The current behaviour in such a case is that when PKI is used, the audit records (and audit filters) for authentication-failed will use the same principal as all other records (authc-success, authz-failed, etc).
The new behaviour is that this will change and for reasons that are totally irrelevant (*) to the es-admin, the authc-failed events will be filtered/logged differently to the others.

(*) Because in their case there is only a single username_pattern, so there is only 1 possible way to turn a cert-DN into a principal.

This bug only manifests if you have 2 PKI realms, with different username_pattern settings, where the higher order realm's pattern does not trust certificates issued for the 2nd realm, yet that first realm will successfully match against the DNs for the 2nd realm, but produce the wrong principal.

That's a scenario that we imply will work (because the settings allow it, and the docs don't call it out as unsupported) but I doubt we've ever had a single person run into it, and I can't imagine why anyone would (or at least if they did, why they couldn't fix it by tweaking their 1st pattern to be more strict/precise).

We know that our token extraction process is flawed, and I see this issue as a manifestation of that. But this feels like we're working around that problem (again) rather than fixing it, while at the same time potentially breaking the audit behaviour for existing installs.

I'm not saying No, I'm just saying Why make this trade-off right now?.
Why risk impacting existing users in order to band-aid over the token extraction problem in a minor release?

@tvernum
Copy link
Copy Markdown
Contributor

tvernum commented Jul 9, 2019

The most common scenario for PKI is a single realm.
The current behaviour in such a case is that when PKI is used, the audit records (and audit filters) for authentication-failed will use the same principal as all other records (authc-success, authz-failed, etc).

Actually, I just realised that the BWC issue could be worse than that, although it is unlikely to be a problem in practice.
Currently, if the DN does not match the pattern, then no token (null) is returned. Which means another realm (e.g. Kerberos) might have an opportunity to generate a token object, and then authenticate.

The proposed behaviour would cause the PKI realm to create a token, but then fail to authenticate because the principal cannot be extracted from the DN.

One solution would be to continue to parse the principal in the first matching realm, but then re-parse it during authenticate on each realm. That's truly quite ugly, but it would fix the bug in a BWC way.

We clearly need to fix the AuthenticationToken process for 8.0, because we're frequently running into its limitations, but I'd like us to tread carefully about making potentially breaking changes in 7.x

@albertzaharovits
Copy link
Copy Markdown
Contributor Author

The current behaviour in such a case is that when PKI is used, the audit records (and audit filters) for authentication-failed will use the same principal as all other records (authc-success, authz-failed, etc).
The new behaviour is that this will change and for reasons that are totally irrelevant () to the es-admin, the authc-failed events will be filtered/logged differently to the others.
(
) Because in their case there is only a single username_pattern, so there is only 1 possible way to turn a cert-DN into a principal.

The authentication_failed would indeed contain the Subject DN instead of the parsed principal. But, in my opinion, the implications of this are exaggerated, or at least not fairly weighted against the status quo. First of, there are no other audit records to correlate with in case of an authn-failure event, because it doesn't go further into authz, etc. The authn-success principal field is unchanged by this and can be correlated with authz events, as usual. Secondly, we have other realms (Kerberos) where the principal cannot be determined before authentication is successful, so there is precedent to this. The argument, that the es-admin might complain about the unparsed Subject DN in authn-failed audits, even though it only has a single PKI realm installed (hence a single way to parse it), is also weak, because I can also imagine es-admins complaining on the converse, that the audit contains parsed DNs and they have to reconstruct the DN to identify the Subject DNs of the certificates being denied.

This bug only manifests if you have 2 PKI realms, with different username_pattern settings, where the higher order realm's pattern does not trust certificates issued for the 2nd realm, yet that first realm will successfully match against the DNs for the 2nd realm, but produce the wrong principal.
That's a scenario that we imply will work (because the settings allow it, and the docs don't call it out as unsupported) but I doubt we've ever had a single person run into it, and I can't imagine why anyone would (or at least if they did, why they couldn't fix it by tweaking their 1st pattern to be more strict/precise).

There are other scenarios. For example, both realms could have the same trust certificates, but different principal parsing rules, where patterns could be for different OU= parts of the DN, so there is not one more general than the other, for example CN=(.*?), OU=engineering and CN=(.*?), OU=marketing. In this case, no matter the realm order and the certificate's Subject DN, you will always get authenticated by the first PKI realm in the chain. This leads to more than bogus audit entries (wrong realm name), because there could be role-mapping rules for the realm name.
You could also argue that this scenario is unlikely. But my point is that the where the higher order realm's pattern does not trust certificates issued for the 2nd realm, yet that first realm will successfully match against the DNs for the 2nd realm is not a fair conditional for the bug to manifest; it is enough to have two PKIRealms with different principal parsing rules.

We know that our token extraction process is flawed, and I see this issue as a manifestation of that. But this feels like we're working around that problem (again) rather than fixing it, while at the same time potentially breaking the audit behaviour for existing installs.
I'm not saying No, I'm just saying Why make this trade-off right now?.
Why risk impacting existing users in order to band-aid over the token extraction problem in a minor release?

A "proper" fix must go in a major release. But this should not preclude a fix in the 7.x releases. As detailed above, in my opinion, we have to weigh the breaking of the authn-failed audit entries (which are not correlat-able to other events) against the broken state of multiple PKIRealms with different principal parsing rules. As another example, to accentuate the severity of the issue, suppose you have two PKIRealms with different truststores and different CN parsing rules. There are circumstances in which a given certificate that SHOULD NOT be allowed access IS authenticated because of the disconnection of the DN parsing to chain validation (if you wish, add to that role mapping rules by realm name). If someone where to report this, they would be using secure@ for sure. Of course this can all be discounted, by saying The most common scenario for PKI is a single realm. but we need to do something about it nonetheless.

This came up, during the implementation of the PKI delegation, because there the first pass of the authentication process of first building the X509AuthenticationToken is skipped as the token is built by the transport action. The transport action could pick the first realm in the chain and use that parsing rule, but this, in my opinion, is a much uglier band-aid that the proposed fix.

@albertzaharovits
Copy link
Copy Markdown
Contributor Author

Currently, if the DN does not match the pattern, then no token (null) is returned. Which means another realm (e.g. Kerberos) might have an opportunity to generate a token object, and then authenticate.
The proposed behaviour would cause the PKI realm to create a token, but then fail to authenticate because the principal cannot be extracted from the DN.

Good catch! Though this implies that the client uses multiple authn methods for the same request which is not supported so we should not worry about breaking anything there.

@tvernum
Copy link
Copy Markdown
Contributor

tvernum commented Jul 9, 2019

Good catch! Though this implies that the client uses multiple authn methods for the same request which is not supported so we should not worry about breaking anything there.

Why is that not supported?
We support using client certificates alongside other authentication schemes.

@albertzaharovits
Copy link
Copy Markdown
Contributor Author

Why is that not supported?
We support using client certificates alongside other authentication schemes.

Yes, we support client certificates alongside other authentication schemes. I was talking generally, that we don't support two authentication schemes such that if one fails try the other one, apart from the client certificates, which I assumed that it was a coincidence. Judging from your response, it might not have been a coincidence, but it's certainly not a feature we document.

@albertzaharovits albertzaharovits force-pushed the security-pki-delegation-fix-x509token-principal branch from e1eac50 to 6f4f11f Compare July 9, 2019 16:39
@albertzaharovits
Copy link
Copy Markdown
Contributor Author

Me and Tim discussed on another channel about the BWC implications, and we concurred that there was an easy way to avoid unnecessary breaking changes.
I have pushed 6f4f11f . As discussed, this last push achieves the following:

  • when there is a single PKI realm installed there is no reason to break BWC by auditing authn-failures with the unparsed DN as principal. The last push restores the behavior where the parsed principal, not the full DN, is audited.
  • when there are several PKI realms configured, with different principal parsing patterns, parsing the principal before the cert chain validation is wrong. Therefore the principal field in the audit log's authn-failed is also wrong. However, the said push does NOT correct this wrong behavior, choosing to side with maintaining BWC, not least because correcting it would involve more cruft for little benefit.
  • when neither of the configured PKI realms can parse the certificate's Subject DN, the said push maintains the behavior of falling through to other authentication methods (eg Kerberos) instead of failing authentication as was originally proposed in the PR.

The PR as a whole fixes the bug where the principal could be parsed by a different realm than the one completing the authentication (doing cert chain validation) leading to a score of bugs in complicated setups. Given that the changes effectively reparse and override the principal, AFTER the PKI authentication completed successfully, any "breaking" behavior, in any realm configuration, experienced by users is a fix for a bug, therefore this is not a breaking change but a bug fix.

@tvernum May you please keep me honest on this and re-review the PR?

Copy link
Copy Markdown
Contributor

@tvernum tvernum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @albertzaharovits.
LGTM, but I had a couple of small suggestions.

final String principal = getPrincipalFromSubjectDN(principalPattern, token, logger);
if (principal == null) {
listener.onResponse(AuthenticationResult.unsuccessful("Could not parse principal from Subject DN " + token.dn(), null));
} else {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we want some sort of logging if user.principal() and token.principal() are different.

It might get to noisy to have it as INFO/WARN every time we do a full-authenticate (potentially every 20 minutes, per user) but maybe a DEBUG would help resolve any unexpected behaviours.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have pushed b16f889 which adds debug loggers for when the extracted principal after cert chain validation differs (or is null) from before. These will be useful for us when we need to explain some unexpected behavior, but it is difficult to put in a log line what exactly it's happening so the admin can take actions on its own; so I don't even try, the log lines only state the facts.

albertzaharovits and others added 2 commits July 11, 2019 18:42
…ecurity/authc/pki/PkiRealm.java

Co-Authored-By: Tim Vernum <tim@adjective.org>
@albertzaharovits
Copy link
Copy Markdown
Contributor Author

@elasticmachine run elasticsearch-ci/default-distro

@albertzaharovits
Copy link
Copy Markdown
Contributor Author

@elasticmachine run elasticsearch-ci/bwc

@albertzaharovits
Copy link
Copy Markdown
Contributor Author

@elasticmachine run elasticsearch-ci/2

@albertzaharovits albertzaharovits merged commit 59fc77f into master Jul 11, 2019
albertzaharovits added a commit to albertzaharovits/elasticsearch that referenced this pull request Jul 11, 2019
Fixes a bug in the PKI authentication. This manifests when there
are multiple PKI realms configured in the chain, with different
principal parse patterns. There are a few configuration scenarios
where one PKI realm might parse the principal from the Subject
DN (according to the `username_pattern` realm setting) but
another one might do the truststore validation (according to
the truststore.* realm settings).

This is caused by the two passes through the realm chain, first to
build the authentication token and secondly to authenticate it, and
that the X509AuthenticationToken sets the principal during
construction.
@albertzaharovits albertzaharovits deleted the security-pki-delegation-fix-x509token-principal branch July 11, 2019 21:00
albertzaharovits added a commit that referenced this pull request Jul 12, 2019
Fixes a bug in the PKI authentication. This manifests when there
are multiple PKI realms configured in the chain, with different
principal parse patterns. There are a few configuration scenarios
where one PKI realm might parse the principal from the Subject
DN (according to the `username_pattern` realm setting) but
another one might do the truststore validation (according to
the truststore.* realm settings).

This is caused by the two passes through the realm chain, first to
build the authentication token and secondly to authenticate it, and
that the X509AuthenticationToken sets the principal during
construction.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>bug :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc) v7.4.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants