Skip to content

Fix concurrent refresh of tokens#55114

Merged
jkakavas merged 8 commits intoelastic:masterfrom
jkakavas:concurrent-token-refresh-and-authn
Apr 24, 2020
Merged

Fix concurrent refresh of tokens#55114
jkakavas merged 8 commits intoelastic:masterfrom
jkakavas:concurrent-token-refresh-and-authn

Conversation

@jkakavas
Copy link
Copy Markdown
Contributor

Our handling for concurrent refresh of access tokens suffered from
a race condition where:

  1. Thread A has just finished with updating the existing token
    document, but hasn't stored the new tokens in a new document
    yet
  2. Thread B attempts to refresh the same token and since the
    original token document is marked as refreshed, it decrypts and
    gets the new access token and refresh token and returns that to
    the caller of the API.
  3. The caller attempts to use the newly refreshed access token
    immediately and gets an authentication error since Thread a still
    hasn't finished writing the document.

This commit changes the behavior so that Thread B, would first try
to do a Get request for the token document where it expects that
the access token it decrypted is stored(with exponential backoff)
and will not respond until it can verify that it reads it in the
tokens index. That ensures that we only ever return tokens in a
response if they are already valid and can be used immediately

It also adjusts
TokenAuthIntegTests#testRefreshingMultipleTimesWithinWindowSucceeds
to test authenticating with the tokens each thread receives,
which fails without the fix.

Resolves: #54289

Our handling for cncurrent refresh of access tokens suffered from
a race condition where:

1. Thread A has just finished with updating the existing token
document, but hasn't stored the new tokens in a new document
yet
2. Thread B attempts to refresh the same token and since the
original token document is marked as refreshed, it decrypts and
gets the new access token and refresh token and returns that to
the caller of the API.
3. The caller attempts to use the newly refreshed access token
immediately and gets an authentcation error since Thread a still
hasn't finished writing the document.

This commit changes the behavior so that Thread B, would first try
to do a Get request for the token document where it expects that
the access token it decrypted is stored(with exponential backoff)
and will not respond until it can verify that it reads it in the
tokens index. That ensures that we only ever return tokens in a
response if they are already valid and can be used immediately

It also adjusts
TokenAuthIntegTests#testRefreshingMultipleTimesWithinWindowSucceeds
to test authenticating with the tokens each thread receives,
which fails without the fix.

Resolves: elastic#54289
@jkakavas jkakavas added >bug :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc) v8.0.0 v7.7.1 labels Apr 13, 2020
@jkakavas jkakavas requested review from tvernum and ywangd April 13, 2020 13:57
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-security (:Security/Authentication)

Copy link
Copy Markdown
Member

@ywangd ywangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor comments. Otherwise LGTM

Comment on lines +1131 to +1139
if (backoff.hasNext()) {
logger.info("could not get token document [{}] that should have been created, retrying", tokenDocId);
client.threadPool().schedule(() -> getTokenDocAsync(tokenDocId, tokensIndex, this),
backoff.next(), GENERIC);
} else {
logger.warn("could not get token document [{}] that should have been created after all retries",
tokenDocId);
onFailure.accept(invalidGrantException("could not refresh the requested token"));
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe extract this logic into a consumer? It can be reused for the onReponse part as well.

assertThat(failed.get(), equalTo(false));
// Assert that we only ever got one access_token/refresh_token pair
assertThat(tokens.stream().distinct().collect(Collectors.toList()).size(), equalTo(1));
// Assert that we only ever got one anot(contccess_token/refresh_token pair
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

anot(contccess_token - probably some random extra keystrokes?

assertThat((int) tokens.stream().distinct().count(), equalTo(1));
// Assert that all requests from all threads could authenticate at the time they received the access token
// see: https://github.com/elastic/elasticsearch/issues/54289
assertThat(authStatuses, not(hasItem(RestStatus.UNAUTHORIZED)));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits: alternatively, maybe we could verify all statuses are OK since getAuthenticationResponseCode always returns OK when no exception is thrown. It feels more precise.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking in terms of testing for the bug this is solving but these all should be OK so I see your point, will address

@jkakavas
Copy link
Copy Markdown
Contributor Author

Ping @tvernum

Copy link
Copy Markdown
Contributor

@tvernum tvernum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, with 1 suggestion for optimization

onFailure.accept(invalidGrantException("could not refresh the requested token"));
}
};
getTokenDocAsync(tokenDocId, tokensIndex, new ActionListener<>() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this, and the call above should explicitly not fetch the source since we only care about an exists check.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, agreed ! Thanks

@jkakavas jkakavas merged commit c20d3e9 into elastic:master Apr 24, 2020
jkakavas added a commit to jkakavas/elasticsearch that referenced this pull request Apr 24, 2020
Our handling for concurrent refresh of access tokens suffered from
a race condition where:

1. Thread A has just finished with updating the existing token
document, but hasn't stored the new tokens in a new document
yet
2. Thread B attempts to refresh the same token and since the
original token document is marked as refreshed, it decrypts and
gets the new access token and refresh token and returns that to
the caller of the API.
3. The caller attempts to use the newly refreshed access token
immediately and gets an authentication error since thread A still
hasn't finished writing the document.

This commit changes the behavior so that Thread B, would first try
to do a Get request for the token document where it expects that
the access token it decrypted is stored(with exponential backoff )
and will not respond until it can verify that it reads it in the
tokens index. That ensures that we only ever return tokens in a
response if they are already valid and can be used immediately

It also adjusts TokenAuthIntegTests
to test authenticating with the tokens each thread receives,
which would fail without the fix.

Resolves: elastic#54289
jkakavas added a commit that referenced this pull request Apr 27, 2020
Our handling for concurrent refresh of access tokens suffered from
a race condition where:

1. Thread A has just finished with updating the existing token
document, but hasn't stored the new tokens in a new document
yet
2. Thread B attempts to refresh the same token and since the
original token document is marked as refreshed, it decrypts and
gets the new access token and refresh token and returns that to
the caller of the API.
3. The caller attempts to use the newly refreshed access token
immediately and gets an authentication error since thread A still
hasn't finished writing the document.

This commit changes the behavior so that Thread B, would first try
to do a Get request for the token document where it expects that
the access token it decrypted is stored(with exponential backoff )
and will not respond until it can verify that it reads it in the
tokens index. That ensures that we only ever return tokens in a
response if they are already valid and can be used immediately

It also adjusts TokenAuthIntegTests
to test authenticating with the tokens each thread receives,
which would fail without the fix.

Resolves: #54289
jkakavas added a commit to jkakavas/elasticsearch that referenced this pull request Apr 27, 2020
Our handling for concurrent refresh of access tokens suffered from
a race condition where:

1. Thread A has just finished with updating the existing token
document, but hasn't stored the new tokens in a new document
yet
2. Thread B attempts to refresh the same token and since the
original token document is marked as refreshed, it decrypts and
gets the new access token and refresh token and returns that to
the caller of the API.
3. The caller attempts to use the newly refreshed access token
immediately and gets an authentication error since thread A still
hasn't finished writing the document.

This commit changes the behavior so that Thread B, would first try
to do a Get request for the token document where it expects that
the access token it decrypted is stored(with exponential backoff )
and will not respond until it can verify that it reads it in the
tokens index. That ensures that we only ever return tokens in a
response if they are already valid and can be used immediately

It also adjusts TokenAuthIntegTests
to test authenticating with the tokens each thread receives,
which would fail without the fix.

Resolves: elastic#54289
jkakavas added a commit that referenced this pull request Apr 27, 2020
Our handling for concurrent refresh of access tokens suffered from
a race condition where:

1. Thread A has just finished with updating the existing token
document, but hasn't stored the new tokens in a new document
yet
2. Thread B attempts to refresh the same token and since the
original token document is marked as refreshed, it decrypts and
gets the new access token and refresh token and returns that to
the caller of the API.
3. The caller attempts to use the newly refreshed access token
immediately and gets an authentication error since thread A still
hasn't finished writing the document.

This commit changes the behavior so that Thread B, would first try
to do a Get request for the token document where it expects that
the access token it decrypted is stored(with exponential backoff )
and will not respond until it can verify that it reads it in the
tokens index. That ensures that we only ever return tokens in a
response if they are already valid and can be used immediately

It also adjusts TokenAuthIntegTests
to test authenticating with the tokens each thread receives,
which would fail without the fix.

Resolves: #54289
jkakavas added a commit to jkakavas/elasticsearch that referenced this pull request Aug 31, 2020
@ywangd made an awesome analysis on why this test is failing, over
at elastic#55816 (comment)

This change makes it so that we use the same client to perform a
refresh of a token, as we use to subsequently attempt to authenticate
with the refreshed token. This ensures the tests are failing and is
a good approximation of how we expect the same client doing the
refresh, to also perform the subsequent authentication in real life
uses.

The errors we were seeing from users have dissappeared after elastic#55114
so we deem our behavior safe.
jkakavas added a commit to jkakavas/elasticsearch that referenced this pull request Aug 31, 2020
@ywangd made an awesome analysis on why this test is failing, over
at elastic#55816 (comment)

This change makes it so that we use the same client to perform a
refresh of a token, as we use to subsequently attempt to authenticate
with the refreshed token. This ensures the tests are failing and is
a good approximation of how we expect the same client doing the
refresh, to also perform the subsequent authentication in real life
uses.

The errors we were seeing from users have disappeared after elastic#55114
so we deem our behavior safe.
jkakavas added a commit that referenced this pull request Sep 1, 2020
@ywangd made an awesome analysis on why this test is failing, over
at #55816 (comment)

This change makes it so that we use the same client to perform a
refresh of a token, as we use to subsequently attempt to authenticate
with the refreshed token. This ensures the tests are failing and is
a good approximation of how we expect the same client doing the
refresh, to also perform the subsequent authentication in real life
uses.

The errors we were seeing from users have dissappeared after #55114
so we deem our behavior safe.
jkakavas added a commit that referenced this pull request Sep 1, 2020
@ywangd made an awesome analysis on why this test is failing, over
at #55816 (comment)

This change makes it so that we use the same client to perform a
refresh of a token, as we use to subsequently attempt to authenticate
with the refreshed token. This ensures the tests are failing and is
a good approximation of how we expect the same client doing the
refresh, to also perform the subsequent authentication in real life
uses.

The errors we were seeing from users have disappeared after #55114
so we deem our behavior safe.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>bug :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc) v7.7.1 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

It's not possible to use newly refreshed access token while the "original" refresh token is still being processed

5 participants