Fix concurrent refresh of tokens by jkakavas · Pull Request #55114 · elastic/elasticsearch

jkakavas · 2020-04-13T13:57:26Z

Our handling for concurrent refresh of access tokens suffered from
a race condition where:

1. Thread A has just finished updating the existing token
   document, but hasn't stored the new tokens in a new document
   yet.
2. Thread B attempts to refresh the same token and, since the
   original token document is marked as refreshed, it decrypts and
   gets the new access token and refresh token and returns them to
   the caller of the API.
3. The caller attempts to use the newly refreshed access token
   immediately and gets an authentication error since Thread A still
   hasn't finished writing the document.

This commit changes the behavior so that Thread B first tries
to do a Get request (with exponential backoff) for the token
document where it expects the access token it decrypted to be
stored, and does not respond until it can verify that it can read
that document in the tokens index. That ensures that we only ever
return tokens in a response if they are already valid and can be
used immediately.

It also adjusts
TokenAuthIntegTests#testRefreshingMultipleTimesWithinWindowSucceeds
to test authenticating with the tokens each thread receives,
which fails without the fix.

Resolves: #54289
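
The verify-before-responding behavior described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the code in TokenService: the class `RefreshVerifier`, the methods `awaitDocument` and `exponentialDelays`, and the `Predicate` standing in for the Get request against the tokens index are all hypothetical names.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

// Hypothetical helper: only report success once the new token document is readable.
final class RefreshVerifier {

    /** Completes with true once exists.test(docId) passes, or with false when the delays run out. */
    static CompletableFuture<Boolean> awaitDocument(String docId, Predicate<String> exists,
                                                    Iterator<Duration> backoff,
                                                    ScheduledExecutorService scheduler) {
        CompletableFuture<Boolean> result = new CompletableFuture<>();
        attempt(docId, exists, backoff, scheduler, result);
        return result;
    }

    private static void attempt(String docId, Predicate<String> exists, Iterator<Duration> backoff,
                                ScheduledExecutorService scheduler, CompletableFuture<Boolean> result) {
        try {
            if (exists.test(docId)) {
                // The document is visible, so the tokens can be returned to the caller.
                result.complete(true);
            } else if (backoff.hasNext()) {
                // Not visible yet: try again after the next (longer) delay.
                scheduler.schedule(() -> attempt(docId, exists, backoff, scheduler, result),
                    backoff.next().toMillis(), TimeUnit.MILLISECONDS);
            } else {
                // All retries used up: the caller should fail the refresh request.
                result.complete(false);
            }
        } catch (RuntimeException e) {
            result.completeExceptionally(e);
        }
    }

    /** Builds delays that double each time: initial, 2 * initial, 4 * initial, ... */
    static List<Duration> exponentialDelays(Duration initial, int retries) {
        List<Duration> delays = new ArrayList<>();
        Duration delay = initial;
        for (int i = 0; i < retries; i++) {
            delays.add(delay);
            delay = delay.multipliedBy(2);
        }
        return delays;
    }
}
```

The point of the design is that the second refresher pays the waiting cost, so a client never receives an access token that cannot yet be used to authenticate.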

elasticmachine · 2020-04-13T13:57:28Z

Pinging @elastic/es-security (:Security/Authentication)

ywangd

A few minor comments. Otherwise LGTM

ywangd · 2020-04-14T02:56:40Z

x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/TokenService.java

+                            if (backoff.hasNext()) {
+                                logger.info("could not get token document [{}] that should have been created, retrying", tokenDocId);
+                                client.threadPool().schedule(() -> getTokenDocAsync(tokenDocId, tokensIndex, this),
+                                    backoff.next(), GENERIC);
+                            } else {
+                                logger.warn("could not get token document [{}] that should have been created after all retries",
+                                    tokenDocId);
+                                onFailure.accept(invalidGrantException("could not refresh the requested token"));
+                            }


Maybe extract this logic into a consumer? It can be reused for the onResponse part as well.
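
One way to read this suggestion, sketched in plain Java rather than against the real `ActionListener` and `TokenService` types: a single `Consumer` holds the "retry or give up" decision and both callbacks delegate to it. The names `RetryingGet`, `Listener`, `getWithRetry` and `retryOrFail` are hypothetical, and for brevity the sketch retries immediately instead of scheduling the next attempt after the backoff delay.

```java
import java.util.Iterator;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Hypothetical, simplified stand-in for the listener-based Get call in the diff above.
final class RetryingGet {

    interface Listener {
        void onResponse(boolean exists);
        void onFailure(Exception e);
    }

    static void getWithRetry(String docId, BiConsumer<String, Listener> getAsync,
                             Iterator<Long> backoffMillis, Runnable onFound,
                             Consumer<Exception> onGiveUp) {
        Listener listener = new Listener() {
            // Single place that decides between another attempt and giving up.
            private final Consumer<Exception> retryOrFail = cause -> {
                if (backoffMillis.hasNext()) {
                    backoffMillis.next(); // assumption: real code would wait this long before retrying
                    getAsync.accept(docId, this);
                } else {
                    onGiveUp.accept(new IllegalStateException("could not refresh the requested token", cause));
                }
            };

            @Override
            public void onResponse(boolean exists) {
                if (exists) {
                    onFound.run();
                } else {
                    retryOrFail.accept(null); // document not visible yet
                }
            }

            @Override
            public void onFailure(Exception e) {
                retryOrFail.accept(e); // same decision for failed requests
            }
        };
        getAsync.accept(docId, listener);
    }
}
```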

ywangd · 2020-04-14T03:00:15Z

...lugin/security/src/test/java/org/elasticsearch/xpack/security/authc/TokenAuthIntegTests.java

        assertThat(failed.get(), equalTo(false));
-        // Assert that we only ever got one access_token/refresh_token pair
-        assertThat(tokens.stream().distinct().collect(Collectors.toList()).size(), equalTo(1));
+        // Assert that we only ever got one anot(contccess_token/refresh_token pair


anot(contccess_token - probably some random extra keystrokes?

ywangd · 2020-04-14T03:06:10Z

...lugin/security/src/test/java/org/elasticsearch/xpack/security/authc/TokenAuthIntegTests.java

+        assertThat((int) tokens.stream().distinct().count(), equalTo(1));
+        // Assert that all requests from all threads could authenticate at the time they received the access token
+        // see: https://github.com/elastic/elasticsearch/issues/54289
+        assertThat(authStatuses, not(hasItem(RestStatus.UNAUTHORIZED)));


nits: alternatively, maybe we could verify all statuses are OK since getAuthenticationResponseCode always returns OK when no exception is thrown. It feels more precise.

I was thinking in terms of testing for the bug this is solving, but these should all be OK, so I see your point. Will address.
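
The difference between the two assertions discussed here can be shown with a small stand-alone sketch. The `Status` enum is a stand-in for `RestStatus`, and `RefreshAssertions` with its helper methods is hypothetical; the actual test uses Hamcrest matchers.

```java
import java.util.List;

// Hypothetical helpers contrasting "no UNAUTHORIZED" with "everything is OK".
final class RefreshAssertions {

    enum Status { OK, UNAUTHORIZED, FORBIDDEN }

    /** True when every thread received the same access_token/refresh_token pair. */
    static boolean singleTokenPair(List<String> tokens) {
        return tokens.stream().distinct().count() == 1;
    }

    /** The check in the diff: passes as long as no request was rejected as unauthorized. */
    static boolean noneUnauthorized(List<Status> statuses) {
        return statuses.stream().noneMatch(s -> s == Status.UNAUTHORIZED);
    }

    /** The stricter check suggested in review: every request must have returned OK. */
    static boolean allOk(List<Status> statuses) {
        return !statuses.isEmpty() && statuses.stream().allMatch(s -> s == Status.OK);
    }
}
```

A list such as `[OK, FORBIDDEN]` passes the first check but not the second, which is why the stricter form is the more precise one.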

jkakavas · 2020-04-22T04:16:48Z

Ping @tvernum

tvernum

LGTM, with 1 suggestion for optimization

tvernum · 2020-04-24T08:10:08Z

x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/TokenService.java

+                        onFailure.accept(invalidGrantException("could not refresh the requested token"));
+                    }
+                };
+                getTokenDocAsync(tokenDocId, tokensIndex, new ActionListener<>() {


I think this, and the call above, should explicitly not fetch the source, since we only care about an exists check.

Yes, agreed! Thanks


@ywangd made an awesome analysis of why this test is failing, over at elastic#55816 (comment). This change makes it so that we use the same client to refresh a token as we use to subsequently authenticate with the refreshed token. This ensures the tests are failing and is a good approximation of how we expect the same client doing the refresh to also perform the subsequent authentication in real-life use. The errors we were seeing from users have disappeared after elastic#55114, so we deem our behavior safe.


jkakavas added 2 commits April 13, 2020 12:47

cleanup in TokenAuthIntegTests

e545842

jkakavas added >bug :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc) v8.0.0 v7.7.1 labels Apr 13, 2020

jkakavas requested review from tvernum and ywangd April 13, 2020 13:57

jkakavas added 2 commits April 13, 2020 18:13

fix test

a1828d5

Merge remote-tracking branch 'origin/master' into concurrent-token-refresh-and-authn

c62ff21

ywangd approved these changes Apr 14, 2020

jkakavas added 2 commits April 14, 2020 09:33

address feedback

c7d6330

Merge remote-tracking branch 'origin/master' into concurrent-token-refresh-and-authn

13c21b2

tvernum approved these changes Apr 24, 2020

jkakavas added 2 commits April 24, 2020 11:49

Fetch source only when needed

9c0f8a0

Merge remote-tracking branch 'origin/master' into concurrent-token-refresh-and-authn

d937a8f

jkakavas merged commit c20d3e9 into elastic:master Apr 24, 2020

jkakavas mentioned this pull request Apr 24, 2020

Fix concurrent refresh of tokens (#55114) #55733

Merged

jkakavas mentioned this pull request Aug 31, 2020

Unmute TokenAuthIntegTests test #61714

Merged

jkakavas mentioned this pull request Aug 31, 2020

Unmute TokenAuthIntegTests test #61715

Merged

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021

jkakavas merged 8 commits into elastic:master from jkakavas:concurrent-token-refresh-and-authn


5 participants
