Skip to content

[TINKERPOP-3061] fix: failing authentication when multiple initially requests are executed concurrently#2525

Closed
tien wants to merge 1 commit into
apache:3.7-devfrom
tien:fix/failing-initial-authentication
Closed

[TINKERPOP-3061] fix: failing authentication when multiple initially requests are executed concurrently#2525
tien wants to merge 1 commit into
apache:3.7-devfrom
tien:fix/failing-initial-authentication

Conversation

@tien

@tien tien commented Mar 16, 2024

Copy link
Copy Markdown
Contributor

This solution try to resolve the concurrent initial unauthenticated requests problem described in TINKERPOP-3063, TINKERPOP-2132 & TINKERPOP-3061 by batching them for later processing when authentication handshake is in progress.

@tien tien force-pushed the fix/failing-initial-authentication branch from fa43e9a to b65004e Compare March 16, 2024 10:26
@codecov-commenter

codecov-commenter commented Mar 16, 2024

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 35.55556% with 58 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.46%. Comparing base (9b46b67) to head (321c327).
⚠️ Report is 453 commits behind head on 3.7-dev.

Files with missing lines Patch % Lines
...mlin/server/handler/SaslAuthenticationHandler.java 36.90% 42 Missing and 11 partials ⚠️
...inkerpop/gremlin/driver/simple/AbstractClient.java 0.00% 5 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             3.7-dev    #2525      +/-   ##
=============================================
+ Coverage      76.14%   76.46%   +0.32%     
- Complexity     13152    13173      +21     
=============================================
  Files           1084     1059      -25     
  Lines          65160    61305    -3855     
  Branches        7285     7303      +18     
=============================================
- Hits           49616    46877    -2739     
+ Misses         12839    11907     -932     
+ Partials        2705     2521     -184     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@tien tien force-pushed the fix/failing-initial-authentication branch from b65004e to c003169 Compare March 16, 2024 11:00
@vkagamlyk

Copy link
Copy Markdown
Contributor

I think this is a good solution for concurrent auth issue.

Good to have test for js driver to be 100% sure.
Also missing changelog entry

@tien tien force-pushed the fix/failing-initial-authentication branch from c003169 to 053d329 Compare March 20, 2024 22:05
@tien

tien commented Mar 20, 2024

Copy link
Copy Markdown
Contributor Author

Have added 1 changelog entry & a test to the JS driver 🙏

@vkagamlyk

Copy link
Copy Markdown
Contributor

Have added 1 changelog entry & a test to the JS driver 🙏

Thank you @tien!

VOTE +1

@Cole-Greer

Copy link
Copy Markdown
Contributor

Thanks @tien, looks great. VOTE +1

@kenhuuu

kenhuuu commented Mar 25, 2024

Copy link
Copy Markdown
Contributor

Could you also add a test where the authentication fails with multiple pending requests and you check that all requests get the proper exception in that case?

@Cole-Greer

Copy link
Copy Markdown
Contributor

I agree with @kenhuuu that an additional test is warranted here to ensure that the server will always send a response to every request. We are now entering code freeze week in preparation for the 3.6.7 and 3.7.2 releases. I believe it is fair to grant an exception for a few days to give some time for such a test to be implemented and to ensure this PR can be included in the release.

@tien

tien commented Mar 26, 2024

Copy link
Copy Markdown
Contributor Author

I've added one test for the unhappy path.

@tien tien force-pushed the fix/failing-initial-authentication branch from b66bbe9 to 321c327 Compare March 27, 2024 03:45

@FlorianHockmann FlorianHockmann left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just reviewed the changes. I like the general approach taken here! It means that we don't have to implement workarounds in all GLVs for this like I initially did in #2522. Thanks a lot for tackling this, @tien!

I added some inline comments, but they are all basically only about code style issues.
I also tested this fix with the test that I wrote for the .NET driver in #2522 and it passed! :-)


import java.util.ArrayList;
import java.util.List;
import java.util.*;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


static class CallbackResponseHandler extends SimpleChannelInboundHandler<ResponseMessage> {
public Consumer<ResponseMessage> callback;
public Map<UUID, Consumer<ResponseMessage>> callback = new HashMap<>();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nitpick) I think we should rename this now that it's no longer just a callback. Maybe something like callbackByRequestId?

}

// If authentication negotiation is pending, store subsequent non-authentication requests for later processing
if (negotiator.get() != null && !requestMessage.getOp().equals(Tokens.OPS_AUTHENTICATION)) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nitpick) Isn't negotiator.get() != null duplicate code since we have an if (negotiator.get() == null) right before that returns ?

if (deferredDuration.compareTo(MAX_REQUEST_DEFERRABLE_DURATION) > 0) {
respondWithError(
requestMessage,
builder -> builder.statusMessage("Too many unauthenticated requests").code(ResponseStatusCode.TOO_MANY_REQUESTS),

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the problem here really that there are too many unauthenticated requests? Isn't the problem that authentication took longer than MAX_REQUEST_DEFERRABLE_DURATION?
I think as a user it might be good to know whether I simply submitted too many requests or whether authentication is just too slow.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm stumped on this 2, do you have a recommendation on what status code & message would make sense here?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't UNAUTHORIZED be correct in this case? The description starts with:

The server could not authenticate the request

which makes sense in my opinion if the max duration was passed for the authentication to happen.

And the status message can then contain more detailed information about the duration that was passed, maybe something like: "authentication did not finish in the allowed duration (" + MAX_REQUEST_DEFERRABLE_DURATION + " s)"?

@@ -218,9 +266,19 @@ public void shouldAuthenticateAndWorkWithVariablesOverGraphSONV1Serialization()

private static void assertConnection(final Cluster cluster, final Client client) throws InterruptedException, ExecutionException {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is used in 4 different tests, such as shouldAuthenticateWithPlainText. These 4 tests will now fail if submitting multiple requests initially in parallel isn't working.
I think it would be good if we could keep these tests as simple as possible so they don't include parallelization of initial requests. A test like shouldAuthenticateWithPlainText should really only fail if authenticate with plain text isn't working, not if submitting multiple requests in parallel isn't working.

Long story short, I think it would be good if you could revert the changes to this method and instead write a new test specifically for the parallelization issue.

responses.addAll(future4.join());

for (ResponseMessage response : responses) {
if (response.getStatus().getCode() != ResponseStatusCode.AUTHENTICATE) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a bit hard to understand what the expected outcome is here / what this is really asserting since it's just iterating over all received responses and then either accepting them if their status code is AUTHENTICATE or if it's TOO_MANY_REQUESTS.
Can't we explicitly assert which request should get which response status code? I guess future4 should get TOO_MANY_REQUESTS and the other 3 should get AUTHENTICATE (?).

Not that important, but I think it would also improve readability of this test if these weren't named future1 - future4, but maybe something like futureOfRequestWithinAuthDuration vs futureOfRequestSubmittedTooLate or something like that.

}

@Test
public void shouldFailAuthenticateWithUnAuthenticatedRequestAfterMaxDeferrableDuration() throws Exception {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Tien, I want to be absolutely certain that we aren’t going to lose any of these deferred requests in the case of errors. If the server fails to send a response the drivers will just be left hanging indefinitely. If I’m understanding this test right, the first 3 requests are all expected to succeed, and then after a delay the final request is submitted and fails. Would it also be possible to setup a test such that there are multiple pending requests in the server’s deferred requests queue at the time that auth fails, and then we can verify that the correct error message gets sent to all currently pending requests?

@tien

tien commented Apr 4, 2024

Copy link
Copy Markdown
Contributor Author

Sorry, I'm away this week and don't have access to my work laptop. Will take a look at all the pending comments & resolve them next week 🙏

@xiazcy

xiazcy commented Apr 8, 2024

Copy link
Copy Markdown
Contributor

Sorry, I'm away this week and don't have access to my work laptop. Will take a look at all the pending comments & resolve them next week 🙏

No worries, thank you for all the contributions!

Just a quick note. Not sure if you have gotten a chance to start looking at the comments, as we'd like to release this with 3.7.2 this week, we will likely be cherry-picking your changes into another PR for the release branch today. If we do proceed with that we'll be closing this PR, and you shouldn't need to do any further work.

Now there might still be functionality improvements we miss, so please feel free to add additional changes once the branches re-open.

@xiazcy

xiazcy commented Apr 9, 2024

Copy link
Copy Markdown
Contributor

Closing this PR as all changes are merged via 22db8cf for the release. Please feel free to re-open if you find additional improvements and/or updates needed. Thanks!

@xiazcy xiazcy closed this Apr 9, 2024
@tien

tien commented Apr 9, 2024

Copy link
Copy Markdown
Contributor Author

@xiazcy nah I just got back today, thanks for making this available in the next release 💪

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants