Skip to content

KAFKA-8933; Fix NPE in DefaultMetadataUpdater after authentication failure#7682

Merged
hachikuji merged 3 commits into
apache:trunkfrom
hachikuji:KAFKA-8933
Dec 4, 2019
Merged

KAFKA-8933; Fix NPE in DefaultMetadataUpdater after authentication failure#7682
hachikuji merged 3 commits into
apache:trunkfrom
hachikuji:KAFKA-8933

Conversation

@hachikuji

@hachikuji hachikuji commented Nov 12, 2019

Copy link
Copy Markdown
Contributor

This patch fixes an NPE in DefaultMetadataUpdater due to an inconsistency in event expectations. Whenever there is an authentication failure, we were treating it as a failed update even if was from a separate connection from an inflight metadata request. This patch fixes the problem by making the MetadataUpdater api clearer in terms of the events that are handled.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@stanislavkozlovski stanislavkozlovski left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! I gave it a look mostly to help familiarize myself with the code

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Should we call this maybeAuthException like in the interface?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a debug log level is appropriate here. We discussed this way back in #3832 (comment)

I imagine this would look something like

ERROR [Controller id=0, targetBrokerId=0] Connection to node 0 failed authentication due to: Authentication failed due to invalid credentials with SASL mechanism PLAIN (org.apache.kafka.clients.NetworkClient)
ERROR Received authentication failure from brokerId=0 (org.apache.kafka.clients.ManualMetadataUpdater)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think debug logging would be sufficient for this one as well as the log entry below for normal disconnect.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, do these log messages have any value? Perhaps we can get rid of them.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rajinisivaram do you know of any cases where the maybeFatalException.get() may not be descriptive enough?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we call handleDisconnect here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intention was for handleDisconnect to only handle unexpected disconnects. I'll change the name to handleServerDisconnect to make that clearer. The thought, in the context of a MetadataUpdater, was that a server disconnect might trigger an attempt to refresh metadata. Looks like it's just a little more work to push this through and then we can get rid of the requestUpdate API.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: isUpdateDue() already checks for !hasFetchInProgress()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bootstrap() itself should mandate an update, right? Do we need to call requestUpdate?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have tests for this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this was strictly necessary. I added it during one of the refactors. I will remove it.

@ijuma

ijuma commented Nov 25, 2019

Copy link
Copy Markdown
Member

@rajinisivaram maybe you can help review this.

@rajinisivaram rajinisivaram left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hachikuji Thanks for the PR. Left a minor comment and @stanislavkozlovski had some good comments too, apart from those LGTM.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think debug logging would be sufficient for this one as well as the log entry below for normal disconnect.

@hachikuji

Copy link
Copy Markdown
Contributor Author

@stanislavkozlovski @rajinisivaram This is ready for another look when you get a chance.

@stanislavkozlovski stanislavkozlovski left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the PR :)

@rajinisivaram rajinisivaram left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hachikuji Thanks for the updates, LGTM

@hachikuji

Copy link
Copy Markdown
Contributor Author

retest this please

2 similar comments
@hachikuji

Copy link
Copy Markdown
Contributor Author

retest this please

@hachikuji

Copy link
Copy Markdown
Contributor Author

retest this please

@hachikuji hachikuji merged commit 5b6de9f into apache:trunk Dec 4, 2019
@hachikuji hachikuji deleted the KAFKA-8933 branch December 4, 2019 21:23
hachikuji added a commit that referenced this pull request Jan 2, 2020
…ilure (#7682)

This patch fixes an NPE in `DefaultMetadataUpdater` due to an inconsistency in event expectations. Whenever there is an authentication failure, we were treating it as a failed update even if was from a separate connection from an inflight metadata request. This patch fixes the problem by making the `MetadataUpdater` api clearer in terms of the events that are handled.

Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
qq619618919 pushed a commit to qq619618919/kafka that referenced this pull request May 12, 2020
…ilure (apache#7682)

This patch fixes an NPE in `DefaultMetadataUpdater` due to an inconsistency in event expectations. Whenever there is an authentication failure, we were treating it as a failed update even if was from a separate connection from an inflight metadata request. This patch fixes the problem by making the `MetadataUpdater` api clearer in terms of the events that are handled.

Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants