-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Addressing 2 Coordinators Elected As Leader (#16411) #16528
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Added listener method that tracks ZK leader state
|
@kfaraz I took an attempt at resolving this if you can take a look and provide feedback it would be appreciated! Thank you! |
|
Thanks a lot, @razinbouzar ! You beat me to it. 🙂 I have been doing some investigation on this. Overall, your solution makes sense to me. But I am still going to try it out in my local setup that I have been using for testing. One important thing that I have noticed is that the I will share more details here in a bit. |
kfaraz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have left some suggestions. Overall approach makes sense to me.
I still need to test out the changes in my local setup.
An alternative solution I was thinking of was just to recreate the latch in notLeader() method. We would need to evaluate if one is better than the other.
server/src/main/java/org/apache/druid/curator/discovery/CuratorDruidLeaderSelector.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/curator/discovery/CuratorDruidLeaderSelector.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/curator/discovery/CuratorDruidLeaderSelector.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/curator/discovery/CuratorDruidLeaderSelector.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/curator/discovery/CuratorDruidLeaderSelector.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/curator/discovery/CuratorDruidLeaderSelector.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/curator/discovery/CuratorDruidLeaderSelector.java
Outdated
Show resolved
Hide resolved
For more information, I encountered this in my local testing when I was trying to recreate the latch in Timeline:
An example sequence of events that I observed Just for clarification, the scenario described above is slightly different from the one we are trying to solve. |
This reverts commit 90ea5e8.
server/src/main/java/org/apache/druid/curator/discovery/CuratorDruidLeaderSelector.java
Show resolved
Hide resolved
- Cleanup file formatting and comments - Reduce complexity of the first go by calling the recreateLeaderLatch in the notLeader() method
…rDruidLeaderSelector.java Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
server/src/main/java/org/apache/druid/curator/discovery/CuratorDruidLeaderSelector.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/curator/discovery/CuratorDruidLeaderSelector.java
Show resolved
Hide resolved
kfaraz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a unit test that verifies the new behaviour.
Also the case state = CLOSED still needs to be handled in the isLeader() method of the listener.
- Remove handleConnectionStateChagned method - Remove duplicate code and use recreate leader latch method - Handle LeaderLatch.State.CLOSED in the isLeader() function, log a warning.
Remove unused import
kfaraz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have done some basic testing changes with these changes and they look good.
I have a follow up PR #16544 to this where I intend to add tests to verify the behaviour and fix other race conditions.
Thanks a lot for the changes, @razinbouzar !
Fixes #16411 .
Description
Introduces a new private method handleConnectionStateChanged to handle connection state changes, which recreates the leader latch if the connection state is SUSPENDED or LOST.
Fixed the bug ...
Renamed the class ...
Added a forbidden-apis entry ...
Release note
Key changed/added classes in this PR
MyFooOurBarTheirBazThis PR has: