Adding the ability to register a PeerFinderListener to Coordinator #88626

Merged
masseyke merged 3 commits into elastic:master from masseyke:feature/PeerFinder-listener
Jul 21, 2022

Conversation

@masseyke
Member

This came out of #88562. The problem there is that if we are a non-master-eligible node, we want to start polling a master-eligible node whenever we realize that there is no elected master. We are notified that there is no elected master in a ClusterChangedEvent, and we get the list of master-eligible nodes from PeerFinder#getFoundPeers. However, at the time that we are notified of the ClusterChangedEvent, PeerFinder#getFoundPeers returns an empty Iterable. That collection is populated in another thread, and there is currently no way to be notified when it is populated. This PR adds the ability to register a PeerFinderListener with Coordinator. The listener has an onFoundPeersUpdated method that is called whenever the collection of peers changes (whether peers are added or removed).
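A minimal sketch of the listener idea described above, written against simplified stand-in classes rather than the real Elasticsearch ones. Only the names PeerFinderListener, onFoundPeersUpdated, and getFoundPeers come from the description; SketchPeerFinder, addListener, addPeer, and removePeer are hypothetical, and this toy version notifies only on actual changes.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArrayList;

// Simplified stand-in for the listener; only the method name is taken from the PR description.
interface PeerFinderListener {
    void onFoundPeersUpdated();
}

// Hypothetical toy peer finder, NOT the Elasticsearch PeerFinder.
class SketchPeerFinder {
    private final Set<String> foundPeers = new LinkedHashSet<>();
    // Copy-on-write so listeners can be registered while another thread is notifying.
    private final List<PeerFinderListener> listeners = new CopyOnWriteArrayList<>();

    void addListener(PeerFinderListener listener) {
        listeners.add(listener);
    }

    // Returns a snapshot so callers never observe a half-updated collection.
    synchronized List<String> getFoundPeers() {
        return new ArrayList<>(foundPeers);
    }

    void addPeer(String nodeName) {
        boolean changed;
        synchronized (this) {
            changed = foundPeers.add(nodeName);
        }
        if (changed) {
            notifyListeners();
        }
    }

    void removePeer(String nodeName) {
        boolean changed;
        synchronized (this) {
            changed = foundPeers.remove(nodeName);
        }
        if (changed) {
            notifyListeners();
        }
    }

    // Listeners are invoked outside the lock so a slow listener cannot block peer updates.
    private void notifyListeners() {
        for (PeerFinderListener listener : listeners) {
            listener.onFoundPeersUpdated();
        }
    }
}
```

With something like this, a caller that found getFoundPeers empty can register a listener and read the collection again inside onFoundPeersUpdated instead of polling.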

@masseyke
Member Author

One question I have is whether this is granular enough. For example, when a master node steps down and a new one is elected, onFoundPeersUpdated is called 3 times in rapid succession:

  1. When the master steps down, onLeaderFailure calls activate on the PeerFinder, putting the name(s) of the other nodes in there
  2. When it gets a response from each of these nodes after it tries to connect to them, PeerFinder#onFoundPeersUpdated (and therefore this listener) is called again, even though the number of peers has not actually changed
  3. Once the new master is elected everything is removed from the collection of peers and this listener is called again.

For #88562 I only care about the first one. It's possible the others could have uses too.
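One way a caller could react only to the first of those three notifications is to track whether the peer collection has just gone from empty to non-empty. The sketch below is an assumption about how such a filter might look, not code from this PR; EmptyToNonEmptyFilter and its constructor parameters are hypothetical, and in real code the class would implement PeerFinderListener rather than stand alone.

```java
import java.util.Collection;
import java.util.function.Supplier;

// Hypothetical helper, not part of the PR: runs a callback only when the
// found-peers collection transitions from empty to non-empty.
final class EmptyToNonEmptyFilter {
    private final Supplier<? extends Collection<String>> foundPeers;
    private final Runnable onFirstPeers;
    // True while the last observed state was "no peers" (or nothing observed yet).
    private boolean armed = true;

    EmptyToNonEmptyFilter(Supplier<? extends Collection<String>> foundPeers, Runnable onFirstPeers) {
        this.foundPeers = foundPeers;
        this.onFirstPeers = onFirstPeers;
    }

    // Would be the PeerFinderListener callback in the real code.
    public synchronized void onFoundPeersUpdated() {
        boolean empty = foundPeers.get().isEmpty();
        if (empty) {
            // Peers were cleared (for example after a new master is elected): re-arm.
            armed = true;
        } else if (armed) {
            // First notification with peers present since the collection was last empty.
            armed = false;
            onFirstPeers.run();
        }
        // Otherwise: repeated notification with peers still present, ignored.
    }
}
```

Under this assumption, the second call in the sequence above is ignored because peers are still present, and the third call re-arms the filter for the next time the master is lost.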

@masseyke
Member Author

@DaveCTurner does this look like what you had in mind when we talked?

@masseyke masseyke added the :Distributed/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. label Jul 20, 2022
Member

@DaveCTurner DaveCTurner left a comment

Sure, makes sense. LGTM (one nit)

  }

- void onFoundPeersUpdated() {
+ public void onFoundPeersUpdated() {

Suggested change
- public void onFoundPeersUpdated() {
+ @Override
+ public void onFoundPeersUpdated() {

@masseyke masseyke marked this pull request as ready for review July 21, 2022 14:19
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team. label Jul 21, 2022
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine
Collaborator

Hi @masseyke, I've created a changelog YAML for you.

@masseyke masseyke merged commit 7b8c2c7 into elastic:master Jul 21, 2022
@masseyke masseyke deleted the feature/PeerFinder-listener branch July 21, 2022 15:12
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Jul 22, 2022
* upstream/master: (40 commits)
  Fix CI job naming
  [ML] disallow autoscaling downscaling in two trained model assignment scenarios (elastic#88623)
  Add "Vector Search" area to changelog schema
  [DOCS] Update API key API (elastic#88499)
  Enable the pipeline on the feature branch (elastic#88672)
  Adding the ability to register a PeerFinderListener to Coordinator (elastic#88626)
  [DOCS] Fix transform painless example syntax (elastic#88364)
  [ML] Muting InternalCategorizationAggregationTests testReduceRandom (elastic#88685)
  Fix double rounding errors for disk usage (elastic#88683)
  Replace health request with a state observer. (elastic#88641)
  [ML] Fail model deployment if all allocations cannot be provided (elastic#88656)
  Upgrade to OpenJDK 18.0.2+9 (elastic#88675)
  [ML] make bucket_correlation aggregation generally available (elastic#88655)
  Adding cardinality support for random_sampler agg (elastic#86838)
  Use custom task instead of generic AckedClusterStateUpdateTask (elastic#88643)
  Reinstate test cluster throttling behavior (elastic#88664)
  Mute testReadBlobWithPrematureConnectionClose
  Simplify plugin descriptor tests (elastic#88659)
  Add CI job for testing more job parallelism
  [ML] make deployment infer requests fully cancellable (elastic#88649)
  ...
Labels

:Distributed/Distributed - A catch all label for anything in the Distributed Area. Please avoid if you can.
>enhancement
Team:Distributed - Meta label for distributed team.
v8.4.0

3 participants