Picking master eligible node at random in the master stability health indicator #89841

Merged
masseyke merged 2 commits into elastic:main from masseyke:fix/randomly-select-master-eligible-nodes on Sep 7, 2022

@masseyke
Member

@masseyke masseyke commented Sep 6, 2022

Previously, whenever the master stability health indicator polled a "random" master-eligible node, it used the same node every time; the choice changed only when the set of master-eligible nodes changed. Because that set was keyed by ephemeral ID, if every node in the cluster saw the same set of master-eligible nodes, every node would "randomly" poll the same master-eligible node when the elected master became null. This change selects the master-eligible node using a random number generator instead.
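
For illustration only, here is a minimal sketch of the difference between a pick that depends on iteration order and a pick driven by a random number generator. It is not the code from this PR: the class `RandomNodePicker`, the methods `pickByIterationOrder` and `pickRandom`, and the use of plain strings in place of `DiscoveryNode` are all assumptions made for the example, and the "before" method only approximates the behaviour described above.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Random;

// Hypothetical illustration; names and types are not taken from Elasticsearch.
public class RandomNodePicker {

    // Deterministic pick: every caller holding the same collection
    // (same contents, same iteration order) gets the same element.
    static String pickByIterationOrder(Collection<String> nodes) {
        return nodes.isEmpty() ? null : nodes.iterator().next();
    }

    // Random pick: copy into an indexable list, then let the RNG choose the index.
    static String pickRandom(Collection<String> nodes, Random random) {
        if (nodes.isEmpty()) {
            return null;
        }
        List<String> indexed = new ArrayList<>(nodes);
        return indexed.get(random.nextInt(indexed.size()));
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("node-a", "node-b", "node-c");
        Random random = new Random();
        System.out.println("iteration order: " + pickByIterationOrder(nodes));
        for (int i = 0; i < 5; i++) {
            System.out.println("random pick:     " + pickRandom(nodes, random));
        }
    }
}
```

With the random variant, different nodes in the cluster (each with its own generator state) are unlikely to all settle on the same master-eligible node, which is the property this change is after.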

@masseyke masseyke added >non-issue :Distributed/Health Issues for the health report API v8.5.0 labels Sep 6, 2022
@masseyke masseyke requested a review from dakrone September 6, 2022 21:05
@elasticsearchmachine elasticsearchmachine added the Team:Data Management (obsolete) DO NOT USE. This team no longer exists. label Sep 6, 2022
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

Member

@dakrone dakrone left a comment

LGTM, thanks!

// Non-private for unit testing
@Nullable
DiscoveryNode getRandomMasterEligibleNode() {
    Collection<DiscoveryNode> masterEligibleNodes = getMasterEligibleNodes();
Member

We could probably just make this a Set<DiscoveryNode>, since I don't think we need it to be generic for any reason, right? Then we could avoid converting it into an array just to randomly pick one of the elements.

(Not that we have to do that here, maybe an idea for the future)

Member Author

Yeah I couldn't remember why I'd made it a Collection (I assume just to make it no more specific than it had to be at the time?). But I think even if it was a Set we'd have to do something like this (or something different but no better really).
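
To make the point above concrete: `java.util.Set` has no index-based access, so picking a random element still means either copying into an array or list, or stepping an iterator a random number of times. The following is a rough sketch of the iterator variant; `RandomSetElement` and `pickViaIterator` are hypothetical names for this example and are not part of the PR.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical illustration; not taken from the Elasticsearch code base.
public class RandomSetElement {

    // Picks a uniformly random element by advancing the iterator
    // a random number of steps. Runs in O(n) time without copying the set.
    static <T> T pickViaIterator(Set<T> set, Random random) {
        if (set.isEmpty()) {
            return null;
        }
        int target = random.nextInt(set.size());
        Iterator<T> iterator = set.iterator();
        for (int i = 0; i < target; i++) {
            iterator.next();
        }
        return iterator.next();
    }

    public static void main(String[] args) {
        Set<String> nodes = new LinkedHashSet<>();
        nodes.add("node-a");
        nodes.add("node-b");
        nodes.add("node-c");
        System.out.println(pickViaIterator(nodes, new Random()));
    }
}
```

Either way the work is linear in the size of the set, so this is "something different but no better really" than converting to an array first.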

@masseyke masseyke merged commit f596a43 into elastic:main Sep 7, 2022
@masseyke masseyke deleted the fix/randomly-select-master-eligible-nodes branch September 7, 2022 13:17
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Sep 8, 2022
* main: (175 commits)
  Fix integration test on Windows (elastic#89894)
  Avoiding the use of dynamic map keys in the cluster_formation results of the stable master health indicator (elastic#89842)
  Mute org.elasticsearch.tracing.apm.ApmIT.testCapturesTracesForHttpTraffic (elastic#89891)
  Fix typos in audit event types (elastic#89886)
  Synthetic _source: support histogram field (elastic#89833)
  [TSDB] Rename rollup public API to downsample  (elastic#89809)
  Format script values access (elastic#89780)
  [DOCS] Simplifies composite aggregation recommendation (elastic#89878)
  [DOCS] Update CCS compatibility matrix for 8.3 (elastic#88906)
  Fix memory leak when double invoking RestChannel.sendResponse (elastic#89873)
  [ML] Add processor autoscaling decider (elastic#89645)
  Update disk-usage.asciidoc (elastic#89709) (elastic#89874)
  Add allocation deciders in createComponents (elastic#89836)
  Mute flaky H3LatLonGeometryTest.testIndexPoints (elastic#89870)
  Fix typo in get-snapshot-status-api doc (elastic#89865)
  Picking master eligible node at random in the master stability health indicator (elastic#89841)
  Do not reuse the client after a disruption elastic#89815 (elastic#89866)
  [ML] Distribute trained model allocations across availability zones (elastic#89822)
  Increment clientCalledCount before onResponse (elastic#89858)
  AwaitsFix for elastic#89867
  ...
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Sep 8, 2022
* main: (175 commits)

# Conflicts:
#	x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/downsample/RollupShardIndexer.java
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Sep 8, 2022
* main: (283 commits)