
Add doc on master elections in DistributedArchitectureGuide #142435

Merged
inespot merged 10 commits into elastic:main from inespot:ip/master-election-doc
Feb 24, 2026

inespot (Contributor) commented Feb 13, 2026

Details the master election flow.

ES-14214

Details master eligibility, node roles, the election flow and failure cases.
github-actions bot commented Feb 13, 2026

🔍 Preview links for changed docs

github-actions bot commented:

ℹ️ Important: Docs version tagging

👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version.

We use applies_to tags to mark version-specific features and changes.


When to use applies_to tags:

✅ At the page level to indicate which products/deployments the content applies to (mandatory)
✅ When features change state (e.g. preview, ga) in a specific version
✅ When availability differs across deployments and environments

What NOT to do:

❌ Don't remove or replace information that applies to an older version
❌ Don't add new information that applies to a specific version without an applies_to tag
❌ Don't forget that applies_to tags can be used at the page, section, and inline level

🤔 Need help?

inespot force-pushed the ip/master-election-doc branch from 70f489d to cf25b4b February 14, 2026 23:35
inespot marked this pull request as ready for review February 17, 2026 03:24
elasticsearchmachine added the needs:triage label Feb 17, 2026
inespot added the :Distributed/Distributed, >non-issue, and >docs labels and removed the needs:triage label Feb 17, 2026
elasticsearchmachine added the Team:Distributed label Feb 17, 2026
elasticsearchmachine (Collaborator) commented:

Pinging @elastic/es-distributed (Team:Distributed)

elasticsearchmachine added the Team:Docs label Feb 17, 2026
elasticsearchmachine (Collaborator) commented:

Pinging @elastic/core-docs (Team:Docs)

inespot requested a review from DaveCTurner February 17, 2026 03:26

(A node can coordinate a search across several other nodes when the node itself does not have the data, and then return a result to the caller. Explain this coordinating role.)

### Cluster State

inespot (Contributor Author) commented:

Outlined some additional subsections outside of Master Elections, to tackle in subsequent PRs.


[CoordinationMetadata]:https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/cluster/coordination/CoordinationMetadata.java

[VotingConfiguration]: https://github.com/elastic/elasticsearch/blob/v9.3.0/server/src/main/java/org/elasticsearch/cluster/coordination/CoordinationMetadata.java#L326

inespot (Contributor Author) commented:

This PR uses the v9.3.0 tag for all links not pointing to top-level classes, so that the linked line numbers stay consistent. The existing documentation is a bit varied: some sections use specific commits (like Snapshot Repository), and others (like HTTP Server) don't use links at all, just plain function names. If people have strong opinions on which is best, happy to adjust.

Member commented:

+1 to using a release tag like v9.3.0 because it's immutable (in practice) but please don't refer to a branch like main as these things change over time.

inespot (Contributor Author) replied Feb 17, 2026:

Sounds good, will adjust for top level classes as well!

DaveCTurner (Member) left a comment:

Great stuff, thanks for this.



The cluster maintains at most a single master at all times. If no master is

Member commented:

This is the conceptual goal, but it's surprisingly tricksy to define what it even means for two nodes to be master at the same time. You can definitely have two nodes which each believe they are the master (and e.g. will service TransportMasterNodeAction requests) for a while; the key point is that all but at most one of them will not be able to update the cluster state.

Maybe too early to get into this level of detail? But it is worth saying somewhere, to avoid confusion about the exact invariants on which we can rely. You mention below that we guarantee there will be at most one master in each term, and that the terms of committed cluster state updates are nondecreasing, so in a sense the term acts as a logical clock. Perhaps say in the Terms section that different nodes may be at different logical times (i.e. terms) at the same physical time?

Suggested change:
Before: The cluster maintains at most a single master at all times. If no master is
After: The cluster maintains (conceptually at least) at most a single master at all times. If no master is

Also maybe worth highlighting at the top that the point of electing a master (and everything else here) is purely to update the cluster state. The elected master also does other things too but the cluster-state-updating bit is the only essential bit.
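A minimal Java sketch of the invariant described in this comment, under loudly stated assumptions: `TermView` and `StaleMasterSketch` are hypothetical names (not Elasticsearch classes), there is a fixed set of voting nodes, and a node accepts a publication only if it comes from the node's current term. Under those assumptions two nodes can each believe they are master, yet only the one in the newer term can gather a majority of acknowledgements and commit.

```java
import java.util.List;

/** Hypothetical per-node view of the current term (not the real CoordinationState API). */
final class TermView {
    private long currentTerm;

    TermView(long currentTerm) {
        this.currentTerm = currentTerm;
    }

    /** Taking part in an election for a newer term moves this node's logical clock forward. */
    void joinElection(long term) {
        if (term > currentTerm) {
            currentTerm = term;
        }
    }

    /** Assumption: a publication is accepted only if it comes from this node's current term. */
    boolean acceptPublication(long publicationTerm) {
        return publicationTerm == currentTerm;
    }

    long currentTerm() {
        return currentTerm;
    }
}

/** Hypothetical helper that counts acknowledgements for a publication. */
final class StaleMasterSketch {
    /** A publication commits only if a strict majority of the voting nodes accept it. */
    static boolean commits(long publicationTerm, List<TermView> votingNodes) {
        int acks = 0;
        for (TermView node : votingNodes) {
            if (node.acceptPublication(publicationTerm)) {
                acks++;
            }
        }
        return 2 * acks > votingNodes.size();
    }
}
```

For example, with voting nodes A, B and C all in term 5, if B and C move to term 6 for a new election while A still believes it is master, a term-5 publication from A is accepted only by A itself and does not commit, whereas a term-6 publication is accepted by B and C and does.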

inespot (Contributor Author) replied Feb 17, 2026:

Added a sentence about the main point of having a master in 941c7 and then clarified the "two masters" case in 4186d.

any [ClusterState] changes until a new master is elected.

To elect a master, Elasticsearch uses a consensus algorithm derived
from [Paxos](https://lamport.azurewebsites.net/pubs/lamport-paxos.pdf). This algorithm is formally defined in a TLA+

Member commented:

That's the original paper, but maybe also worth linking these for a gentler introduction:

To elect a master, Elasticsearch uses a consensus algorithm derived
from [Paxos](https://lamport.azurewebsites.net/pubs/lamport-paxos.pdf). This algorithm is formally defined in a TLA+
specification referenced from the [CoordinationState] class. The [Coordinator] class handles the core logic of the
election, and manages how nodes transition between `CANDIDATE`, `LEADER`, and

Member commented:

Maybe worth documenting somewhere that CoordinationState is about safety whereas the Coordinator is more about liveness, managing timeouts and other behaviours that guarantee progress. It does a certain amount of admin work too, e.g. preparing all the data structures ahead of a publication.
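The safety/liveness split described here can be illustrated with a small Java sketch. Both classes are hypothetical and far simpler than `CoordinationState` and `Coordinator`: `VoteKeeper` stands in for the safety side (pure state, no clocks, refusing any step that could allow two winners in one term), and `ElectionBackoff` stands in for the liveness side (randomised, growing retry delays so that competing candidates are unlikely to keep colliding). The delay formula and the millisecond values are assumptions for illustration only.

```java
import java.util.Random;

/** Safety half (hypothetical): pure state, no clocks or timeouts. */
final class VoteKeeper {
    private long lastVotedTerm = 0;

    /** Grants at most one vote per term, and never a vote for a term older than one already voted in. */
    boolean grantVote(long term) {
        if (term <= lastVotedTerm) {
            return false;
        }
        lastVotedTerm = term;
        return true;
    }
}

/** Liveness half (hypothetical): decides when to try another election, never whether a step is safe. */
final class ElectionBackoff {
    private final long initialMillis;
    private final long backoffMillis;
    private final long maxMillis;
    private final Random random;

    ElectionBackoff(long initialMillis, long backoffMillis, long maxMillis, Random random) {
        this.initialMillis = initialMillis;
        this.backoffMillis = backoffMillis;
        this.maxMillis = maxMillis;
        this.random = random;
    }

    /** Upper bound on the delay before the given 0-based attempt; grows linearly, capped at maxMillis. */
    long maxDelayForAttempt(int attempt) {
        return Math.min(initialMillis + attempt * backoffMillis, maxMillis);
    }

    /** A uniformly random delay in [1, maxDelayForAttempt(attempt)] milliseconds (assumes initialMillis >= 1). */
    long delayForAttempt(int attempt) {
        long bound = maxDelayForAttempt(attempt);
        return 1 + (long) (random.nextDouble() * bound);
    }
}
```

For instance, `new ElectionBackoff(100, 100, 10_000, new Random())` allows delays of up to 100 ms before the first attempt, with the bound growing by 100 ms per attempt and capped at 10 s. However the liveness half is tuned, it cannot cause a second vote in the same term, because only `VoteKeeper` decides that.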


#### Election Flow

The overall election flow looks like this:

Member commented:

I think I'd rather this was more prose-like (e.g. so you can copy-paste the sentences elsewhere) - I'm not sure the boxes and arrows really add much to this straight-line flow, and they are a royal pain to maintain in future edits.

Something like this perhaps?

1. Leader failure detected.

   The follower detects that the current master has failed.

   See:
   - `LeaderChecker`
   - `Coordinator.onLeaderFailure()`

2. Node becomes `CANDIDATE`.

   The follower transitions to `CANDIDATE` mode, which triggers the discovery process.

   See:
   - `Coordinator.becomeCandidate()`
   - `Mode.CANDIDATE`
   - `PeerFinder.activate(...)`

etc.
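The first two steps of this flow amount to a small state machine over the `CANDIDATE`, `LEADER` and `FOLLOWER` modes. The following Java sketch is hypothetical: `ElectionFlowSketch` and its `discoveryActive` flag are illustrative stand-ins for `Coordinator` and `PeerFinder`, not their real APIs, and the sketch leaves out terms, joins and publication entirely.

```java
/** The three coordinator modes; the transitions below are a simplified, hypothetical sketch. */
enum Mode { CANDIDATE, LEADER, FOLLOWER }

final class ElectionFlowSketch {
    private Mode mode = Mode.CANDIDATE; // assumption: a node starts out looking for a master
    private boolean discoveryActive = true;

    /** Step 1 leading to step 2: a follower that detects master failure becomes a candidate. */
    void onLeaderFailure() {
        if (mode == Mode.FOLLOWER) {
            becomeCandidate();
        }
    }

    /** Step 2: a candidate runs discovery to find other master-eligible nodes. */
    void becomeCandidate() {
        mode = Mode.CANDIDATE;
        discoveryActive = true;
    }

    /** A candidate that wins an election becomes leader; in this sketch discovery stops. */
    void becomeLeader() {
        if (mode != Mode.CANDIDATE) {
            throw new IllegalStateException("only a candidate can win an election, was " + mode);
        }
        mode = Mode.LEADER;
        discoveryActive = false;
    }

    /** A node that joins an elected master follows it; in this sketch discovery stops. */
    void becomeFollower() {
        mode = Mode.FOLLOWER;
        discoveryActive = false;
    }

    Mode mode() {
        return mode;
    }

    boolean discoveryActive() {
        return discoveryActive;
    }
}
```

A follower that calls `onLeaderFailure()` becomes a candidate with discovery switched on; only a candidate may then call `becomeLeader()`.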

inespot (Contributor Author) commented:

Works for me; I'll switch this to pure prose.


#### Failure Detection

Member commented:

You asked a good question in the onboarding session about the reasons for having checks in both directions - would you cover that point here?
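One way to see why checks run in both directions: only the master can remove a failed node from the cluster state, so it must check its followers; and a follower cannot rely on a failed master to announce its own failure, so it must check the master itself. This Java sketch models both directions with one hypothetical helper that trips after a number of consecutive failures. The class names and the threshold of 3 are assumptions for illustration; the real `LeaderChecker` and `FollowersChecker` are more involved.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Counts consecutive failed checks; any success resets the count. Hypothetical helper. */
final class ConsecutiveFailures {
    private final int threshold;
    private int failures;

    ConsecutiveFailures(int threshold) {
        this.threshold = threshold;
    }

    /** Records one check result; returns true once the check has failed threshold times in a row. */
    boolean record(boolean success) {
        failures = success ? 0 : failures + 1;
        return failures >= threshold;
    }
}

/** Leader-side direction (hypothetical): the master checks each follower and drops persistent failures. */
final class LeaderSideChecks {
    private final Map<String, ConsecutiveFailures> followers = new HashMap<>();

    LeaderSideChecks(Collection<String> followerIds, int threshold) {
        for (String id : followerIds) {
            followers.put(id, new ConsecutiveFailures(threshold));
        }
    }

    /** Applies one round of check results and returns the followers removed in this round. */
    Set<String> applyRound(Map<String, Boolean> results) {
        Set<String> removed = new TreeSet<>();
        for (Map.Entry<String, Boolean> result : results.entrySet()) {
            ConsecutiveFailures counter = followers.get(result.getKey());
            if (counter != null && counter.record(result.getValue())) {
                removed.add(result.getKey());
            }
        }
        followers.keySet().removeAll(removed);
        return removed;
    }

    Set<String> members() {
        return new TreeSet<>(followers.keySet());
    }
}
```

On the follower side, a single `ConsecutiveFailures` instance for the master is enough: once `record(false)` returns true, the follower would give up on the master, roughly the point at which the real node goes through `Coordinator.onLeaderFailure()`.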

the next `handleWakeUp` iteration

When [receiving](https://github.com/elastic/elasticsearch/blob/v9.3.0/server/src/main/java/org/elasticsearch/discovery/PeerFinder.java#L534)
a [PeersResponse], [PeerFinder] will reach out to all peers specified in the response, including a potential master. If

Member commented:

Nit but maybe worth mentioning that we also reach back out to nodes that send us requests for peers.
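A hypothetical Java sketch of the reach-back behaviour mentioned here, assuming peers are identified by plain strings and that a peers request carries the requester's known peers (both simplifications; this is not the real `PeerFinder` API). The practical effect is that discovery still works when seed configuration is one-sided: if A lists B as a seed but B lists nobody, B learns about A as soon as A contacts it.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical, simplified peer finder that tracks known master-eligible peers by name. */
final class ReachBackPeerFinder {
    private final String localNode;
    private final Set<String> knownPeers = new TreeSet<>();

    ReachBackPeerFinder(String localNode, Set<String> seedPeers) {
        this.localNode = localNode;
        knownPeers.addAll(seedPeers);
        knownPeers.remove(localNode); // a node is never its own peer
    }

    /** Replies with the peers known so far, then reaches back to the requester and the peers it listed. */
    Set<String> handlePeersRequest(String requester, Set<String> requesterKnownPeers) {
        Set<String> reply = new TreeSet<>(knownPeers);
        knownPeers.add(requester);
        knownPeers.addAll(requesterKnownPeers);
        knownPeers.remove(localNode);
        return reply;
    }

    Set<String> knownPeers() {
        return new TreeSet<>(knownPeers);
    }
}
```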


[DiscoveryPlugin]: https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/plugins/DiscoveryPlugin.java

Discovery is a fast "gossip-like" protocol by which a node in `CANDIDATE` mode locates master-eligible nodes in the

Member commented:

Not sure if you want to mention how fast we mean by "fast" but FWIW it will discover every other master-eligible node in at most something like ⌈log₂(D)+1⌉ steps where D is the diameter of the graph of seed host configurations.
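The ⌈log₂(D)+1⌉ figure can be sanity-checked with a toy Java simulation. Its assumptions are hypothetical simplifications: discovery proceeds in synchronous rounds; in each round every node contacts every peer it knows and learns that peer's known peers, and the contacted peer reaches back to the requester; and the seed hosts form a chain in which node i lists only node i+1, so D = n - 1.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical round-based model of gossip-style discovery on a chain of seed hosts. */
final class GossipDiscoverySim {
    /** Rounds until every node knows every other node, when node i's only seed is node i+1 (n >= 2). */
    static int roundsOnSeedChain(int n) {
        List<Set<Integer>> known = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Set<Integer> seeds = new HashSet<>();
            if (i + 1 < n) {
                seeds.add(i + 1); // each node is seeded with its right-hand neighbour only
            }
            known.add(seeds);
        }
        int rounds = 0;
        while (!allKnowAll(known, n)) {
            if (rounds > n) {
                return -1; // safety net; not reached when reach-back is modelled
            }
            List<Set<Integer>> next = new ArrayList<>();
            for (Set<Integer> s : known) {
                next.add(new HashSet<>(s));
            }
            for (int i = 0; i < n; i++) {
                for (int j : known.get(i)) {
                    next.get(i).addAll(known.get(j)); // i learns j's peers from j's response
                    next.get(j).add(i);               // j reaches back out to the requester i
                    next.get(j).addAll(known.get(i)); // and to the peers i listed in its request
                }
            }
            for (int i = 0; i < n; i++) {
                next.get(i).remove(i); // a node is not its own peer
            }
            known = next;
            rounds++;
        }
        return rounds;
    }

    private static boolean allKnowAll(List<Set<Integer>> known, int n) {
        for (Set<Integer> s : known) {
            if (s.size() != n - 1) {
                return false;
            }
        }
        return true;
    }

    /** ceil(log2(d)) for d >= 1. */
    static int ceilLog2(int d) {
        return 32 - Integer.numberOfLeadingZeros(d - 1);
    }
}
```

In this model reach-back gives every node both of its neighbours after the first round, and each later round at least doubles how far a node can see in both directions (it learns the peers of the furthest peers it already knows), so the chain converges in at most ⌈log₂ D⌉ + 1 rounds. Without reach-back, nobody would ever learn about node 0 in this topology.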

DaveCTurner (Member) left a comment:

LGTM

inespot merged commit c7f0870 into elastic:main Feb 24, 2026
12 checks passed
szybia added a commit to szybia/elasticsearch that referenced this pull request Feb 24, 2026
…on-sliced-reindex

* upstream/main:
  Activity logging improvements (elastic#142901)
  Fix serialization of NodeGpuStatsResponse when no GPU is present (elastic#142937)
  Add doc on master elections in DistributedArchitectureGuide (elastic#142435)
  ESQL: Account for missing StubRelation due to SurrogateExpressions replacement (elastic#142882)
  Add BulkByScrollTask Serialization Tests (elastic#142697)
  Rebalance CI test partitions to reduce Part3 bottleneck (elastic#142930)
  Mute org.elasticsearch.xpack.esql.qa.multi_node.EsqlClientYamlIT test {p0=esql/40_tsdb/to_aggregate_metric_double with multi_values} elastic#142964
  Bump OpenTelemetry dependencies (elastic#142323)
  SQL: add support for API key to JDBC and CLI (elastic#142021)
  Ensure requested capability exists (elastic#142695)
  Warn and fall back to local branches.json (elastic#142606)
  [CI] Mute testWithFetchFailures, testAddCompletionListenerScheduleErr… (elastic#142926)
  ESQL: Add support for ORC file format (elastic#142900)
  Update wolfi (versioned) (elastic#142948)
  Add BulkByScrollResponse Serialization Tests (elastic#142688)
  Run 25_id_generation with and without synthetic id (elastic#142770)

Labels

:Distributed/Distributed, >docs, >non-issue, Team:Distributed, Team:Docs, v9.4.0

3 participants