storage: Coalesced Heartbeats Corner Cases #315

@tbg

from the discussion in #280, via @bdarnell:

I think there are some scenarios in which these limited-information heartbeats
could cause problems. For example, consider a group containing nodes A, B, and C:

1. A is initially the leader.
2. Node C loses connectivity to the other nodes and after a timeout declares itself a candidate.
3. The partition partially heals; C can talk to A but not B.
4. C is elected the new leader; A steps down.
5. B sees that A has been up continuously and so it still considers A the leader.
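
To make the failure mode concrete, here is a minimal sketch of B's side of the scenario. It assumes a coalesced heartbeat carries only the sender's identity, so the receiver can do nothing but refresh every group it already attributes to that sender. The names `Follower`, `OnCoalescedHeartbeat`, `NodeID`, and `GroupID` are hypothetical and are not taken from the actual storage code.

```go
package main

import (
	"fmt"
	"sort"
)

// NodeID and GroupID are hypothetical identifiers used only in this sketch.
type NodeID string
type GroupID int

// Follower models one node's local view of who leads each group.
type Follower struct {
	leaderOf map[GroupID]NodeID
}

// OnCoalescedHeartbeat handles a heartbeat that carries no per-group data.
// All the receiver learns is "the sender is alive", so it keeps treating the
// sender as leader of every group it already attributed to it. The refreshed
// groups are returned in ascending order.
func (f *Follower) OnCoalescedHeartbeat(from NodeID) []GroupID {
	var refreshed []GroupID
	for g, leader := range f.leaderOf {
		if leader == from {
			refreshed = append(refreshed, g)
		}
	}
	sort.Slice(refreshed, func(i, j int) bool { return refreshed[i] < refreshed[j] })
	return refreshed
}

func main() {
	// B believes A leads group 1.
	b := &Follower{leaderOf: map[GroupID]NodeID{1: "A"}}

	// Meanwhile C is elected leader of group 1 with A's vote and A steps
	// down. B cannot hear C, and A's node-level heartbeats keep arriving.
	refreshed := b.OnCoalescedHeartbeat("A")

	// B's view is unchanged: it still considers A the leader of group 1.
	fmt.Println(refreshed, b.leaderOf[1]) // prints "[1] A"
}
```

Nothing in the message lets B distinguish "A is up" from "A is up and still leads group 1", which is the gap the scenario exploits.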

We might be able to solve this on a case-by-case basis (e.g. I think two-round elections would
help in the above scenario), but to be safe we should probably add a payload to the coalesced
heartbeat messages to indicate which groups the sending node considers itself to be leader of.
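
One possible shape for such a payload is sketched below. This is an illustration under stated assumptions rather than the design that was adopted: `CoalescedHeartbeat`, its `LeaderOf` field, and `Follower.Apply` are hypothetical names. The receiver only uses the payload to invalidate stale beliefs; it does not accept new leadership claims, since doing that safely would presumably require more information, such as a term.

```go
package main

import (
	"fmt"
	"sort"
)

// NodeID and GroupID are hypothetical identifiers used only in this sketch.
type NodeID string
type GroupID int

// CoalescedHeartbeat is a hypothetical message: one heartbeat per node pair,
// plus the set of groups the sender currently considers itself leader of.
type CoalescedHeartbeat struct {
	From     NodeID
	LeaderOf map[GroupID]bool
}

// Follower models one node's local view of who leads each group.
type Follower struct {
	leaderOf map[GroupID]NodeID
}

// Apply reconciles the local view with the payload. For every group the
// receiver attributes to the sender but the sender no longer claims, the
// belief is dropped. The affected groups are returned in ascending order so
// the caller can let their election timeouts run.
func (f *Follower) Apply(hb CoalescedHeartbeat) []GroupID {
	var stale []GroupID
	for g, leader := range f.leaderOf {
		if leader == hb.From && !hb.LeaderOf[g] {
			delete(f.leaderOf, g) // deleting during range is safe in Go
			stale = append(stale, g)
		}
	}
	sort.Slice(stale, func(i, j int) bool { return stale[i] < stale[j] })
	return stale
}

func main() {
	// B believes A leads groups 1 and 2.
	b := &Follower{leaderOf: map[GroupID]NodeID{1: "A", 2: "A"}}

	// A has stepped down in group 1 (C was elected) but still leads group 2.
	hb := CoalescedHeartbeat{From: "A", LeaderOf: map[GroupID]bool{2: true}}

	stale := b.Apply(hb)
	fmt.Println(stale, len(b.leaderOf)) // prints "[1] 1"
}
```

With the payload, B learns from A's own heartbeat that A no longer leads group 1, even though B still cannot reach C.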

This thread on raft-dev discusses some related issues:
https://groups.google.com/forum/?fromgroups#!topic/raft-dev/VgF47vIsezg.
The term and log index are occasionally useful (but not, I think, strictly required;
etcd is not currently sending this information with its heartbeats).

Labels

C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
