Get test working #1
Conversation
I changed the test to use TransportClusterHealthAction as the redirected request instead of ClusterStateAction because InternalTestCluster#getMasterName() (used by masterClient etc.) triggers a ClusterStateAction, which was making the received-at-most-once assertions invalid.
c5e4b9b to ce14d2e
...nalClusterTest/java/org/elasticsearch/action/support/master/TransportMasterNodeActionIT.java
```java
    }

    @TestLogging(reason = "wip", value = "org.elasticsearch.transport.TransportService.tracer:TRACE")
    @TestLogging(reason = "wip", value = "org.elasticsearch.action.support.master.TransportMasterNodeAction:DEBUG")
```
Left this in so you can see:

```
[2024-06-19T07:58:56,657][DEBUG][o.e.a.s.m.TransportMasterNodeAction] [node_s5] request routed to master in term [2] but local term is [1], waiting for local term bump
```
...nalClusterTest/java/org/elasticsearch/action/support/master/TransportMasterNodeActionIT.java
```java
        } finally {
            newMasterClusterService.removeApplier(clusterFormationMonitor);
        }
    }
```
This feels a bit hacky, but I struggled to see how we could

1. Block the applier on newMaster to prevent it clearing the current master (in `becomeCandidate`)
2. Allow newMaster to send a joinRequest to itself via the cluster applier (in `JoinHelper`)
3. Block the applier on newMaster to prevent it applying the cluster state after it wins an election

I think (although with all the asynchrony, I'm not sure) that the above things happen in the above order, so I don't know how we could block 1 and 3 whilst allowing 2 to go through. And even if we could, would the extra machinery just make the test harder to read and more brittle?
That means we can't rely on our own join, and because the old master sometimes(?) accepts its failed update, we can't rely on its join, so we can't reliably get a quorum in a 3 node cluster. If we bump the cluster size up to 5 nodes we seem to succeed every time.
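The quorum arithmetic behind bumping the cluster size can be sketched in a few lines. This is a minimal, hypothetical illustration in plain Java, not Elasticsearch code: with a majority of master-eligible nodes needed to win an election, losing two joins in a 3-node cluster leaves 1 vote against a quorum of 2, while a 5-node cluster still has 3 votes against a quorum of 3.

```java
// Minimal sketch of the election quorum arithmetic discussed above.
// Plain Java with illustrative names; not Elasticsearch code.
public class QuorumSketch {
    // A majority of the master-eligible nodes is required to win an election.
    static int quorumSize(int masterEligibleNodes) {
        return masterEligibleNodes / 2 + 1;
    }

    // Can newMaster win if its own join (blocked applier) and the old
    // master's join (unreliable) are both unavailable?
    static boolean canWinWithoutTwoJoins(int masterEligibleNodes) {
        int availableVotes = masterEligibleNodes - 2;
        return availableVotes >= quorumSize(masterEligibleNodes);
    }

    public static void main(String[] args) {
        System.out.println(canWinWithoutTwoJoins(3)); // 1 vote < quorum of 2
        System.out.println(canWinWithoutTwoJoins(5)); // 3 votes >= quorum of 3
    }
}
```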
```java
        safeAwait(electionWonLatch);
        // Wait until the old master has acknowledged the new master's election
        safeAwait(previousMasterKnowsNewMasterIsElectedLatch);
        logger.info("New master is elected");
```
Because we're preventing cluster state updates from being applied on newMaster, we watch the previous master's cluster state to determine when we've won the election. We need to wait for the previous master to apply the term bump anyway, before we send the to-be-redirected request, so this serves multiple purposes.
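The watch-the-other-node's-state pattern can be sketched with a plain listener and a `CountDownLatch`. This is illustrative Java only; the `ClusterState` record, the `Consumer`-based applier and the term field are stand-ins for the Elasticsearch originals, not the real API.

```java
import java.util.concurrent.CountDownLatch;
import java.util.function.Consumer;

// Minimal sketch: count down a latch once the observed node applies a state
// whose term reaches the expected new term. Stand-in types; not ES code.
public class TermBumpWatcher {
    record ClusterState(long term) {} // hypothetical stand-in for the real ClusterState

    final CountDownLatch termBumpApplied = new CountDownLatch(1);

    // Returns an applier to register on the watched node's cluster service.
    Consumer<ClusterState> applierFor(long expectedTerm) {
        return state -> {
            if (state.term() >= expectedTerm) {
                termBumpApplied.countDown(); // previous master has applied the bump
            }
        };
    }
}
```

A test would register the returned consumer against the previous master's state updates and await the latch before sending the redirected request.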
fc88ff4 into DaveCTurner:2024/05/22/versioned-master-node-requests

I'm merging this into the PR branch so I can iterate on it a bit more
…sion (#1…" (elastic#115827) This reverts commit 32dee6a.
…uginFuncTest builds distribution from branches via archives extractedAssemble [bwcDistVersion: 8.3.0, bwcProject: staged, expectedAssembleTaskName: extractedAssemble, #1] elastic#119870
Initial work on getting the test working
This test now appears to reliably reproduce the issue that was fixed. I verified it by removing the `request.masterTerm` check from `TransportMasterNodeAction`.

Proposal/Query
I know I've missed something, but would it not be less fragile and maybe easier to reason about if, instead of forwarding `MasterNodeRequest`s on to the master recursively, we recorded how many hops we are from the coordinating node and threw a `StaleRoutingStateException` if we get to hop 1 and the receiver is not the master (i.e. if we were forwarded a `MasterNodeRequest` despite not being the master)?

The `StaleRoutingStateException` could include the routed-to node's current term so that the forwarding node could delay until it reached that term before attempting to forward again (or just apply some reasonable delay if the receiver is lagging).

With this approach we could put the coordinating node solely in charge of retrying the request when it fails due to a `StaleRoutingStateException`, which I think would fix the duplicate-tasks issue resulting from a deeply re-routed request (which I think can still occur when a request is legitimately chasing a master around the cluster?).
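The hop-count idea above could be sketched roughly like this. Everything here is a hypothetical stand-in for illustration: `StaleRoutingStateException`, the `hopsFromCoordinator` counter and the term fields are not the actual Elasticsearch API, just a self-contained model of the proposed check.

```java
// Sketch of the proposed hop-count guard. All names are hypothetical
// stand-ins; this is not the real TransportMasterNodeAction.
public class HopCountSketch {
    static class StaleRoutingStateException extends RuntimeException {
        final long receiverTerm; // lets the coordinator wait for this term before retrying
        StaleRoutingStateException(long receiverTerm) {
            super("forwarded master-node request received, but this node is not the master");
            this.receiverTerm = receiverTerm;
        }
    }

    record Request(int hopsFromCoordinator) {
        Request forwarded() { return new Request(hopsFromCoordinator + 1); }
    }

    // On receipt: a request that has already been forwarded once (hop 1)
    // must only be accepted by the current master; otherwise the receiver
    // reports its own term instead of forwarding again.
    static void checkRouting(Request request, boolean localNodeIsMaster, long localTerm) {
        if (request.hopsFromCoordinator() >= 1 && localNodeIsMaster == false) {
            throw new StaleRoutingStateException(localTerm);
        }
    }

    public static void main(String[] args) {
        Request original = new Request(0);
        checkRouting(original, false, 1); // hop 0 on a non-master: may still forward
        try {
            checkRouting(original.forwarded(), false, 2); // hop 1 on a non-master: reject
        } catch (StaleRoutingStateException e) {
            System.out.println("rejected at term " + e.receiverTerm);
        }
    }
}
```

The point of the sketch is that retry responsibility stays with the coordinating node: intermediate nodes never re-forward, so a request cannot fan out into duplicate tasks.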