Prevent TransportReplicationAction from routing requests based on a stale local routing table #16274
Conversation
@bleskes instead of using the cluster state version, we could just as well use the index metadata version, which is updated whenever a new shard is started (thanks to active allocation ids). wdyt? On a related note, we could also use this field to wait for dynamic mapping updates to be applied (for that, the update mappings API would have to return the current index metadata version).
can we assert that the version is not set?
Got confused. It is actually likely that the previous routing node has an older cluster state, in which case we should override the existing value. Never mind.
@bleskes renamed the field and removed the integration test.
can we add some Javadocs here?
LGTM. Thanks @ywelsch - left some minor comments, no need for another cycle.
…cal routing table (#19296) Backport of #16274 to 2.4.0.
Relates to #12573
When relocating a primary shard, there is a cluster state update at the end of relocation in which the active primary is switched from the relocation source to the relocation target. If the relocation source receives and processes this cluster state before the relocation target does, there is a time span during which the relocation source believes the active primary to be on the relocation target, while the relocation target believes it to be on the relocation source. This results in index/delete/flush requests being bounced back and forth between the two nodes and can end in an OOM on both.
This PR adds a field to the index/delete/flush request that helps detect when the local routing information is stale. When such staleness is detected, the request is not rerouted immediately; instead, we wait until an up-to-date cluster state has been received before rerouting.
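The idea above can be sketched in a few lines. This is a hypothetical, simplified model, not the actual Elasticsearch classes: the request carries the highest cluster-state version it was routed on, and a node that receives a rerouted request with a higher version than its own local state knows its routing table is stale and should wait rather than bounce the request back. The field name `routedBasedOnClusterVersion` and both helper methods are illustrative assumptions.

```java
// Hypothetical sketch of the staleness check; not the real
// TransportReplicationAction implementation.
public class StaleRoutingDemo {

    public static class ReplicationRequest {
        // Highest cluster-state version used to route this request so far
        // (assumed field name, for illustration only).
        public long routedBasedOnClusterVersion = 0;
    }

    // Record the cluster-state version seen while routing, so the next hop
    // can detect that its own state is older.
    public static void markRouted(ReplicationRequest req, long localClusterStateVersion) {
        req.routedBasedOnClusterVersion =
            Math.max(req.routedBasedOnClusterVersion, localClusterStateVersion);
    }

    // True when the request was routed using a newer cluster state than the
    // local node has: wait for a cluster-state update instead of rerouting,
    // which would otherwise ping-pong the request between the two nodes.
    public static boolean shouldWaitForNewerState(ReplicationRequest req,
                                                  long localClusterStateVersion) {
        return req.routedBasedOnClusterVersion > localClusterStateVersion;
    }

    public static void main(String[] args) {
        ReplicationRequest req = new ReplicationRequest();
        markRouted(req, 7); // routed on a node whose cluster state is version 7
        System.out.println(shouldWaitForNewerState(req, 5)); // stale node: true
        System.out.println(shouldWaitForNewerState(req, 7)); // up to date: false
    }
}
```

With this check in place, a node holding an older cluster state blocks on a state update instead of sending the request straight back to the node it came from.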
I have included the test from #12574 in this PR to demonstrate the fix in an integration test; that integration test will not be part of the final commit, however.