Wait on shard failures #14252
Closed
Labels: :Distributed/Distributed, >enhancement, Meta, release highlight, resiliency, v5.0.0-alpha1
Currently, when executing an action (e.g., bulk, delete, or indexing operations) on all shards, if an exception occurs while executing the action on a replica shard, we send a shard failure message to the master. However, we do not wait for the master to acknowledge this message, nor do we handle failures in sending this message to the master. This is problematic because we still acknowledge the action to the client, which can result in lost writes. For example, in a situation where a primary is isolated from the master and its replicas, the following sequence of events can occur:
In this case, the replica will not have the write that was acknowledged to the client, and this amounts to data loss.
Instead, if we waited for the master to acknowledge the shard failures, we would never have acknowledged the write to the client in this case.
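
The proposed behavior can be sketched roughly as follows. This is only an illustrative model in Python, not Elasticsearch code: `finish_write`, `report_shard_failure`, `MasterUnreachable`, and `WriteResult` are hypothetical names invented for this sketch. The idea it tries to show is that the primary acknowledges the write to the client only after the master has acknowledged every replica shard failure, and treats a failure to reach the master as a reason not to acknowledge.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


class MasterUnreachable(Exception):
    """Hypothetical error: the shard failure message could not be delivered."""


@dataclass
class WriteResult:
    acknowledged: bool  # whether the client would be told the write succeeded
    reason: str


def finish_write(failed_replicas: Iterable[str],
                 report_shard_failure: Callable[[str], bool]) -> WriteResult:
    """Decide whether a write may be acknowledged to the client.

    `report_shard_failure` is an assumed callback that sends a shard failure
    message to the master, blocks until a response arrives, and returns True
    if the master acknowledged it. It may raise MasterUnreachable.
    """
    for replica in failed_replicas:
        try:
            acked = report_shard_failure(replica)
        except MasterUnreachable:
            # The primary may be isolated; do not acknowledge the write.
            return WriteResult(False, f"could not reach master to fail {replica}")
        if not acked:
            return WriteResult(False, f"master did not acknowledge failure of {replica}")
    # Every failed replica has been marked as failed by the master (or none failed).
    return WriteResult(True, "all shard failures acknowledged by master")


def isolated_master(replica: str) -> bool:
    """Simulates a primary that is cut off from the master."""
    raise MasterUnreachable()


print(finish_write(["replica-1"], isolated_master).acknowledged)  # False
print(finish_write(["replica-1"], lambda r: True).acknowledged)   # True
```

In the isolated-primary scenario, the sketch refuses to acknowledge the write because the shard failure never reaches the master, which is the outcome argued for above.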