Use follower primary term when applying operations #31113
jasontedor merged 5 commits into elastic:ccr
Conversation
In cross-cluster replication, the primary shard copy on the follower side has authority over the replication operations that occur on the follower side. Yet today we use the primary term taken directly from the operations on the leader side. Instead, we should replace it with the primary term of the primary on the follower side. This commit does so by copying the translog operations with the corrected primary term. This ensures that we use this primary term while applying the operations on the primary, and when replicating them across to the replica (where the replica request carries the primary term of the primary shard copy on the follower).
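A minimal sketch of the rewrite described above, assuming a simplified Op record standing in for Translog.Operation; the class and method names here are illustrative, not Elasticsearch's actual API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: copy leader operations, stamping each with the follower
// primary's term so the follower owns the history it replicates to its replicas.
final class TermRewriter {

    // Minimal stand-in for a translog operation: (seqNo, primaryTerm).
    record Op(long seqNo, long primaryTerm) {}

    static List<Op> rewrite(List<Op> leaderOps, long followerPrimaryTerm) {
        List<Op> rewritten = new ArrayList<>(leaderOps.size());
        for (Op op : leaderOps) {
            // seq_no is preserved; only the primary term is replaced
            rewritten.add(new Op(op.seqNo(), followerPrimaryTerm));
        }
        return rewritten;
    }

    public static void main(String[] args) {
        List<Op> leader = List.of(new Op(0, 1), new Op(1, 3));
        System.out.println(rewrite(leader, 7)); // every op now carries the follower's term 7
    }
}
```

The key point is that the rewrite happens before the primary applies the operations, so the same corrected term flows through to the replica requests.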
Pinging @elastic/es-distributed
run gradle build tests
* ccr:
  [DOCS] Creates rest-api folder in docs
  [Rollup] Disallow index patterns that match the rollup index (elastic#30491)
  Replace exact numDocs by soft-del count in SegmentInfo (elastic#31086)
  Upgrade to Lucene-7.4.0-snapshot-0a7c3f462f (elastic#31073)
  Add cors support to NioHttpServerTransport (elastic#30827)
  [DOCS] Fixes security example (elastic#31082)
  Allow terms query in _rollup_search (elastic#30973)
  Removing erroneous repeat
  Adapt bwc versions after backporting elastic#30983 to 6.4
  [Tests] Muting RatedRequestsTests#testXContentParsingIsNotLenient
  TEST: Retry synced-flush if ongoing ops on primary (elastic#30978)
  Fix docs build.
  Only auto-update license signature if all nodes ready (elastic#30859)
  Add BlobContainer.writeBlobAtomic() (elastic#30902)
  Add a doc value format to binary fields. (elastic#30860)
bleskes left a comment
LGTM. Left some minor comments.
 * another shard)
 */
protected IndexShard newShard(boolean primary) throws IOException {
    ShardRouting shardRouting = TestShardRouting.newShardRouting(new ShardId("index", "_na_", 0), randomAlphaOfLength(10), primary,
call newShard(primary, Settings.EMPTY, new InternalEngineFactory())?
primary.getPrimaryTerm(),
index.version(),
index.versionType(),
index.source().toBytesRef().bytes,
I think this is dangerous - it's correct now since the way we stream guarantees that the BytesRef has an offset of 0 and a length equal to the backing byte array, but this isn't guaranteed by the API. I think we should honor the API or, if that turns out to be difficult, at least assert it.
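The reviewer's concern can be sketched as follows; BytesRef here is a minimal stand-in for Lucene's class (which carries a bytes array plus an offset and length into it), not the real API, and the helper name is illustrative:

```java
import java.util.Arrays;

// Honor the (bytes, offset, length) contract instead of assuming offset == 0
// and length == bytes.length, which only happens to hold today.
final class SourceBytes {

    // Stand-in for org.apache.lucene.util.BytesRef: a slice of a backing array.
    record BytesRef(byte[] bytes, int offset, int length) {}

    // Safe: copies exactly the referenced slice regardless of offset/length.
    static byte[] toByteArray(BytesRef ref) {
        return Arrays.copyOfRange(ref.bytes(), ref.offset(), ref.offset() + ref.length());
    }

    public static void main(String[] args) {
        byte[] backing = {9, 9, 1, 2, 3, 9};        // padding around the real payload
        BytesRef ref = new BytesRef(backing, 2, 3); // slice starts at index 2, length 3
        System.out.println(Arrays.toString(toByteArray(ref))); // prints [1, 2, 3]
    }
}
```

Reading `ref.bytes` directly, as the diff does via `index.source().toBytesRef().bytes`, would silently include the padding if the slice ever stopped spanning the whole array.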
final IndexShard followerPrimary = newStartedShard(true, settings, new FollowingEngineFactory());

// we use this primary term on the operations yet we expect the applied operations to have the primary term of the follower
final long primaryTerm = randomLongBetween(1, followerPrimary.getPrimaryTerm());
nit: we can do this fully random - there's no relationship between the primary term of the leader and the follower
The randomness was not meant to imply there was such a relationship. 😇
I pushed 0a8499b.
    }
}

for (final Translog.Operation operation : result.replicaRequest().getOperations()) {
run gradle build tests
Today we rewrite the operations from the leader with the term of the following primary because the follower should own its history. The problem is that a newly promoted primary may re-assign its term to operations which were replicated to replicas before by the previous primary. If this happens, some operations with the same seq_no may be assigned different terms. This is not good for the future optimistic locking using a combination of seqno and term. This change ensures that the primary of a follower only processes an operation if that operation was not processed before. The skipped operations are guaranteed to be delivered to replicas via either primary-replica resync or peer-recovery. However, the primary must not acknowledge until the global checkpoint is at least the highest seqno of all skipped ops (i.e., they all have been processed on every replica). Relates #31751 Relates #31113
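The skip-and-wait behavior described above can be sketched like this; all names are illustrative stand-ins for the follower primary's bookkeeping, not Elasticsearch's actual API:

```java
import java.util.List;

// Hedged sketch: the follower primary skips operations whose seq_no it has
// already processed (so a re-sent batch cannot re-stamp them with a new term),
// and must not acknowledge until the global checkpoint covers every skipped op.
final class FollowerPrimary {

    record Op(long seqNo) {}

    long maxSeqNoProcessed = -1;  // highest seq_no this primary has applied
    long globalCheckpoint = -1;   // highest seq_no processed on every replica

    // Applies new operations; returns the highest skipped seq_no, or -1 if none.
    long apply(List<Op> ops) {
        long maxSkipped = -1;
        for (Op op : ops) {
            if (op.seqNo() <= maxSeqNoProcessed) {
                // already processed before: skip rather than re-assign a term
                maxSkipped = Math.max(maxSkipped, op.seqNo());
            } else {
                maxSeqNoProcessed = op.seqNo();
            }
        }
        return maxSkipped;
    }

    // Acknowledge only once replicas have caught up past all skipped ops.
    boolean canAcknowledge(long maxSkippedSeqNo) {
        return globalCheckpoint >= maxSkippedSeqNo;
    }
}
```

Skipped operations are safe to drop here because primary-replica resync or peer-recovery guarantees their delivery to replicas; waiting on the global checkpoint is what makes the acknowledgment honest.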