Skip to content

Use follower primary term when applying operations#31113

Merged
jasontedor merged 5 commits intoelastic:ccrfrom
jasontedor:follower-primary-term
Jun 6, 2018
Merged

Use follower primary term when applying operations#31113
jasontedor merged 5 commits intoelastic:ccrfrom
jasontedor:follower-primary-term

Conversation

@jasontedor
Copy link
Copy Markdown
Member

The primary shard copy on the following has authority of the replication operations that occur on the following side in cross-cluster replication. Yet today we are using the primary term directly from the operations on the leader side. Instead we should be replacing the primary term on the following side with the primary term of the primary on the following side. This commit does this by copying the translog operations with the corrected primary term. This ensures that we use this primary term while applying the operations on the primary, and when replicating them across to the replica (where the replica request was carrying the primary term of the primary shard copy on the follower).

The primary shard copy on the following has authority of the replication
operations that occur on the following side in cross-cluster
replication. Yet today we are using the primary term directly from the
operations on the leader side. Instead we should be replacing the
primary term on the following side with the primary term of the primary
on the following side. This commit does this by copying the translog
operations with the corrected primary term. This ensures that we use
this primary term while applying the operations on the primary, and when
replicating them across to the replica (where the replica request was
carrying the primary term of the primary shard copy on the follower).
@jasontedor jasontedor added review :Distributed/CCR Issues around the Cross Cluster State Replication features labels Jun 5, 2018
@jasontedor jasontedor requested review from bleskes and martijnvg June 5, 2018 18:55
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-distributed

@jasontedor
Copy link
Copy Markdown
Member Author

run gradle build tests

* ccr:
  [DOCS] Creates rest-api folder in docs
  [Rollup] Disallow index patterns that match the rollup index (elastic#30491)
  Replace exact numDocs by soft-del count in SegmentInfo (elastic#31086)
  Upgrade to Lucene-7.4.0-snapshot-0a7c3f462f (elastic#31073)
  Add cors support to NioHttpServerTransport (elastic#30827)
  [DOCS] Fixes security example (elastic#31082)
  Allow terms query in _rollup_search (elastic#30973)
  Removing erroneous repeat
  Adapt bwc versions after backporting elastic#30983 to 6.4
  [Tests] Muting RatedRequestsTests#testXContentParsingIsNotLenient
  TEST:  Retry synced-flush if ongoing ops on primary (elastic#30978)
  Fix docs build.
  Only auto-update license signature if all nodes ready (elastic#30859)
  Add BlobContainer.writeBlobAtomic() (elastic#30902)
  Add a doc value format to binary fields. (elastic#30860)
Copy link
Copy Markdown
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Left some minor comments

* another shard)
*/
protected IndexShard newShard(boolean primary) throws IOException {
ShardRouting shardRouting = TestShardRouting.newShardRouting(new ShardId("index", "_na_", 0), randomAlphaOfLength(10), primary,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call newShard(primary, Settings.EMPTY, new InternalEngineFactory()?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed 8086d49.

primary.getPrimaryTerm(),
index.version(),
index.versionType(),
index.source().toBytesRef().bytes,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is dangerous - it's correct now since the way we stream guarantee that the bytes ref is offseted with 0 and has a length equal to the bytes array but this isn't guaranteed by the API. I think we should honor the API or, if this turns out difficult assert at the very least.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed 071a4a3.

final IndexShard followerPrimary = newStartedShard(true, settings, new FollowingEngineFactory());

// we use this primary on the operations yet we expect the applied operations to have the primary term of the follower
final long primaryTerm = randomLongBetween(1, followerPrimary.getPrimaryTerm());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can do this fully random - there's no relationship between the primary term of the leader and the follower

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The randomness was not meant to imply there was such a relationship. 😇

I pushed 0a8499b.

}
}

for (final Translog.Operation operation : result.replicaRequest().getOperations()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

@jasontedor
Copy link
Copy Markdown
Member Author

run gradle build tests

@jasontedor jasontedor merged commit bf1152f into elastic:ccr Jun 6, 2018
jasontedor added a commit that referenced this pull request Jun 6, 2018
The primary shard copy on the following has authority of the replication
operations that occur on the following side in cross-cluster
replication. Yet today we are using the primary term directly from the
operations on the leader side. Instead we should be replacing the
primary term on the following side with the primary term of the primary
on the following side. This commit does this by copying the translog
operations with the corrected primary term. This ensures that we use
this primary term while applying the operations on the primary, and when
replicating them across to the replica (where the replica request was
carrying the primary term of the primary shard copy on the follower).
@jasontedor jasontedor deleted the follower-primary-term branch June 6, 2018 15:13
dnhatn added a commit that referenced this pull request Oct 10, 2018
Today we rewrite the operations from the leader with the term of the
following primary because the follower should own its history. The
problem is that a newly promoted primary may re-assign its term to
operations which were replicated to replicas before by the previous
primary. If this happens, some operations with the same seq_no may be
assigned different terms. This is not good for the future optimistic
locking using a combination of seqno and term.

This change ensures that the primary of a follower only processes an
operation if that operation was not processed before. The skipped
operations are guaranteed to be delivered to replicas via either
primary-replica resync or peer-recovery. However, the primary must not
acknowledge until the global checkpoint is at least the highest seqno of
all skipped ops (i.e., they all have been processed on every replica).

Relates #31751
Relates #31113
dnhatn added a commit that referenced this pull request Oct 11, 2018
Today we rewrite the operations from the leader with the term of the
following primary because the follower should own its history. The
problem is that a newly promoted primary may re-assign its term to
operations which were replicated to replicas before by the previous
primary. If this happens, some operations with the same seq_no may be
assigned different terms. This is not good for the future optimistic
locking using a combination of seqno and term.

This change ensures that the primary of a follower only processes an
operation if that operation was not processed before. The skipped
operations are guaranteed to be delivered to replicas via either
primary-replica resync or peer-recovery. However, the primary must not
acknowledge until the global checkpoint is at least the highest seqno of
all skipped ops (i.e., they all have been processed on every replica).

Relates #31751
Relates #31113
kcm pushed a commit that referenced this pull request Oct 30, 2018
Today we rewrite the operations from the leader with the term of the
following primary because the follower should own its history. The
problem is that a newly promoted primary may re-assign its term to
operations which were replicated to replicas before by the previous
primary. If this happens, some operations with the same seq_no may be
assigned different terms. This is not good for the future optimistic
locking using a combination of seqno and term.

This change ensures that the primary of a follower only processes an
operation if that operation was not processed before. The skipped
operations are guaranteed to be delivered to replicas via either
primary-replica resync or peer-recovery. However, the primary must not
acknowledge until the global checkpoint is at least the highest seqno of
all skipped ops (i.e., they all have been processed on every replica).

Relates #31751
Relates #31113
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed/CCR Issues around the Cross Cluster State Replication features

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants