Follow engine should not fill gaps upon promotion and recovery #31751

Merged
martijnvg merged 2 commits into elastic:ccr from martijnvg:follow_engine_should_not_fill_history_gaps
Jul 3, 2018

Conversation

@martijnvg
Member

PR for #31318

@martijnvg added the review and :Distributed/CCR (Issues around the Cross Cluster State Replication features) labels on Jul 3, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@martijnvg martijnvg requested a review from bleskes July 3, 2018 06:08
Contributor

@bleskes bleskes left a comment

LGTM

    ActionListener<Releasable> actionListener = ActionListener.wrap(releasable -> {
        releasable.close();
        latch.countDown();
    }, e -> { throw new RuntimeException(e); });
Contributor

use assertion error?

@martijnvg martijnvg merged commit ac654cb into elastic:ccr Jul 3, 2018
dnhatn added a commit that referenced this pull request Oct 10, 2018
Today we rewrite the operations from the leader with the term of the
following primary because the follower should own its history. The
problem is that a newly promoted primary may re-assign its term to
operations that the previous primary had already replicated to
replicas. If this happens, operations with the same seq_no may be
assigned different terms, which would undermine the planned optimistic
locking that uses a combination of seq_no and term.

This change ensures that the primary of a follower only processes an
operation if that operation was not processed before. The skipped
operations are guaranteed to be delivered to replicas via either
primary-replica resync or peer-recovery. However, the primary must not
acknowledge until the global checkpoint is at least the highest seqno of
all skipped ops (i.e., they all have been processed on every replica).

Relates #31751
Relates #31113
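
The rule in the commit message above can be sketched roughly as follows. This is a simplified, hypothetical model and not the Elasticsearch implementation: the class `FollowerPrimarySketch` and all of its methods are invented for illustration, and the in-memory map stands in for whatever history the real engine consults. It only shows the two ideas from the message: an operation that was already processed keeps its original term, and the primary holds back the acknowledgement until the global checkpoint has reached the highest skipped seq_no.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of a follower primary; not Elasticsearch code. */
final class FollowerPrimarySketch {
    private final long primaryTerm;
    // seq_no -> term under which the operation was first processed on this copy
    private final Map<Long, Long> processed = new HashMap<>();
    private long globalCheckpoint = -1;

    FollowerPrimarySketch(long primaryTerm) {
        this.primaryTerm = primaryTerm;
    }

    /** Records an operation that a previous primary already replicated to this copy. */
    void seedExisting(long seqNo, long term) {
        processed.put(seqNo, term);
    }

    /**
     * Applies a batch of leader operations, identified here only by seq_no.
     * Operations seen before are skipped so their original term is preserved.
     * Returns the highest skipped seq_no, or -1 if nothing was skipped.
     */
    long applyBatch(List<Long> seqNos) {
        long maxSkipped = -1;
        for (long seqNo : seqNos) {
            if (processed.containsKey(seqNo)) {
                maxSkipped = Math.max(maxSkipped, seqNo);
            } else {
                processed.put(seqNo, primaryTerm);
            }
        }
        return maxSkipped;
    }

    /** Moves the global checkpoint forward; it never moves backwards. */
    void advanceGlobalCheckpoint(long checkpoint) {
        globalCheckpoint = Math.max(globalCheckpoint, checkpoint);
    }

    /** The batch may be acknowledged once every skipped op is covered by the global checkpoint. */
    boolean canAcknowledge(long maxSkippedSeqNo) {
        return globalCheckpoint >= maxSkippedSeqNo;
    }

    /** Term recorded for a processed seq_no; the seq_no must have been processed. */
    long termOf(long seqNo) {
        return processed.get(seqNo);
    }
}
```

In this sketch, a primary with term 2 that receives seq_no 0, 1 and 2 after seq_no 0 and 1 were already processed under term 1 indexes only seq_no 2, leaves the other two at term 1, and reports 1 as the highest skipped seq_no. `canAcknowledge(1)` stays false until `advanceGlobalCheckpoint(1)` has been called.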
dnhatn added a commit that referenced this pull request Oct 11, 2018
kcm pushed a commit that referenced this pull request Oct 30, 2018
Labels

:Distributed/CCR (Issues around the Cross Cluster State Replication features)

3 participants