Skip to content

Advance checkpoints only after persisting ops#9309

Merged
mergify[bot] merged 1 commit intomasterfrom
ma/port_PR43205
Nov 11, 2019
Merged

Advance checkpoints only after persisting ops#9309
mergify[bot] merged 1 commit intomasterfrom
ma/port_PR43205

Conversation

@marregui
Copy link
Copy Markdown
Contributor

Do-over

Checklist

  • User relevant changes are recorded in CHANGES.txt
  • Touched code is covered by tests
  • Documentation has been updated if necessary
  • CLA is signed
  • This does not contain breaking changes, or if it does:
    • It is released within a major release
    • It is recorded in CHANGES.txt
    • It was marked as deprecated in an earlier release if possible
    • You've thought about the consequences and other components are adapted
      (E.g. AdminUI)

@marregui marregui force-pushed the ma/port_PR43205 branch 3 times, most recently from 51bc1f1 to b858f63 Compare October 31, 2019 16:07
@marregui marregui requested a review from mfussenegger October 31, 2019 16:09
@marregui
Copy link
Copy Markdown
Contributor Author

retest this please

cannot reproduce locally

@marregui marregui requested a review from seut November 1, 2019 08:54
Copy link
Copy Markdown
Member

@mfussenegger mfussenegger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also port over LocalCheckpointTrackerTests

Comment on lines +1398 to +1404
public boolean exists() {
return exists;
}

public long version() {
return this.version;
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove these extra properties again? See 836b1bb for why they were removed.

translog = openTranslog(engineConfig, translogDeletionPolicy, engineConfig.getGlobalCheckpointSupplier(),
seqNo -> {
final LocalCheckpointTracker tracker = getLocalCheckpointTracker();
assert tracker != null || getTranslog().isOpen() == false;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add some assertion message here?

SequenceNumbers.loadSeqNoInfoFromLuceneCommit(store.readLastCommittedSegmentsInfo().userData.entrySet());
maxSeqNo = seqNoStats.maxSeqNo;
localCheckpoint = seqNoStats.localCheckpoint;
logger.trace("recovered maximum sequence number [{}] and local checkpoint [{}]", maxSeqNo, localCheckpoint);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe wrap in logger.isTraceEnabled

localCheckpointTracker.markSeqNoAsProcessed(indexResult.getSeqNo());
if (indexResult.getTranslogLocation() == null) {
// the op is coming from the translog (and is hence persisted already) or it does not have a sequence number
assert index.origin().isFromTranslog() || indexResult.getSeqNo() == SequenceNumbers.UNASSIGNED_SEQ_NO;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, an assertion message would be nice

localCheckpointTracker.markSeqNoAsProcessed(deleteResult.getSeqNo());
if (deleteResult.getTranslogLocation() == null) {
// the op is coming from the translog (and is hence persisted already) or does not have a sequence number (version conflict)
assert delete.origin().isFromTranslog() || deleteResult.getSeqNo() == SequenceNumbers.UNASSIGNED_SEQ_NO;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here.

@marregui
Copy link
Copy Markdown
Contributor Author

marregui commented Nov 4, 2019

retest this please

I cannot reproduce locally....

@marregui
Copy link
Copy Markdown
Contributor Author

marregui commented Nov 4, 2019

retest this please

seems flaky, I am investigating locally

@marregui marregui requested a review from mfussenegger November 4, 2019 12:32
@marregui marregui added the ready-to-merge Let Mergify merge the PR once approved and checks pass label Nov 4, 2019
@marregui
Copy link
Copy Markdown
Contributor Author

marregui commented Nov 4, 2019

retest this please

it is flaky, and I cannot reproduce it locally. Taking the one from ES requires some changes, some code to import too

@mfussenegger mfussenegger removed the ready-to-merge Let Mergify merge the PR once approved and checks pass label Nov 4, 2019
@marregui marregui requested a review from mfussenegger November 8, 2019 20:21
Copy link
Copy Markdown
Member

@mfussenegger mfussenegger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise lgtm

Comment on lines +4461 to +4462
@Seed("EA3633FB2FE61702")
@Repeat(iterations = 1)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this should be reverted now

Comment on lines +5079 to +5080
@Seed("99FCB15FF5D32AD0")
@Repeat(iterations = 1)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Port of elastic/elasticsearch#43205

Local and global checkpoints currently do not correctly reflect what's persisted to disk. The issue is
that the local checkpoint is adapted as soon as an operation is processed (but not fsynced yet). This
leaves room for the history below the global checkpoint to still change in case of a crash. As we rely
on global checkpoints for CCR as well as operation-based recoveries, this has the risk of shard
copies / follower clusters going out of sync.
@marregui marregui added the ready-to-merge Let Mergify merge the PR once approved and checks pass label Nov 11, 2019
@mergify mergify bot merged commit dfed1ca into master Nov 11, 2019
@mergify mergify bot deleted the ma/port_PR43205 branch November 11, 2019 12:23
@marregui marregui self-assigned this Dec 4, 2019
@marregui marregui added the v/4.0 label Dec 5, 2019
kovrus added a commit that referenced this pull request Sep 25, 2020
…VerifyShardBeforeCloseAction

#9309 ports over elastic/elasticsearch#43205
but at that point TransportVerifyShardBeforeCloseAction was not present
in our code base.
kovrus added a commit that referenced this pull request Sep 25, 2020
…ifyShardBeforeCloseAction

#9309 ports over elastic/elasticsearch#43205
but at that point TransportVerifyShardBeforeCloseAction was not present
in our code base.
kovrus added a commit that referenced this pull request Sep 25, 2020
…ifyShardBeforeCloseAction

#9309 ports over elastic/elasticsearch#43205
but at that point TransportVerifyShardBeforeCloseAction was not present
in our code base.
kovrus added a commit that referenced this pull request Sep 26, 2020
…ifyShardBeforeCloseAction

#9309 ports over elastic/elasticsearch#43205
but at that point TransportVerifyShardBeforeCloseAction was not present
in our code base.
kovrus added a commit that referenced this pull request Sep 28, 2020
…ifyShardBeforeCloseAction

#9309 ports over elastic/elasticsearch#43205
but at that point TransportVerifyShardBeforeCloseAction was not present
in our code base.
mergify bot pushed a commit that referenced this pull request Sep 28, 2020
…ifyShardBeforeCloseAction

#9309 ports over elastic/elasticsearch#43205
but at that point TransportVerifyShardBeforeCloseAction was not present
in our code base.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-to-merge Let Mergify merge the PR once approved and checks pass v/4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants