backport-2.0: storage: prevent unbounded raft log growth without quorum#27868
Merged
craig[bot] merged 6 commits intocockroachdb:release-2.0from Aug 6, 2018
Merged
backport-2.0: storage: prevent unbounded raft log growth without quorum#27868craig[bot] merged 6 commits intocockroachdb:release-2.0from
craig[bot] merged 6 commits intocockroachdb:release-2.0from
Conversation
Member
bdarnell
approved these changes
Jul 23, 2018
Contributor
Author
Contributor
|
Is this ready to merge? We want to get it in to 2.0.5 (which starts qualifying today). |
Contributor
Author
bdarnell
approved these changes
Aug 6, 2018
Contributor
bdarnell
left a comment
There was a problem hiding this comment.
Reviewed 3 of 5 files at r1, 3 of 3 files at r2, 3 of 5 files at r4, 2 of 2 files at r6.
Reviewable status:complete! 0 of 0 LGTMs obtained
Fixes cockroachdb#27772. This change adds safeguards to prevent cases where a raft log would grow without bound during loss of quorum scenarios. It also adds a new test that demonstrates that the raft log does not grow without bound in these cases. There are two cases that need to be handled to prevent the unbounded raft log growth observed in cockroachdb#27772. 1. When the leader proposes a command and cannot establish a quorum. In this case, we know the leader has the entry in its log, so there's no need to refresh it with `reasonTicks`. To avoid this, we no longer use `refreshTicks` as a leader. 2. When a follower proposes a command that is forwarded to the leader who cannot establish a quorum. In this case, the follower can't be sure (currently) that the leader got the proposal, so it needs to refresh using `reasonTicks`. However, the leader now detects duplicate forwarded proposals and avoids appending redundant entries to its log. It does so by maintaining a set of in-flight forwarded proposals that it has received during its term as leader. This set is reset after every leadership change. Both of these cases are tested against in the new TestLogGrowthWhenRefreshingPendingCommands. Without both of the safeguards introduced in this commit, the test fails. Release note (bug fix): Prevent unbounded growth of the raft log caused by a loss of quorum.
Release note: None
Fixes cockroachdb#27878. The test was flaky because a NodeLiveness update was getting stuck when killing a majority of nodes in a Range. The fix was to tie NodeLiveness' update context to its stopper so that liveness updates are canceled when their node begins to shut down. This was a real issue that I suspect would occasionally make nodes hang on shutdown when their liveness range was becoming unavailable. There are probably other issues in the same class of bugs, but stressing `TestLogGrowthWhenRefreshingPendingCommands` isn't showing anything else. In doing so, the change needed to extend `stopper.WithCancel` into `stopper.WithCancelOnQuiesce` and `stopper.WithCancelOnStop`. Release note: None
Before this change, `WithCancelOnQuiesce` and `WithCancelOnStop` were dangerous to use because they would never clean up memory. This meant that any use of the methods that happened more than a constant number of times would slowly leak memory. This was an issue in `client.Txn.rollback`. This change fixes the methods so that it's possible for callers to clean up after themselves. Added a warning to `Stopper.AddCloser` because this had a similar issue. Release note: None
Fixes cockroachdb#27908. Release note: None
Before this change the Stopper made no guarantee that `Closers`
registered by the `AddCloser` method or `Contexts` tied to
the `Stopper` with `WithCancelOn{Quiesce,Stop}` were ever properly
cleaned up. In cases where these methods were called concurrently
with the Stopper stopping, it was possible for the resources to
leak. This change fixes this by handling cases where these methods
are called concurrently with or after the Stopper has stopped. This
allows them to provide stronger external guarantees.
Release note: None
Contributor
Author
|
bors r+ |
Contributor
Build failed |
Contributor
Author
|
Flake is #28228. bors r+ |
Contributor
Build failed |
Contributor
Author
|
Same flake. Retrying one more time before taking more drastic measures. bors r+ |
craig bot
pushed a commit
that referenced
this pull request
Aug 6, 2018
27868: backport-2.0: storage: prevent unbounded raft log growth without quorum r=nvanbenschoten a=nvanbenschoten Backport 2/2 commits from #27774. /cc @cockroachdb/release --- Fixes #27772. This change adds safeguards to prevent cases where a raft log would grow without bound during loss of quorum scenarios. It also adds a new test that demonstrates that the raft log does not grow without bound in these cases. There are two cases that need to be handled to prevent the unbounded raft log growth observed in #27772. 1. When the leader proposes a command and cannot establish a quorum. In this case, we know the leader has the entry in its log, so there's no need to refresh it with `reasonTicks`. To avoid this, we no longer use `refreshTicks` as a leader. 2. When a follower proposes a command that is forwarded to the leader who cannot establish a quorum. In this case, the follower can't be sure (currently) that the leader got the proposal, so it needs to refresh using `reasonTicks`. However, the leader now detects duplicate forwarded proposals and avoids appending redundant entries to its log. It does so by maintaining a set of in-flight forwarded proposals that it has received during its term as leader. This set is reset after every leadership change. Both of these cases are tested against in the new TestLogGrowthWhenRefreshingPendingCommands. Without both of the safeguards introduced in this commit, the test fails. Release note (bug fix): Prevent loss of quorum situations from allowing unbounded growth of a Range's Raft log. 28225: release-2.0: importccl: Preserve '\r\n' during CSV import r=dt a=dt Backport 1/1 commits from #28181. /cc @cockroachdb/release --- See #25344. Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com> Co-authored-by: neeral <neeral@users.noreply.github.com> Co-authored-by: David Taylor <tinystatemachine@gmail.com>
Contributor
Build succeeded |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Backport 2/2 commits from #27774.
/cc @cockroachdb/release
Fixes #27772.
This change adds safeguards to prevent cases where a raft log
would grow without bound during loss of quorum scenarios. It
also adds a new test that demonstrates that the raft log does
not grow without bound in these cases.
There are two cases that need to be handled to prevent the
unbounded raft log growth observed in #27772.
quorum. In this case, we know the leader has the entry in
its log, so there's no need to refresh it with
reasonTicks.To avoid this, we no longer use
refreshTicksas a leader.leader who cannot establish a quorum. In this case, the
follower can't be sure (currently) that the leader got the
proposal, so it needs to refresh using
reasonTicks. However,the leader now detects duplicate forwarded proposals and
avoids appending redundant entries to its log. It does so
by maintaining a set of in-flight forwarded proposals that
it has received during its term as leader. This set is reset
after every leadership change.
Both of these cases are tested against in the new
TestLogGrowthWhenRefreshingPendingCommands. Without both of
the safeguards introduced in this commit, the test fails.
Release note (bug fix): Prevent loss of quorum situations from
allowing unbounded growth of a Range's Raft log.