
release-2.1: storage: take an engine checkpoint during failing consistency checks #42042

Merged
tbg merged 1 commit into cockroachdb:release-2.1 from irfansharif:backport2.1-36867
Oct 31, 2019


@irfansharif
Contributor

Backport 1/2 commits from #36867.

Useful as part of #42011.

+cc @cockroachdb/release


This takes a checkpoint on the nodes with replicas of a failing range,
before the failure leads to nodes shutting down. The checkpoint will, for
the replicas of the affected range, be taken at the same Raft log position.

Release note: None
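
For readers unfamiliar with why the Raft log position matters, here is a minimal sketch in Go. It is not the code from this PR. It assumes, purely for illustration, that the checkpoint request travels through the replicated log as an ordinary entry, so every replica takes its checkpoint while applying the same index. The `entry` and `replica` types, the `apply` method, and the in-memory map standing in for the engine are all hypothetical.

```go
package main

import "fmt"

// entry is a hypothetical log entry: either a key-value write or a
// request to take a checkpoint. It is not a CockroachDB type.
type entry struct {
	key, value string
	checkpoint bool
}

// replica is a toy replica. An in-memory map stands in for the engine.
type replica struct {
	name         string
	state        map[string]string
	appliedIndex int
	// checkpoints maps a log index to a copy of the state as of that index.
	checkpoints map[int]map[string]string
}

func newReplica(name string) *replica {
	return &replica{
		name:        name,
		state:       map[string]string{},
		checkpoints: map[int]map[string]string{},
	}
}

// apply applies one log entry at the given index. A checkpoint entry
// copies the current state instead of modifying it.
func (r *replica) apply(index int, e entry) {
	if e.checkpoint {
		snap := make(map[string]string, len(r.state))
		for k, v := range r.state {
			snap[k] = v
		}
		r.checkpoints[index] = snap
	} else {
		r.state[e.key] = e.value
	}
	r.appliedIndex = index
}

func main() {
	// The same log is applied by every replica; index 3 requests a checkpoint.
	log := []entry{
		{key: "a", value: "1"},
		{key: "b", value: "2"},
		{checkpoint: true},
		{key: "a", value: "3"},
	}

	replicas := []*replica{newReplica("n1"), newReplica("n2"), newReplica("n3")}
	for _, r := range replicas {
		for i, e := range log {
			if r.name == "n3" && i == 2 {
				// Simulate silent corruption on one replica before the checkpoint.
				r.state["b"] = "corrupt"
			}
			r.apply(i+1, e) // log indexes start at 1 in this sketch
		}
	}

	// Every replica has a checkpoint keyed by the same index, so the
	// preserved states can be compared directly.
	for _, r := range replicas {
		fmt.Printf("%s: checkpoint at index 3: %v\n", r.name, r.checkpoints[3])
	}
}
```

Because each copy is taken at the same index, any difference between the preserved states reflects the inconsistency itself rather than writes that one replica had applied and another had not.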

@irfansharif irfansharif requested review from a team and tbg October 30, 2019 19:47
@cockroach-teamcity
Member

This change is Reviewable

@irfansharif irfansharif removed the request for review from a team October 30, 2019 19:47
@tbg
Member

tbg commented Oct 30, 2019

Clean backport? Wonderful!

We've repeatedly wanted this to preserve state when finding replica
inconsistencies.

See, for example:

cockroachdb#36861

Release note: None
@irfansharif
Contributor Author

Failed with #31778; the fix in #32899 was not backported. Retrying.

@irfansharif
Contributor Author

Now failing on the Example-ORM tests, with what look like network flakes:

[TestDjango/FirstRun] main_test.go:163: Get http://localhost:6543/ping/: dial tcp 127.0.0.1:6543: getsockopt: connection refused

If safe to ignore, how do I get past the failing TC check here?

@irfansharif
Contributor Author

irfansharif commented Oct 31, 2019

Ok, I see the same example-orms failures on release-v2.1 (getting this running was a major PITA, so much is broken). Very surprisingly, the example-orms test suite isn't pinned to a version: changes made on example-orms master are picked up by the test suites run against all branches of CRDB from then on. This is almost definitely not what we want.

@tbg: It's safe to disable branch protection for this PR; I don't have access to do so myself.

+cc @rohany, @rafiss: all future backports to release-v2.1 (and likely others) are going to fail. I don't know if this is something we care about, but TC will fail+block for future backports.

@tbg
Member

tbg commented Oct 31, 2019

Merging at your request :shipit:

@tbg tbg merged commit c114859 into cockroachdb:release-2.1 Oct 31, 2019
@irfansharif irfansharif deleted the backport2.1-36867 branch November 18, 2019 18:21