storage: migrate raft state on startup#7429
Conversation
|
LGTM, but I'm not an expert |
|
Yeah, we'll try this out and then hopefully @bdarnell can take a look. |
|
Review status: 0 of 1 files reviewed at latest revision, 1 unresolved discussion, some commit checks pending. storage/store.go, line 908 [r1] (raw file):
We need a test for this, though it doesn't need to be onerous. Seems feasible to pull out the above into a separate method and to construct a test which writes the old state and verifies that the migration updates the state as expected. Comments from Reviewable |
Also straightened out some kinks with the existing reference tests, namely that they were creating two single-node clusters per test which were never used. Instead, we can now create zero-node clusters and run one-shot containers on it (one would argue that that's still weird, but it seems acceptable for the time being). The newly added reference test will crash when run for commits between cockroachdb#7310 and cockroachdb#7429. See cockroachdb/postgres-test#13 for the initial state used in the test.
|
Updated, PTAL. Made it a Docker reference test. I initially tried to unit-test this, but it was really silly and not convincing at all. |
Also straightened out some kinks with the existing reference tests, namely that they were creating two single-node clusters per test which were never used. Instead, we can now create zero-node clusters and run one-shot containers on it (one would argue that that's still weird, but it seems acceptable for the time being). The newly added reference test will crash when run for commits between cockroachdb#7310 and cockroachdb#7429. See cockroachdb/postgres-test#13 for the initial state used in the test.
|
Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 4 of 4 files at r3. acceptance/reference_test.go, line 34 [r3] (raw file):
heh why not just t.Error(err) acceptance/replication_test.go, line 44 [r3] (raw file):
t.Skip("replication test ...") instead of this return acceptance/util_test.go, line 214 [r3] (raw file):
surely i'm misreading this? acceptance/util_test.go, line 288 [r3] (raw file):
nit: just make this int32, you're dealing with constants here acceptance/cluster/localcluster.go, line 228 [r3] (raw file):
reduce scope? acceptance/cluster/localcluster.go, line 232 [r3] (raw file):
errors.Wrap? Comments from Reviewable |
Also straightened out some kinks with the existing reference tests, namely that they were creating two single-node clusters per test which were never used. Instead, we can now create zero-node clusters and run one-shot containers on it (one would argue that that's still weird, but it seems acceptable for the time being). The newly added reference test will crash when run for commits between cockroachdb#7310 and cockroachdb#7429. See cockroachdb/postgres-test#13 for the initial state used in the test.
|
Review status: 2 of 5 files reviewed at latest revision, 7 unresolved discussions, some commit checks pending. acceptance/reference_test.go, line 34 [r3] (raw file):
|
|
Reviewed 1 of 1 files at r4, 3 of 3 files at r5. acceptance/replication_test.go, line 44 [r3] (raw file):
|
|
Review status: all files reviewed at latest revision, 2 unresolved discussions, some commit checks pending. acceptance/util_test.go, line 214 [r3] (raw file):
|
Also straightened out some kinks with the existing reference tests, namely that they were creating two single-node clusters per test which were never used. Instead, we can now create zero-node clusters and run one-shot containers on it (one would argue that that's still weird, but it seems acceptable for the time being). The newly added reference test will crash when run for commits between cockroachdb#7310 and cockroachdb#7429. See cockroachdb/postgres-test#13 for the initial state used in the test.
|
Review status: 4 of 5 files reviewed at latest revision, 2 unresolved discussions, some commit checks pending. acceptance/util_test.go, line 214 [r3] (raw file):
|
Add an initial state for testing cockroachdb/cockroach#7429
Also straightened out some kinks with the existing reference tests, namely that they were creating two single-node clusters per test which were never used. Instead, we can now create zero-node clusters and run one-shot containers on it (one would argue that that's still weird, but it seems acceptable for the time being). The newly added reference test will crash when run for commits between cockroachdb#7310 and cockroachdb#7429. See cockroachdb/postgres-test#13 for the initial state used in the test.
|
|
In cockroachdb#7310, we added (and enforced) assumptions on what was persisted to disk, but did not ensure that older versions were migrated properly. This in turn lead to cockroachdb#7389. A migration is only necessary for ranges which had never been truncated, and so it should be fairly rare in practice. The migration simply writes the truncated state that would otherwise been synthesized by the old code, which is technically a consistency violation, but should work well in practice because all nodes will write that state on startup when they are rebooted.
Also straightened out some kinks with the existing reference tests, namely that they were creating two single-node clusters per test which were never used. Instead, we can now create zero-node clusters and run one-shot containers on it (one would argue that that's still weird, but it seems acceptable for the time being). The newly added reference test will crash when run for commits between cockroachdb#7310 and cockroachdb#7429. See cockroachdb/postgres-test#13 for the initial state used in the test.
|
Green at last. |
Remove test data for cockroachdb/cockroach#7429 because we no longer support upgrading from that version. See cockroachdb/cockroach#15228
Remove test data for cockroachdb/cockroach#7429 because we no longer support upgrading from that version. See cockroachdb/cockroach#15228
In #7310, we added (and enforced) assumptions on what was
persisted to disk, but did not ensure that older versions
were migrated properly. This in turn lead to #7389.
A migration is only necessary for ranges which had never
been truncated, and so it should be fairly rare in practice.
The migration simply writes the truncated state that would
otherwise been synthesized by the old code, which is technically
a consistency violation, but should work well in practice because
all nodes will write that state on startup when they are rebooted.
This change is