Skip to content

storage: migrate raft state on startup#7429

Merged
tbg merged 3 commits intocockroachdb:masterfrom
tbg:migrate-raft-state
Jun 24, 2016
Merged

storage: migrate raft state on startup#7429
tbg merged 3 commits intocockroachdb:masterfrom
tbg:migrate-raft-state

Conversation

@tbg
Copy link
Copy Markdown
Member

@tbg tbg commented Jun 23, 2016

In #7310, we added (and enforced) assumptions on what was
persisted to disk, but did not ensure that older versions
were migrated properly. This in turn lead to #7389.

A migration is only necessary for ranges which had never
been truncated, and so it should be fairly rare in practice.

The migration simply writes the truncated state that would
otherwise been synthesized by the old code, which is technically
a consistency violation, but should work well in practice because
all nodes will write that state on startup when they are rebooted.


This change is Reviewable

@mberhault
Copy link
Copy Markdown
Contributor

LGTM, but I'm not an expert

@tbg
Copy link
Copy Markdown
Member Author

tbg commented Jun 23, 2016

Yeah, we'll try this out and then hopefully @bdarnell can take a look.

@petermattis
Copy link
Copy Markdown
Collaborator

Review status: 0 of 1 files reviewed at latest revision, 1 unresolved discussion, some commit checks pending.


storage/store.go, line 908 [r1] (raw file):

          }
          log.Warning("migration: synthesized truncated state for %+v", desc)
      }

We need a test for this, though it doesn't need to be onerous. Seems feasible to pull out the above into a separate method and to construct a test which writes the old state and verifies that the migration updates the state as expected.


Comments from Reviewable

@tbg tbg force-pushed the migrate-raft-state branch from 86090be to 4985dcb Compare June 23, 2016 17:04
tbg added a commit to cockroachdb/postgres-test that referenced this pull request Jun 23, 2016
@tbg tbg force-pushed the migrate-raft-state branch from 4985dcb to 34b9892 Compare June 24, 2016 00:19
tbg added a commit to tbg/cockroach that referenced this pull request Jun 24, 2016
Also straightened out some kinks with the existing reference tests, namely that
they were creating two single-node clusters per test which were never used.
Instead, we can now create zero-node clusters and run one-shot containers on
it (one would argue that that's still weird, but it seems acceptable for the
time being).

The newly added reference test will crash when run for commits between cockroachdb#7310
and cockroachdb#7429. See cockroachdb/postgres-test#13 for the initial state used in the
test.
@tbg
Copy link
Copy Markdown
Member Author

tbg commented Jun 24, 2016

Updated, PTAL. Made it a Docker reference test. I initially tried to unit-test this, but it was really silly and not convincing at all.

tbg added a commit to cockroachdb/postgres-test that referenced this pull request Jun 24, 2016
@tbg tbg force-pushed the migrate-raft-state branch from 34b9892 to d0f2298 Compare June 24, 2016 00:44
tbg added a commit to tbg/cockroach that referenced this pull request Jun 24, 2016
Also straightened out some kinks with the existing reference tests, namely that
they were creating two single-node clusters per test which were never used.
Instead, we can now create zero-node clusters and run one-shot containers on
it (one would argue that that's still weird, but it seems acceptable for the
time being).

The newly added reference test will crash when run for commits between cockroachdb#7310
and cockroachdb#7429. See cockroachdb/postgres-test#13 for the initial state used in the
test.
@tamird
Copy link
Copy Markdown
Contributor

tamird commented Jun 24, 2016

Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 4 of 4 files at r3.
Review status: 4 of 5 files reviewed at latest revision, 7 unresolved discussions, some commit checks pending.


acceptance/reference_test.go, line 34 [r3] (raw file):

  err := testDockerOneShot(t, "reference", []string{"/bin/bash", "-c", script})
  if err != nil {
      t.Errorf("expected success: %s", err)

heh why not just t.Error(err)


acceptance/replication_test.go, line 44 [r3] (raw file):

      // Looks silly, but we actually start zero-node clusters in the
      // reference tests.
      t.Log("replication test is a no-op for empty cluster")

t.Skip("replication test ...") instead of this return


acceptance/util_test.go, line 214 [r3] (raw file):

          for ; i < 100; i++ {
              _, _, fun := caller.Lookup(i)
              if testFuncRE.MatchString(fun) {

surely i'm misreading this?


acceptance/util_test.go, line 288 [r3] (raw file):

)

func testDocker(t *testing.T, num int, name string, cmd []string) error {

nit: just make this int32, you're dealing with constants here


acceptance/cluster/localcluster.go, line 228 [r3] (raw file):

      return err
  }
  err = l.oneshot.Wait()

reduce scope?


acceptance/cluster/localcluster.go, line 232 [r3] (raw file):

      var b bytes.Buffer
      l.oneshot.Logs(&b)
      return fmt.Errorf("%s:\n%s", err, b.String())

errors.Wrap? :trollface:


Comments from Reviewable

@tbg tbg force-pushed the migrate-raft-state branch from d0f2298 to 804807a Compare June 24, 2016 01:04
tbg added a commit to tbg/cockroach that referenced this pull request Jun 24, 2016
Also straightened out some kinks with the existing reference tests, namely that
they were creating two single-node clusters per test which were never used.
Instead, we can now create zero-node clusters and run one-shot containers on
it (one would argue that that's still weird, but it seems acceptable for the
time being).

The newly added reference test will crash when run for commits between cockroachdb#7310
and cockroachdb#7429. See cockroachdb/postgres-test#13 for the initial state used in the
test.
@tbg
Copy link
Copy Markdown
Member Author

tbg commented Jun 24, 2016

Review status: 2 of 5 files reviewed at latest revision, 7 unresolved discussions, some commit checks pending.


acceptance/reference_test.go, line 34 [r3] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

heh why not just t.Error(err)

Done.

acceptance/replication_test.go, line 44 [r3] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

t.Skip("replication test ...") instead of this return

Nope, I don't want to skip the tests that call this. Just not test this.

acceptance/util_test.go, line 214 [r3] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

surely i'm misreading this?

Erm. Fixed.

acceptance/util_test.go, line 288 [r3] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

nit: just make this int32, you're dealing with constants here

Done.

acceptance/cluster/localcluster.go, line 228 [r3] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

reduce scope?

Done.

acceptance/cluster/localcluster.go, line 232 [r3] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

errors.Wrap? :trollface:

Done.

storage/store.go, line 908 [r1] (raw file):

Previously, petermattis (Peter Mattis) wrote…

We need a test for this, though it doesn't need to be onerous. Seems feasible to pull out the above into a separate method and to construct a test which writes the old state and verifies that the migration updates the state as expected.

Done.

Comments from Reviewable

@tamird
Copy link
Copy Markdown
Contributor

tamird commented Jun 24, 2016

:lgtm:


Reviewed 1 of 1 files at r4, 3 of 3 files at r5.
Review status: all files reviewed at latest revision, 3 unresolved discussions, some commit checks pending.


acceptance/replication_test.go, line 44 [r3] (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Nope, I don't want to skip the tests that call this. Just not test this.

Oh, I just assumed this was a test function.

acceptance/util_test.go, line 214 [r3] (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Erm. Fixed.

i'd put the logic in here instead of continue and break but not worth another CI run

Comments from Reviewable

@tamird
Copy link
Copy Markdown
Contributor

tamird commented Jun 24, 2016

Review status: all files reviewed at latest revision, 2 unresolved discussions, some commit checks pending.


acceptance/util_test.go, line 214 [r3] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

i'd put the logic in here instead of continue and break but not worth another CI run

oh wait you need to bump the image tag, so you can make that change :)

Comments from Reviewable

tbg added a commit to cockroachdb/postgres-test that referenced this pull request Jun 24, 2016
@tbg tbg force-pushed the migrate-raft-state branch from 804807a to 1d1fd96 Compare June 24, 2016 01:10
tbg added a commit to tbg/cockroach that referenced this pull request Jun 24, 2016
Also straightened out some kinks with the existing reference tests, namely that
they were creating two single-node clusters per test which were never used.
Instead, we can now create zero-node clusters and run one-shot containers on
it (one would argue that that's still weird, but it seems acceptable for the
time being).

The newly added reference test will crash when run for commits between cockroachdb#7310
and cockroachdb#7429. See cockroachdb/postgres-test#13 for the initial state used in the
test.
@tbg
Copy link
Copy Markdown
Member Author

tbg commented Jun 24, 2016

Review status: 4 of 5 files reviewed at latest revision, 2 unresolved discussions, some commit checks pending.


acceptance/util_test.go, line 214 [r3] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

oh wait you need to bump the image tag, so you can make that change :)

Done.

Comments from Reviewable

tbg added a commit to cockroachdb/postgres-test that referenced this pull request Jun 24, 2016
@tbg tbg force-pushed the migrate-raft-state branch from 1d1fd96 to 7b25668 Compare June 24, 2016 01:42
tbg added a commit to tbg/cockroach that referenced this pull request Jun 24, 2016
Also straightened out some kinks with the existing reference tests, namely that
they were creating two single-node clusters per test which were never used.
Instead, we can now create zero-node clusters and run one-shot containers on
it (one would argue that that's still weird, but it seems acceptable for the
time being).

The newly added reference test will crash when run for commits between cockroachdb#7310
and cockroachdb#7429. See cockroachdb/postgres-test#13 for the initial state used in the
test.
@tbg
Copy link
Copy Markdown
Member Author

tbg commented Jun 24, 2016

postgres-test version bumped, tested again locally, merging on green.

@tbg tbg force-pushed the migrate-raft-state branch from 7b25668 to cf41b18 Compare June 24, 2016 02:29
In cockroachdb#7310, we added (and enforced) assumptions on what was
persisted to disk, but did not ensure that older versions
were migrated properly. This in turn lead to cockroachdb#7389.

A migration is only necessary for ranges which had never
been truncated, and so it should be fairly rare in practice.

The migration simply writes the truncated state that would
otherwise been synthesized by the old code, which is technically
a consistency violation, but should work well in practice because
all nodes will write that state on startup when they are rebooted.
Also straightened out some kinks with the existing reference tests, namely that
they were creating two single-node clusters per test which were never used.
Instead, we can now create zero-node clusters and run one-shot containers on
it (one would argue that that's still weird, but it seems acceptable for the
time being).

The newly added reference test will crash when run for commits between cockroachdb#7310
and cockroachdb#7429. See cockroachdb/postgres-test#13 for the initial state used in the
test.
@tbg tbg force-pushed the migrate-raft-state branch from cf41b18 to d3d5184 Compare June 24, 2016 03:37
@tbg
Copy link
Copy Markdown
Member Author

tbg commented Jun 24, 2016

Green at last.

@tbg tbg merged commit 35d160e into cockroachdb:master Jun 24, 2016
@tbg tbg deleted the migrate-raft-state branch June 24, 2016 04:56
bdarnell added a commit to cockroachdb/postgres-test that referenced this pull request Apr 22, 2017
Remove test data for cockroachdb/cockroach#7429 because we no longer
support upgrading from that version.

See cockroachdb/cockroach#15228
bdarnell added a commit to cockroachdb/postgres-test that referenced this pull request Apr 22, 2017
Remove test data for cockroachdb/cockroach#7429 because we no longer
support upgrading from that version.

See cockroachdb/cockroach#15228
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants