
[WIP] cluster: implement version migration RFC#17155

Closed
spencerkimball wants to merge 6 commits into master from version-migrations

Conversation

@spencerkimball
Member

Introduce Version and ClusterVersion protobuf messages.

Add default ServerVersion and MinimumSupportedVersion to the base
package.

Add read/write methods to the storage.Stores object for persisting
the gossiped ClusterVersion setting to prevent too-old or too-new
binaries from starting on the data in a configured store.

TODO:

  • Decide whether to build a callback facility for settings package.
  • Check gossiped ClusterVersion against setting either on new
    settings callback or every time that the system config changes.
  • Provide a hook when setting the cluster version setting to verify
    that existing nodes can accommodate the new value.
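
The too-old/too-new startup check described above can be sketched as follows; the types here are simplified illustrative stand-ins for the `Version`/`ClusterVersion` protobuf messages, not the PR's actual code:

```go
package main

import "fmt"

// Version is a simplified stand-in for the Version protobuf message
// introduced by this PR (field names are illustrative).
type Version struct {
	Major, Minor int
}

// Less reports whether v precedes o.
func (v Version) Less(o Version) bool {
	return v.Major < o.Major || (v.Major == o.Major && v.Minor < o.Minor)
}

// checkVersion mimics the store-startup check: refuse to start a binary
// on data whose persisted cluster version lies outside the binary's
// supported range [minSupported, serverVersion].
func checkVersion(persisted, minSupported, serverVersion Version) error {
	if persisted.Less(minSupported) {
		return fmt.Errorf("cluster version %v, which is < minimum supported version %v", persisted, minSupported)
	}
	if serverVersion.Less(persisted) {
		return fmt.Errorf("cluster version %v, which is > supported version %v", persisted, serverVersion)
	}
	return nil
}

func main() {
	// A v1.1 binary with minimum supported version 1.0 refuses 0.9 data.
	fmt.Println(checkVersion(Version{0, 9}, Version{1, 0}, Version{1, 1}))
}
```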

@spencerkimball spencerkimball requested a review from tbg July 20, 2017 21:25
@cockroach-teamcity
Member

This change is Reviewable

@spencerkimball spencerkimball force-pushed the version-migrations branch 2 times, most recently from 84eb1b4 to 7f701a0 July 20, 2017 22:23
@tbg
Member

tbg commented Jul 21, 2017

Looks good so far! For the env var, maybe you can poach something from https://github.com/cockroachdb/cockroach/compare/master...tschottdorf:migration?expand=1#diff-4ec93c4ea7116a6504b783da77089a7aR26.


Reviewed 20 of 20 files at r1.
Review status: all files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.


pkg/storage/stores.go, line 348 at r1 (raw file):

// ReadClusterVersion reads and returns the ClusterVersion protobuf
// written to any of the configured stores which has the highest
// minimum version. The returned value is also replicated to all

Mention what happens if that isn't unique. You could have same min version, but different use versions.


pkg/storage/stores_test.go, line 348 at r1 (raw file):

	ls3.AddStore(stores[0])
	verifyBI.Reset()
	if err := ls3.ReadBootstrapInfo(&verifyBI); err != nil {

Why this change?


pkg/storage/stores_test.go, line 486 at r1 (raw file):

	var err error
	if _, err = ls2.ReadClusterVersion(ctx); !testutils.IsError(err, "which is < minimum supported version") {
		t.Errorf("expected error reading version older than minimum supported")

Print the error, too.


pkg/storage/stores_test.go, line 512 at r1 (raw file):

	var err error
	if _, err = ls2.ReadClusterVersion(ctx); !testutils.IsError(err, "which is > supported version") {
		t.Errorf("expected error reading required version later than supported")

Ditto.


Comments from Reviewable

@spencerkimball
Member Author

As mentioned on Slack, I was hoping you could take this PR over. Looks like this code is going to be useful for the SET CLUSTER SETTING command. Are you planning to also make this settable via the CLI?


Review status: 19 of 20 files reviewed at latest revision, 4 unresolved discussions, some commit checks pending.


pkg/storage/stores.go, line 348 at r1 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Mention what happens if that isn't unique. You could have same min version, but different use versions.

Before updating this documentation, we should figure out the most coherent policy for when the value of UseVersion differs between stores.


pkg/storage/stores_test.go, line 348 at r1 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Why this change?

There was a bug in the test; it wasn't testing what it was supposed to test: reading from a fresh Stores object.


pkg/storage/stores_test.go, line 486 at r1 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Print the error, too.

Done.


pkg/storage/stores_test.go, line 512 at r1 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Ditto.

Done.


Comments from Reviewable

@RaduBerinde
Member

Just a note that cluster settings can already have change callbacks (OnChange).
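The callback pattern mentioned here can be sketched as below; the names are illustrative, not the real settings-package API:

```go
package main

import "fmt"

// setting is a minimal sketch of a cluster setting with change
// callbacks, illustrating the OnChange facility mentioned above.
type setting struct {
	value     string
	callbacks []func()
}

// OnChange registers fn to be invoked after every update to the setting.
func (s *setting) OnChange(fn func()) {
	s.callbacks = append(s.callbacks, fn)
}

// set updates the value and notifies all registered callbacks.
func (s *setting) set(v string) {
	s.value = v
	for _, fn := range s.callbacks {
		fn()
	}
}

func main() {
	s := &setting{}
	// E.g. re-check the gossiped ClusterVersion whenever the setting changes.
	s.OnChange(func() { fmt.Println("version changed to", s.value) })
	s.set("1.1")
}
```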

@bdarnell
Contributor

Reviewed 19 of 20 files at r1, 1 of 1 files at r2.
Review status: all files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.


pkg/storage/stores.go, line 348 at r1 (raw file):

Previously, spencerkimball (Spencer Kimball) wrote…

Before updating this documentation, we should figure out the most coherent policy for when the value of UseVersion differs between stores.

I think there's going to be some gossiped value stored in the system config span that is the authoritative version of this. Whenever we get a gossip update we should update all our stores to match. If the stores disagree on use_version I don't think it matters much which one we use while waiting for our first gossip version, but I think I'd go with the smallest use_version value.
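A sketch of the synthesis policy being discussed (highest MinimumVersion wins; on a tie, the smallest UseVersion is the conservative choice while waiting for the first gossip update); the types are simplified, illustrative stand-ins:

```go
package main

import "fmt"

// ClusterVersion stands in for the protobuf message, with versions
// simplified to ints for illustration.
type ClusterVersion struct {
	MinimumVersion, UseVersion int
}

// synthesize picks the version to use among the stores' persisted
// values: the highest MinimumVersion wins, and on a tie the smallest
// UseVersion is chosen as the conservative default.
func synthesize(cvs []ClusterVersion) ClusterVersion {
	best := cvs[0]
	for _, cv := range cvs[1:] {
		if cv.MinimumVersion > best.MinimumVersion ||
			(cv.MinimumVersion == best.MinimumVersion && cv.UseVersion < best.UseVersion) {
			best = cv
		}
	}
	return best
}

func main() {
	// Two stores share min version 1 but disagree on use version.
	fmt.Println(synthesize([]ClusterVersion{{1, 3}, {1, 2}, {0, 5}}))
}
```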


Comments from Reviewable

tbg added a commit to tbg/cockroach that referenced this pull request Jul 25, 2017
Implement (hopefully) more than 50% of the migration RFC[1]. Specifically,

- add a StatemachineSetting to the `settings` package
- use it for `SET CLUSTER SETTING version = '<major>.minor'`
- pass a newly created `*ExposedClusterVersion` around
- use `(*ExposedClusterVersion).IsActive(targetVersion)` to toggle existing
features that so far used the stub `util.IsMigrated()`
- remove `util.IsMigrated()`.

The missing part is incorporating the remaining parts of [2], that is, storing
the seen version to disk.

[1]: cockroachdb#16977
[2]: cockroachdb#17155
tbg added a commit to tbg/cockroach that referenced this pull request Jul 31, 2017
Implement (hopefully) more than 50% of the migration RFC[1]. Specifically,

- add a StateMachineSetting to the `settings` package
- use it for `SET CLUSTER SETTING version = '<major>.<minor>-<unstable>'`
- pass a newly created `*ExposedClusterVersion` around
- use `(*ExposedClusterVersion).IsActive(targetVersion)` to toggle existing
features that so far used the stub `util.IsMigrated()`
- remove `util.IsMigrated()`.

The missing part is incorporating the remaining parts of [2], that is, storing
the seen version to disk.

[1]: cockroachdb#16977
[2]: cockroachdb#17155
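The feature-gating pattern this commit introduces in place of the `util.IsMigrated()` stub can be sketched as follows; `exposedClusterVersion` here is an illustrative stand-in for `*ExposedClusterVersion`, with versions simplified to ints:

```go
package main

import "fmt"

// exposedClusterVersion is an illustrative stand-in for the
// *ExposedClusterVersion described in the commit message.
type exposedClusterVersion struct {
	active int // currently active cluster version
}

// IsActive reports whether the cluster has reached targetVersion,
// gating features that previously used the util.IsMigrated() stub.
func (e *exposedClusterVersion) IsActive(targetVersion int) bool {
	return e.active >= targetVersion
}

func main() {
	cv := &exposedClusterVersion{active: 2}
	if cv.IsActive(2) {
		fmt.Println("version-2 feature enabled")
	}
	if !cv.IsActive(3) {
		fmt.Println("version-3 feature still disabled")
	}
}
```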
spencerkimball and others added 6 commits August 1, 2017 20:07
Add read/write methods to the `storage.Stores` object for persisting
the gossiped `ClusterVersion` setting to prevent too-old or too-new
binaries from starting on the data in a configured store.
@tbg tbg force-pushed the version-migrations branch from 002ccc8 to f770498 August 2, 2017 18:42
@spencerkimball spencerkimball requested a review from a team August 2, 2017 18:42
@spencerkimball spencerkimball requested a review from a team as a code owner August 2, 2017 18:42
@tbg
Member

tbg commented Aug 2, 2017

Superseded by upcoming PR.

@tbg tbg closed this Aug 2, 2017
tbg added a commit to tbg/cockroach that referenced this pull request Aug 2, 2017
Out for early review. The code contains numerous hacks that are listed below.
Making this work required a bit of refactoring (and it may not work yet; no
real testing has been done). In particular, I'm looking to validate that this
code will *actually* serve our migration needs for this upcoming release, but
also the ones after. Your feedback is especially important since I am obviously
in a rush to finish this up, and this invites poor quality.

Review with combined commits; there isn't much use in looking at the first
one in isolation, as there's a lot of churn.

----

Pick up cockroachdb#17155, which introduced a basic mechanism to persist the cluster
version on the nodes' stores. To do this properly, we address the following:

- refactor the node startup sequence so that the (start-up) cluster version is
  known before any moving parts (stores, sql server, ...) are started.
- backfill the cluster version into all stores that don't have it.
- in particular, make sure bootstrapping new stores into an existing node does
  not race with this backfill (otherwise, switching out the stores one by one
  could in the worst case completely lose the persisted version).
- use the determined cluster version to initialize the
  `*ExposedClusterVersion`. This marriage is presently not a happy one since
  the singleton cluster settings want to be registered at init time, but we
  don't know the value then. It's unfortunate that the singleton will require
  significant refactoring to run mixed-cluster in-memory tests, but this is
  a constraint we'll have to live with, at least for the initial PR.
- persist a cluster version during cluster bootstrap. Thus we'll be able to
  fail the boot sequence in v1.2 when we see a bootstrapped store without
  a persisted entry.
- Allow callback registration to get notified on version changes.

TODO:

- [ ] Testing. Unit tests + an acceptance test will likely have to do.
- [ ] clean up the most egregious hack: the StateMachineSetting initialization.
- [ ] clean up the methods and comments in `(*Stores).*ClusterVersion.*`.
- [ ] Update the RFC.
- [ ] Pave the way for future use of `UseVersion < MinVersion`, if still required.
- [ ] Migration that makes sure there's always a cluster version settings
      table entry (@m-schneider).
- [ ] Disallow version bump if settings table has no value entry.
- [ ] Consider removing the reliance on gossip callback ordering. Gossip does
      not *guarantee* that version updates arrive in-order, and we currently
      persist the cluster version blindly. `*ExposedClusterVersion` would
      have to do its own synthesis to refuse illegal changes backwards. This
      may be a non-issue but needs some consideration (all of the cluster
      settings would be vulnerable to it).
@tbg tbg mentioned this pull request Aug 2, 2017
6 tasks
tbg added a commit that referenced this pull request Aug 10, 2017
Pick up #17155, which introduced a basic mechanism to persist the cluster
version on the nodes' stores. To do this properly, we address the following:

- refactor the node startup sequence so that the (start-up) cluster version is
  known before any moving parts (stores, sql server, ...) are started.
- backfill the cluster version into all stores that don't have it.
- in particular, make sure bootstrapping new stores into an existing node does
  not race with this backfill (otherwise, switching out the stores one by one
  could in the worst case completely lose the persisted version).
- use the determined cluster version to initialize the cluster setting's
  version. Essentially, the engine-backed version becomes the "default version"
  of the statemachine setting backing the version, so that it is used until we
  get the actual cluster version via gossip.
- persist a cluster version during cluster bootstrap. Thus we'll be able to
  fail the boot sequence in v1.2 when we see a bootstrapped store without
  a persisted entry.

TODO:

- [ ] Testing. Unit tests + an acceptance test will likely have to do.
- [ ] clean up the most egregious hack: the StateMachineSetting initialization.
- [ ] clean up the methods and comments in `(*Stores).*ClusterVersion.*`.
- [ ] Update the RFC.
- [ ] Pave the way for future use of `UseVersion < MinVersion`, if still required.
- [ ] Migration that makes sure there's always a cluster version settings
      table entry (@m-schneider).
- [ ] Disallow version bump if settings table has no value entry.
- [ ] Consider removing the reliance on gossip callback ordering. Gossip does
      not *guarantee* that version updates arrive in-order, and we currently
      persist the cluster version blindly. `*ExposedClusterVersion` would
      have to do its own synthesis to refuse illegal changes backwards. This
      may be a non-issue but needs some consideration (all of the cluster
      settings would be vulnerable to it).
- [ ] Allow callback registration to get notified on version changes. Probably
      won't add this until a PR needs it.
tbg added a commit to tbg/cockroach that referenced this pull request Aug 11, 2017
This finishes up the bulk of the remaining work for the [version migration
RFC][rfc]. There are two main changes:

1. Add `*ExposedClusterVersion` to the `cluster.Settings` struct. Internally,
   it wraps the `version` cluster setting, but it exposes it indirectly to
   synchronize version updates with the node's engines. More precisely,
   - `*ExposedClusterVersion` is initialized with the persisted version early
     in the `Node` startup sequence
   - the `Node` registers itself for notifications whenever the `version`
     changes
   - `*ExposedClusterVersion` only exposes a change in the underlying `version`
     variable to callers *after* calling the `Node` callback (which in turn
     persists the update to all engines).
   - The result is that whenever `IsActive()` or `Version()` is called, the
     returned version is safe to use or to migrate to without having to worry
     about backwards compatibility.
1. Add code that synthesizes from and writes the version to the engines.
   This mostly picks up cockroachdb#17155 with some code golf that refactors the node
   startup sequence appropriately, making sure that even when there are stores
   that need (asynchronous) bootstrapping, they'll still have the cluster
   version persisted at the end.

TODO:

- [ ] Testing.
- [ ] Update the RFC.
- [ ] Pave the way for future use of `UseVersion < MinVersion`, if still required.
- [ ] Migration that makes sure there's always a cluster version settings
      table entry (@m-schneider).
- [ ] Disallow version bump if settings table has no value entry.
- [ ] Consider removing the reliance on gossip callback ordering. Gossip does
      not *guarantee* that version updates arrive in-order, and we currently
      persist the cluster version blindly. `*ExposedClusterVersion` would
      have to do its own synthesis to refuse illegal changes backwards. This
      may be a non-issue but needs some consideration (all of the cluster
      settings would be vulnerable to it).

[rfc]: cockroachdb#16977
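The persist-before-expose ordering described in point 1 can be sketched as below; all names are illustrative, with versions simplified to ints:

```go
package main

import "fmt"

// exposed sketches the ordering described above: a version update only
// becomes visible to Version()/IsActive() callers *after* the Node's
// callback has persisted it to all engines.
type exposed struct {
	visible int
	persist func(int) // the Node's callback, e.g. writing to every engine
}

// update persists the new version first, then exposes it to callers, so
// any visible version is always safe to use or migrate to.
func (e *exposed) update(v int) {
	if e.persist != nil {
		e.persist(v)
	}
	e.visible = v
}

// Version returns the currently exposed (already persisted) version.
func (e *exposed) Version() int { return e.visible }

func main() {
	var persisted []int
	e := &exposed{persist: func(v int) { persisted = append(persisted, v) }}
	e.update(2)
	fmt.Println(e.Version(), persisted)
}
```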
tbg added a commit to tbg/cockroach that referenced this pull request Aug 14, 2017
This finishes up the bulk of the remaining work for the [version migration
RFC][rfc]. There are two main changes:

1. Add `*ExposedClusterVersion` to the `cluster.Settings` struct. Internally,
   it wraps the `version` cluster setting, but it exposes it indirectly to
   synchronize version updates with the node's engines. More precisely,
   - `*ExposedClusterVersion` is initialized with the persisted version early
     in the `Node` startup sequence
   - the `Node` registers itself for notifications whenever the `version`
     changes
   - `*ExposedClusterVersion` only exposes a change in the underlying `version`
     variable to callers *after* calling the `Node` callback (which in turn
     persists the update to all engines).
   - The result is that whenever `IsActive()` or `Version()` is called, the
     returned version is safe to use or to migrate to without having to worry
     about backwards compatibility.
1. Add code that synthesizes from and writes the version to the engines.
   This mostly picks up cockroachdb#17155 with some code golf that refactors the node
   startup sequence appropriately, making sure that even when there are stores
   that need (asynchronous) bootstrapping, they'll still have the cluster
   version persisted at the end.

TODO:

- [ ] Testing.
- [ ] Update the RFC.
- [ ] Pave the way for future use of `UseVersion < MinVersion`, if still required.
- [ ] Migration that makes sure there's always a cluster version settings
      table entry (@m-schneider).
- [ ] Disallow version bump if settings table has no value entry.
- [ ] Consider removing the reliance on gossip callback ordering. Gossip does
      not *guarantee* that version updates arrive in-order, and we currently
      persist the cluster version blindly. `*ExposedClusterVersion` would
      have to do its own synthesis to refuse illegal changes backwards. This
      may be a non-issue but needs some consideration (all of the cluster
      settings would be vulnerable to it).

[rfc]: cockroachdb#16977
tbg added a commit to tbg/cockroach that referenced this pull request Aug 15, 2017
This finishes up the bulk of the remaining work for the [version migration
RFC][rfc]. There are two main changes:

1. Add `*ExposedClusterVersion` to the `cluster.Settings` struct. Internally,
   it wraps the `version` cluster setting, but it exposes it indirectly to
   synchronize version updates with the node's engines. More precisely,
   - `*ExposedClusterVersion` is initialized with the persisted version early
     in the `Node` startup sequence
   - the `Node` registers itself for notifications whenever the `version`
     changes
   - `*ExposedClusterVersion` only exposes a change in the underlying `version`
     variable to callers *after* calling the `Node` callback (which in turn
     persists the update to all engines).
   - The result is that whenever `IsActive()` or `Version()` is called, the
     returned version is safe to use or to migrate to without having to worry
     about backwards compatibility.
1. Add code that synthesizes from and writes the version to the engines.
   This mostly picks up cockroachdb#17155 with some code golf that refactors the node
   startup sequence appropriately, making sure that even when there are stores
   that need (asynchronous) bootstrapping, they'll still have the cluster
   version persisted at the end.

TODO:

- [ ] Testing.
- [ ] Update the RFC.
- [ ] Pave the way for future use of `UseVersion < MinVersion`, if still required.
- [ ] Migration that makes sure there's always a cluster version settings
      table entry (@m-schneider).
- [ ] Disallow version bump if settings table has no value entry.
- [ ] Consider removing the reliance on gossip callback ordering. Gossip does
      not *guarantee* that version updates arrive in-order, and we currently
      persist the cluster version blindly. `*ExposedClusterVersion` would
      have to do its own synthesis to refuse illegal changes backwards. This
      may be a non-issue but needs some consideration (all of the cluster
      settings would be vulnerable to it).

[rfc]: cockroachdb#16977