[WIP] cluster: implement version migration RFC#17155
spencerkimball wants to merge 6 commits into master
Conversation
Force-pushed 84eb1b4 to 7f701a0.
Looks good so far! For the env var, maybe you can poach something from https://github.com/cockroachdb/cockroach/compare/master...tschottdorf:migration?expand=1#diff-4ec93c4ea7116a6504b783da77089a7aR26.

Reviewed 20 of 20 files at r1.

pkg/storage/stores.go, line 348 at r1:
Mention what happens if that isn't unique. You could have same min version, but different use versions.

pkg/storage/stores_test.go, line 348 at r1:
Why this change?

pkg/storage/stores_test.go, line 486 at r1:
Print the error, too.

pkg/storage/stores_test.go, line 512 at r1:
Ditto.
Force-pushed 7f701a0 to 002ccc8.
As mentioned on Slack, I was hoping you could take this PR over. Looks like this code is going to be useful for the

Review status: 19 of 20 files reviewed at latest revision, 4 unresolved discussions, some commit checks pending.

pkg/storage/stores.go, line 348 at r1: Previously, tschottdorf (Tobias Schottdorf) wrote…
Before updating this documentation, we should figure out what the most coherent policy is when the value of

pkg/storage/stores_test.go, line 348 at r1: Previously, tschottdorf (Tobias Schottdorf) wrote…
There was a bug in the test; it wasn't testing what it was supposed to be testing...from a fresh

pkg/storage/stores_test.go, line 486 at r1: Previously, tschottdorf (Tobias Schottdorf) wrote…
Done.

pkg/storage/stores_test.go, line 512 at r1: Previously, tschottdorf (Tobias Schottdorf) wrote…
Done.
Just a note that cluster settings can already have change callbacks (
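The change-callback idea mentioned here can be sketched as a setting that notifies registered callbacks whenever its value is set. All names below are illustrative stand-ins, not the actual cockroachdb/cockroach `settings` package API.

```go
package main

import "fmt"

// stringSetting is a hypothetical setting that fires registered
// callbacks on every value change.
type stringSetting struct {
	value    string
	onChange []func(old, updated string)
}

// OnChange registers fn to run after each value change.
func (s *stringSetting) OnChange(fn func(old, updated string)) {
	s.onChange = append(s.onChange, fn)
}

// Set updates the value and invokes all registered callbacks.
func (s *stringSetting) Set(v string) {
	old := s.value
	s.value = v
	for _, fn := range s.onChange {
		fn(old, v)
	}
}

func main() {
	var version stringSetting
	version.OnChange(func(old, updated string) {
		fmt.Printf("version changed: %q -> %q\n", old, updated)
	})
	version.Set("1.1") // prints: version changed: "" -> "1.1"
}
```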
Reviewed 19 of 20 files at r1, 1 of 1 files at r2.

pkg/storage/stores.go, line 348 at r1: Previously, spencerkimball (Spencer Kimball) wrote…
I think there's going to be some gossiped value stored in the system config span that is the authoritative version of this. Whenever we get a gossip update we should update all our stores to match. If the stores disagree on
Implement (hopefully) more than 50% of the migration RFC[1]. Specifically,

- add a `StateMachineSetting` to the `settings` package
- use it for `SET CLUSTER SETTING version = '<major>.<minor>-<unstable>'`
- pass a newly created `*ExposedClusterVersion` around
- use `(*ExposedClusterVersion).IsActive(targetVersion)` to toggle existing features that so far used the stub `util.IsMigrated()`
- remove `util.IsMigrated()`.

The missing part is incorporating the remaining parts of [2], that is, storing the seen version to disk.

[1]: cockroachdb#16977
[2]: cockroachdb#17155
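The `'<major>.<minor>-<unstable>'` format used by the setting can be sketched as a small parser plus a total order. The type and helper names here are illustrative, not the actual implementation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// clusterVersion mirrors the '<major>.<minor>-<unstable>' version
// format; the unstable component defaults to 0 when absent.
type clusterVersion struct {
	Major, Minor, Unstable int
}

// parseClusterVersion parses strings like "1.0" or "1.0-3".
func parseClusterVersion(s string) (clusterVersion, error) {
	var cv clusterVersion
	dotted, unstable := s, "0"
	if i := strings.IndexByte(s, '-'); i >= 0 {
		dotted, unstable = s[:i], s[i+1:]
	}
	parts := strings.Split(dotted, ".")
	if len(parts) != 2 {
		return cv, fmt.Errorf("invalid version %q", s)
	}
	var err error
	if cv.Major, err = strconv.Atoi(parts[0]); err != nil {
		return cv, err
	}
	if cv.Minor, err = strconv.Atoi(parts[1]); err != nil {
		return cv, err
	}
	if cv.Unstable, err = strconv.Atoi(unstable); err != nil {
		return cv, err
	}
	return cv, nil
}

// Less orders versions lexicographically by (Major, Minor, Unstable).
func (a clusterVersion) Less(b clusterVersion) bool {
	if a.Major != b.Major {
		return a.Major < b.Major
	}
	if a.Minor != b.Minor {
		return a.Minor < b.Minor
	}
	return a.Unstable < b.Unstable
}

func main() {
	cv, err := parseClusterVersion("1.0-3")
	if err != nil {
		panic(err)
	}
	base, _ := parseClusterVersion("1.0")
	fmt.Println(cv, base.Less(cv)) // {1 0 3} true
}
```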
Add read/write methods to the `storage.Stores` object for persisting the gossiped `ClusterVersion` setting to prevent too-old or too-new binaries from starting on the data in a configured store.
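The read/write idea above can be sketched as: read the persisted version from every store, synthesize the minimum (so a store written by an older binary holds the whole node back), and backfill stores that have no entry yet. Versions are plain ints and the store type is a stand-in, purely for illustration.

```go
package main

import "fmt"

// store is a toy stand-in for an engine; version 0 means "no cluster
// version persisted".
type store struct {
	name    string
	version int
}

// synthesizeClusterVersion picks the smallest persisted version across
// all stores; stores without an entry contribute nothing.
func synthesizeClusterVersion(stores []*store) (int, bool) {
	min, found := 0, false
	for _, s := range stores {
		if s.version == 0 {
			continue
		}
		if !found || s.version < min {
			min, found = s.version, true
		}
	}
	return min, found
}

// backfillClusterVersion writes v to every store missing an entry, so
// that swapping stores out one by one cannot lose the persisted version.
func backfillClusterVersion(stores []*store, v int) {
	for _, s := range stores {
		if s.version == 0 {
			s.version = v
		}
	}
}

func main() {
	stores := []*store{{name: "s1", version: 2}, {name: "s2"}, {name: "s3", version: 3}}
	if v, ok := synthesizeClusterVersion(stores); ok {
		backfillClusterVersion(stores, v)
		fmt.Println("synthesized version:", v) // synthesized version: 2
	}
	for _, s := range stores {
		fmt.Println(s.name, s.version) // s1 2, s2 2, s3 3
	}
}
```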
|
Superseded by upcoming PR. |
Out for early review. The code contains numerous hacks that are listed below. Making this work required a bit of refactoring (and it may not work yet; no real testing has been done). In particular, I'm looking to validate that this code will *actually* serve our migration needs for this upcoming release, but also the ones after. Your feedback is especially important since I am obviously in a rush to finish this up, and this invites poor quality. Review with combined commits; there isn't much use in looking at the first one in isolation, since there's a lot of churn.

----

Pick up cockroachdb#17155, which introduced a basic mechanism to persist the cluster version on the nodes' stores. To do this properly, we address the following:

- refactor the node startup sequence so that the (start-up) cluster version is known before any moving parts (stores, SQL server, ...) are started.
- backfill the cluster version into all stores that don't have it.
- in particular, make sure bootstrapping new stores into an existing node does not race with this backfill (otherwise, switching out the stores one by one could in the worst case completely lose the persisted version).
- use the determined cluster version to initialize the `*ExposedClusterVersion`. This marriage is presently not a happy one since the singleton cluster settings want to be registered at init time, but we don't know the value then. It's unfortunate that the singleton will require significant refactoring to run mixed-cluster in-memory tests, but this is a constraint we'll have to live with, at least for the initial PR.
- persist a cluster version during cluster bootstrap. Thus we'll be able to fail the boot sequence in v1.2 when we see a bootstrapped store without a persisted entry.
- allow callback registration to get notified on version changes.

TODO:

- [ ] Testing. Unit tests + an acceptance test will likely have to do.
- [ ] Clean up the most egregious hack: the `StateMachineSetting` initialization.
- [ ] Clean up the methods and comments in `(*Stores).*ClusterVersion.*`.
- [ ] Update the RFC.
- [ ] Pave the way for future use of `UseVersion < MinVersion`, if still required.
- [ ] Migration that makes sure there's always a cluster version settings table entry (@m-schneider).
- [ ] Disallow version bump if settings table has no value entry.
- [ ] Consider removing the reliance on gossip callback ordering. Gossip does not *guarantee* that version updates arrive in order, and we currently persist the cluster version blindly. `*ExposedClusterVersion` would have to do its own synthesis to refuse illegal changes backwards. This may be a non-issue but needs some consideration (all of the cluster settings would be vulnerable to it).
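One way to make the out-of-order gossip concern in the last TODO a non-issue, sketched here with illustrative names and an int version for simplicity, is a ratchet that refuses to move the version backwards.

```go
package main

import (
	"fmt"
	"sync"
)

// versionRatchet accepts version updates only in the forward direction,
// so stale gossip deliveries cannot regress the persisted version.
type versionRatchet struct {
	mu      sync.Mutex
	current int
}

// Offer applies v only if it does not regress the current version,
// returning whether the update was accepted.
func (r *versionRatchet) Offer(v int) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if v < r.current {
		return false // stale gossip update; ignore it
	}
	r.current = v
	return true
}

func main() {
	var r versionRatchet
	fmt.Println(r.Offer(2), r.Offer(1), r.Offer(3)) // true false true
}
```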
Pick up #17155, which introduced a basic mechanism to persist the cluster version on the nodes' stores. To do this properly, we address the following:

- refactor the node startup sequence so that the (start-up) cluster version is known before any moving parts (stores, SQL server, ...) are started.
- backfill the cluster version into all stores that don't have it.
- in particular, make sure bootstrapping new stores into an existing node does not race with this backfill (otherwise, switching out the stores one by one could in the worst case completely lose the persisted version).
- use the determined cluster version to initialize the cluster setting's version. Essentially, the engine-backed version becomes the "default version" of the statemachine setting backing the version, so that it is used until we get the actual cluster version via gossip.
- persist a cluster version during cluster bootstrap. Thus we'll be able to fail the boot sequence in v1.2 when we see a bootstrapped store without a persisted entry.

TODO:

- [ ] Testing. Unit tests + an acceptance test will likely have to do.
- [ ] Clean up the most egregious hack: the `StateMachineSetting` initialization.
- [ ] Clean up the methods and comments in `(*Stores).*ClusterVersion.*`.
- [ ] Update the RFC.
- [ ] Pave the way for future use of `UseVersion < MinVersion`, if still required.
- [ ] Migration that makes sure there's always a cluster version settings table entry (@m-schneider).
- [ ] Disallow version bump if settings table has no value entry.
- [ ] Consider removing the reliance on gossip callback ordering. Gossip does not *guarantee* that version updates arrive in order, and we currently persist the cluster version blindly. `*ExposedClusterVersion` would have to do its own synthesis to refuse illegal changes backwards. This may be a non-issue but needs some consideration (all of the cluster settings would be vulnerable to it).
- [ ] Allow callback registration to get notified on version changes. Probably won't add this until a PR needs it.
This finishes up the bulk of the remaining work for the [version migration
RFC][rfc]. There are two main changes:

1. Add `*ExposedClusterVersion` to the `cluster.Settings` struct. Internally,
   it wraps the `version` cluster setting, but it exposes it indirectly to
   synchronize version updates with the node's engines. More precisely,
   - `*ExposedClusterVersion` is initialized with the persisted version early
     in the `Node` startup sequence
   - the `Node` registers itself for notifications whenever the `version`
     changes
   - `*ExposedClusterVersion` only exposes a change in the underlying `version`
     variable to callers *after* calling the `Node` callback (which in turn
     persists the update to all engines).
   - The result is that whenever `IsActive()` or `Version()` is called, the
     returned version is safe to use or to migrate to without having to worry
     about backwards compatibility.
2. Add code that synthesizes from and writes the version to the engines.
   This mostly picks up cockroachdb#17155 with some code golf that refactors
   the node startup sequence appropriately, making sure that even when there
   are stores that need (asynchronous) bootstrapping, they'll still have the
   cluster version persisted at the end.

TODO:

- [ ] Testing.
- [ ] Update the RFC.
- [ ] Pave the way for future use of `UseVersion < MinVersion`, if still required.
- [ ] Migration that makes sure there's always a cluster version settings table
      entry (@m-schneider).
- [ ] Disallow version bump if settings table has no value entry.
- [ ] Consider removing the reliance on gossip callback ordering. Gossip does
      not *guarantee* that version updates arrive in order, and we currently
      persist the cluster version blindly. `*ExposedClusterVersion` would
      have to do its own synthesis to refuse illegal changes backwards. This
      may be a non-issue but needs some consideration (all of the cluster
      settings would be vulnerable to it).

[rfc]: cockroachdb#16977
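The persist-before-expose ordering described in point 1 can be sketched as follows; all names are illustrative stand-ins for the real `*ExposedClusterVersion`, and the version is an int for simplicity.

```go
package main

import "fmt"

// exposedClusterVersion hides a version update from callers until the
// node callback has persisted it to every engine.
type exposedClusterVersion struct {
	exposed int         // what IsActive/Version report
	persist func(v int) // node callback that writes v to all engines
}

// Update persists v before making it visible, so callers never observe
// a version that isn't durably recorded on every engine.
func (e *exposedClusterVersion) Update(v int) {
	e.persist(v) // write to all engines first
	e.exposed = v
}

// IsActive reports whether features gated at target may be used.
func (e *exposedClusterVersion) IsActive(target int) bool {
	return e.exposed >= target
}

func main() {
	engines := []int{0, 0}
	e := &exposedClusterVersion{persist: func(v int) {
		for i := range engines {
			engines[i] = v
		}
	}}
	e.Update(2)
	fmt.Println(e.IsActive(2), e.IsActive(3), engines) // true false [2 2]
}
```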
Introduce `Version` and `ClusterVersion` protobuf messages. Add default `ServerVersion` and `MinimumSupportedVersion` to the `base` package.

Add read/write methods to the `storage.Stores` object for persisting the gossiped `ClusterVersion` setting to prevent too-old or too-new binaries from starting on the data in a configured store.

TODO:
- … `ClusterVersion` against setting either on new settings callback or every time that the system config changes.
- … that existing nodes can accommodate the new value.