Skip to content

clusterversion: significantly rework cluster version handling#45455

Merged
craig[bot] merged 4 commits intocockroachdb:masterfrom
irfansharif:200213.migration-refactor
Mar 2, 2020
Merged

clusterversion: significantly rework cluster version handling#45455
craig[bot] merged 4 commits intocockroachdb:masterfrom
irfansharif:200213.migration-refactor

Conversation

@irfansharif
Copy link
Copy Markdown
Contributor

@irfansharif irfansharif commented Feb 26, 2020

We split off ClusterVersion out of pkg/settings/cluster into a dedicated
pkg/clusterversion. This is to pave the way for #39182 where we
introduce long running version migration infrastructure.

We then introduce clusterversion.Handle as a read only view to the
active cluster version and this binary's version details. We similarly
fold in the actual global cluster version setting into
pkg/clusterversion, and enforce all external usage to go through
clusterversion.Handle. We can also hide away external usage of the baked
in cluster.Binary{,MinimumSupported}Version constants. Instead we have
the implementation of clusterversion.Handle allow for tests to override
binary and minimum supported versions as needed.

Along the way we clean up Settings.Initialized, as it is not really
used. We document all the various "versions" in play ("binary version",
"minimum supported version", "active version") and change usage of what
was previously "serverVersion" to simply be "binaryVersion", because
that's what it is. We also clean up the Settings constructors into
Test/Non-Test types that differ in cluster version setting
initialization behaviour.


For reviewers: It's probably easiest to start with what
pkg/settings/cluster/cluster_settings.go looks like, then following into
pkg/clusterversion/cluster_version.go and then pkg/clusterversion/setting.go.


I still don't like the following detail about our pkg structure:

  • pkg/settings/cluster depends on it's "parent" pkg, pkg/settings
  • pkg/settings/cluster depends pkg/clusterversion, which in turn depends
    on pkg/settings

Naively, pkg/settings/cluster should be a top level package, but I'm not
making that change in this PR. For now I've renamed the settings.go file
to cluster_settings.go.

Release note: None

@irfansharif irfansharif requested review from danhhz, otan and tbg February 26, 2020 17:41
@irfansharif irfansharif requested a review from a team as a code owner February 26, 2020 17:41
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@irfansharif irfansharif removed the request for review from a team February 26, 2020 17:42
Copy link
Copy Markdown
Contributor

@danhhz danhhz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this looks great. The way you have the constructors here is a decided improvement over my WIP. I left a couple initial nits, but I did have a hard time reviewing the file moves because it was unclear what changed. Could you please introduce a prefix commit that does only the git mvs so it's clear in (what will be) the second commit what changed about them?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @danhhz, @irfansharif, @otan, and @tbg)


pkg/clusterversion/cluster_version.go, line 1 at r2 (raw file):

// Copyright 2017 The Cockroach Authors.

nit: I'm not sure anyone knows how these dates are supposed to work, but this file is new enough content that I'd have put this as 2020.


pkg/clusterversion/cluster_version.go, line 93 at r2 (raw file):

	// of the nodes in the cluster have running binaries that are at least as
	// new as that version, and that you're guaranteed that those nodes will
	// never be downgraded to an older version.

nit: never be downgraded to a binary with a version older than this one.


pkg/clusterversion/cluster_version.go, line 184 at r2 (raw file):

func (v *handleImpl) BinaryVersion() roachpb.Version {
	if v.binaryVersionOverride != (roachpb.Version{}) {

The MakeVersionHandle constructor actually sets this override to binaryVersion, so we'll always take this branch. That's a little odd


pkg/sql/logictest/logic.go, line 1107 at r2 (raw file):

			nodeParams := params.ServerArgs
			nodeParams.Settings = cluster.MakeTestingClusterSettingsWithVersions(
				cfg.binaryVersion, binaryMinSupportedVersion /* initialize */, false)

a bunch of these calls have the /* initialize */ in the wrong place

Copy link
Copy Markdown
Contributor

@danhhz danhhz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pkg/settings/cluster depends pkg/clusterversion, which in turn depends
on pkg/settings/cluster

Also, I think one of these package names should be something else, no? Go won't allow this dep graph

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @danhhz, @irfansharif, @otan, and @tbg)

Copy link
Copy Markdown
Contributor

@otan otan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i've also noticed clusterversion.ClusterVersion{} is a bit tautological.
should we have any tests for clusterversion.go? We added some new stuff, may be worth unit testing handleImpl.
maybe worth refactoring later (but can't think of a good name)

Reviewed 3 of 67 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @otan, and @tbg)


pkg/clusterversion/cluster_version.go, line 1 at r2 (raw file):

// Copyright 2017 The Cockroach Authors.

should this also be called clusterversion.go instead?


pkg/clusterversion/cluster_version.go, line 44 at r2 (raw file):

import (
	"context"
	"github.com/cockroachdb/cockroach/pkg/util/syncutil"

nit: it smells like this import should be in the next paragraph?


pkg/settings/cluster/cluster_settings.go, line 128 at r2 (raw file):

func MakeTestingClusterSettingsWithVersions(
	binaryVersion, binaryMinSupportedVersion roachpb.Version,
	initialize bool,

nit: can we make this an enum or something to be more descriptive?

@irfansharif irfansharif force-pushed the 200213.migration-refactor branch 2 times, most recently from 62fecf5 to ea5c834 Compare February 26, 2020 20:31
Copy link
Copy Markdown
Contributor Author

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please introduce a prefix commit that does only the git mvs so it's clear in (what will be) the second commit what changed about them?

Done.

Also, I think one of these package names should be something else, no? Go won't allow this dep graph

Woops, fixed.

i've also noticed clusterversion.ClusterVersion{} is a bit tautological.

Agreed, it should probably be called pkg/version instead. But I'll leave the restructuring for when we have a clearer idea of what it all should look like.

We added some new stuff, may be worth unit testing handleImpl.

I think I'll hold off on this for the same reason + the fact that it's a very thin read-only shim around the global cluster version setting singleton. As I understand what pkg/clusterversion (or whatever it ends up being called) should look like, we can test the parts that make sense.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @otan and @tbg)


pkg/clusterversion/cluster_version.go, line 1 at r2 (raw file):

Previously, danhhz (Daniel Harrison) wrote…

nit: I'm not sure anyone knows how these dates are supposed to work, but this file is new enough content that I'd have put this as 2020.

Done.


pkg/clusterversion/cluster_version.go, line 1 at r2 (raw file):

Previously, otan (Oliver Tan) wrote…

should this also be called clusterversion.go instead?

Done (we don't seem to be too principled about this generally).


pkg/clusterversion/cluster_version.go, line 44 at r2 (raw file):

Previously, otan (Oliver Tan) wrote…

nit: it smells like this import should be in the next paragraph?

Woops, my linter should've caught that. Done.


pkg/clusterversion/cluster_version.go, line 93 at r2 (raw file):

Previously, danhhz (Daniel Harrison) wrote…

nit: never be downgraded to a binary with a version older than this one.

Done.


pkg/clusterversion/cluster_version.go, line 184 at r2 (raw file):

Previously, danhhz (Daniel Harrison) wrote…

The MakeVersionHandle constructor actually sets this override to binaryVersion, so we'll always take this branch. That's a little odd

Done.


pkg/settings/cluster/cluster_settings.go, line 128 at r2 (raw file):

Previously, otan (Oliver Tan) wrote…

nit: can we make this an enum or something to be more descriptive?

Done (renamed to initializeVersion).


pkg/sql/logictest/logic.go, line 1107 at r2 (raw file):

Previously, danhhz (Daniel Harrison) wrote…

a bunch of these calls have the /* initialize */ in the wrong place

Done. TIL that gofmt will simplify param a, param b, /* comment */ param c to param a, param b /* comment */, param c.

@irfansharif irfansharif force-pushed the 200213.migration-refactor branch from ea5c834 to 31af9fc Compare February 26, 2020 21:27
Copy link
Copy Markdown
Contributor

@danhhz danhhz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @otan and @tbg)

@irfansharif
Copy link
Copy Markdown
Contributor Author

irfansharif commented Feb 26, 2020 via email

@irfansharif irfansharif force-pushed the 200213.migration-refactor branch 2 times, most recently from 122274a to b5a40c7 Compare February 26, 2020 23:26
@danhhz
Copy link
Copy Markdown
Contributor

danhhz commented Feb 27, 2020

I vote yes, but curious what @tbg thinks.

There may even be an argument for sneaking in the RPC based version update system to the same in-memory state as what gossip updates, but not having anyone send those RPC. We should think through if that helps us in the 20.1->20.2 transition.

@irfansharif irfansharif force-pushed the 200213.migration-refactor branch 2 times, most recently from 678ba04 to c160937 Compare February 27, 2020 22:43
@irfansharif
Copy link
Copy Markdown
Contributor Author

@rohany: Mind rubberstamping create_table.go, last commit? Following up on #42922 (review). Ignoring the discussions about whether or not st can be nil, I'll leave it someone else (you) to figure that out.

return desc, invalidClusterForShardedIndexError
}
if version := cluster.Version.ActiveVersionOrEmpty(ctx, st); version == (clusterversion.ClusterVersion{}) ||
if version := st.Version.ActiveVersionOrEmpty(ctx); version == (clusterversion.ClusterVersion{}) ||
Copy link
Copy Markdown
Contributor

@rohany rohany Feb 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be !=, && like the other cases

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meant to throw a ! in there, but that aside, I do think we want to error out if the version is not initialized, no?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to take a look at this soon (#45519), so i wouldn't sweat about this. When i can move it to ActiveVersion it will error our if the version is not initialized.

@irfansharif irfansharif force-pushed the 200213.migration-refactor branch 2 times, most recently from 801e037 to 5df42ab Compare February 27, 2020 23:34
@irfansharif irfansharif requested a review from a team as a code owner February 27, 2020 23:34
@irfansharif irfansharif removed the request for review from a team February 27, 2020 23:51
cockroachdb#17650 renamed these fields
for the protobuf-generated C++ code, leaving the Go code unchanged.
Isolate the associated comment so it doesn't confusingly appear in the
Go code.

Release note: None
@irfansharif irfansharif force-pushed the 200213.migration-refactor branch from 5df42ab to e95bba6 Compare February 27, 2020 23:52
@irfansharif
Copy link
Copy Markdown
Contributor Author

@tbg, gentle ping here. I've had to rebase it multiple times to stay afloat of master surprisingly. It's getting a bit slow to try and make sure each commit is functional on its own, so I'll just give up entirely until it's all reviewed.

@irfansharif irfansharif force-pushed the 200213.migration-refactor branch from e95bba6 to 3f8c57d Compare February 28, 2020 00:22
@irfansharif
Copy link
Copy Markdown
Contributor Author

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Feb 29, 2020

Build failed

@irfansharif
Copy link
Copy Markdown
Contributor Author

irfansharif commented Mar 1, 2020

Caught #44188. bors retry

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Mar 1, 2020

Timed out

@irfansharif
Copy link
Copy Markdown
Contributor Author

irfansharif commented Mar 1, 2020 via email

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Mar 1, 2020

Timed out

@rohany
Copy link
Copy Markdown
Contributor

rohany commented Mar 2, 2020

what is going on with bors rn

@tbg
Copy link
Copy Markdown
Member

tbg commented Mar 2, 2020

I thought it might be this PR, but I might be wrong

@irfansharif
Copy link
Copy Markdown
Contributor Author

This and this are the two most recent builds that were successful, but that bors timed out on.

@irfansharif
Copy link
Copy Markdown
Contributor Author

image

They are taking quite a bit longer than other builds however, specifically for testrace. I don't quite understand why.

image

There's no one lagging test there either.

image

@danhhz
Copy link
Copy Markdown
Contributor

danhhz commented Mar 2, 2020

testrace taking longer is probably the "maybe stress new tests" logic

@danhhz
Copy link
Copy Markdown
Contributor

danhhz commented Mar 2, 2020

Nope. "Maybe stressrace pull request (17s)"

@danhhz
Copy link
Copy Markdown
Contributor

danhhz commented Mar 2, 2020

Are they taking longer for testrace, though? It looking to be taking a long time on all the recent master builds https://teamcity.cockroachdb.com/viewType.html?buildTypeId=Cockroach_UnitTests_Testrace&branch_Cockroach_UnitTests=%3Cdefault%3E&tab=buildTypeStatusDiv

@irfansharif
Copy link
Copy Markdown
Contributor Author

irfansharif commented Mar 2, 2020

bors timeout bumped in #45590.

@irfansharif
Copy link
Copy Markdown
Contributor Author

irfansharif commented Mar 2, 2020

image

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Mar 2, 2020

Build succeeded

@craig craig bot merged commit e314075 into cockroachdb:master Mar 2, 2020
@irfansharif irfansharif deleted the 200213.migration-refactor branch March 5, 2020 22:50
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Dec 1, 2020
This package was extracted out of pkg/settings/cluster in cockroachdb#45455. Now
that we're coupling it tightly with our migrations infrastructure
(pkg/migration), let's remove some unnecessary stuttering for ease of
use.

```diff
- if <settings handle>.Version.IsActive(ctx, clusterversion.VersionSomething) {
+ if <settings handle>.Version.IsActive(ctx, clusterversion.Something) {
```

All the diffs here are just mechanical goland-renames.

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Dec 2, 2020
This package was extracted out of pkg/settings/cluster in cockroachdb#45455. Now
that we're coupling it tightly with our migrations infrastructure
(pkg/migration), let's remove some unnecessary stuttering for ease of
use.

```diff
- if <settings handle>.Version.IsActive(ctx, clusterversion.VersionSomething) {
+ if <settings handle>.Version.IsActive(ctx, clusterversion.Something) {
```

All the diffs here are just mechanical goland-renames.

Release note: None
craig bot pushed a commit that referenced this pull request Dec 2, 2020
57306: sqlproxyccl: close proxy connections properly r=jbowens a=spaskob

Issue:
The sqlproxy opens a connection to the client and backend and proxies
data between them. The backend connection is never closed. Moreover the
client connection in certain situations may not be closed properly as
well because it may be embedded in a tls connection whose `Close`
method will never be called.

Details:
Proxy connection to the client is closed in `func (s *Server) Serve`
https://github.com/cockroachdb/cockroach/blob/master/pkg/ccl/sqlproxyccl/server.go#L149
However the client connection, if the proxy is using TLS, is further
embedded in `tls.Conn` in func (s *Server) Proxy here
https://github.com/cockroachdb/cockroach/blob/master/pkg/ccl/sqlproxyccl/proxy.go#L117.
`tls.Conn` has a more involved `Close` implementation which will not be
called; see https://golang.org/src/crypto/tls/conn.go. In particular,
readers/writers on this connection may not be notified of the connection
closure potentially leading to never exiting the go routines at
https://github.com/cockroachdb/cockroach/blob/master/pkg/ccl/sqlproxyccl/proxy.go#L204.

Note that this fix does not prevent another party from calling `Close`
on `proxyConn` which may lead again to `tls.Conn Close` not being
called when then proxy is using TLS.

Release note: none.

57318: colexecagg: fix memory accounting for decimals with avg and sum r=yuzefovich a=yuzefovich

Previously, we were not calling `PerformOperation` to account for the
memory used by the decimals (which are of variable size), and this is
now fixed. Additionally, this commit refactors several other templates to
be more similar to each other.

Note that we decided to call `PerformOperation` on all of the aggregate
functions even if we know that all types that the functions operate on
are of fixed size and that call is not strictly necessary. However, this
has negligible impact on performance in most cases and makes the layout
of `Compute` methods of all functions the same ensuring that we don't
forget these calls in the future.

Additionally, this commit adds capturing of the column to force bounds
check to work for min/max and sum aggregates.

Release note: None

57328: ccl, sql: add telemetry, logs for feature denials by feature flags r=otan a=angelapwen

This change adds a telemetry counter, sql.feature_flag.denied, in
the case where a feature is denied by its corresponding feature flag
being set to disabled in cluster settings. It also adds a log
warning for the database administrator to view the denied command
as well as its optional category.

Release note: none

57338: clusterversion: reduce stuttering in package API r=irfansharif a=irfansharif

This package was extracted out of pkg/settings/cluster in #45455. Now
that we're coupling it tightly with our migrations infrastructure
(pkg/migration), let's remove some unnecessary stuttering for ease of
use.

```diff
- if <settings handle>.Version.IsActive(ctx, clusterversion.VersionSomething) {
+ if <settings handle>.Version.IsActive(ctx, clusterversion.Something) {
```

All the diffs here are just mechanical goland-renames.

Release note: None

57349: colexec: fix jsonb - string projection r=yuzefovich a=yuzefovich

Bytes canonical type family represents several types, and whenever we're
performing a binary operation with one datum and one non-datum arguments
we need to convert the latter to the correct datum type. Previously, we
had an incorrect conversion for Minus operation.

Fixes: #57165.

Release note (bug fix): Previously, CockroachDB would encounter an
internal error when performing `JSONB - String` operation via the
vectorized execution engine, and this has been fixed. The bug was
introduced in 20.2.0 version.

Co-authored-by: Spas Bojanov <spas@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: angelapwen <angelaw@cockroachlabs.com>
Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants