
multitenant: shared-process tenant auto upgrades even if preserve_downgrade_option was set prior to restart #126754

@renatolabs

Description


Starting in 24.1, shared-process tenants auto-upgrade even if the operator set the cluster.preserve_downgrade_option cluster setting prior to restarting the nodes.

Reproduction

$ roachprod create local -n1
$ roachprod stage local release v23.2.6 # bootstrap on 23.2
$ roachprod start local
$ roachprod start-sql main --storage-cluster local # create shared-process tenant
$ roachprod sql local:1 --cluster main -- -e "SET CLUSTER SETTING cluster.preserve_downgrade_option = '23.2';"
$ roachprod sql local:1 --cluster main -- -e "SHOW CLUSTER SETTING cluster.preserve_downgrade_option;"
  cluster.preserve_downgrade_option
-------------------------------------
  23.2
(1 row)

$ roachprod deploy local release v24.1.1 # rolling restart to 24.1
$ sleep 30 # wait for a while
$ roachprod sql local:1 --cluster main -- -e "SHOW CLUSTER SETTING version;"
  version
-----------
  24.1
(1 row)

$ roachprod sql local:1 --cluster main -- -e "SHOW CLUSTER SETTING cluster.preserve_downgrade_option;"
  cluster.preserve_downgrade_option
-------------------------------------

(1 row)

As we can see, the tenant correctly knew that the cluster.preserve_downgrade_option setting was set to 23.2 before the restart. Shortly after the restart, however, the tenant's cluster version is already 24.1 and cluster.preserve_downgrade_option has been cleared.

This issue only happens for shared-process tenants; separate-process deployments work as expected.

Investigation

The root cause is that the tenant learns about the cluster.preserve_downgrade_option setting before it learns the actual cluster version. The tenant's cluster version is initialized to 23.1 (the 24.1 binary's MinSupported constant), and therefore the preserve_downgrade_option update is refused:

W240704 18:03:31.242481 1501 server/settingswatcher/settings_watcher.go:485 ⋮ [T2,Vmain,n1,rangefeed=‹settings-watcher›] 173 failed to set setting cluster.preserve_downgrade_option to ‹23.2›: cannot set cluster.preserve_downgrade_option to ‹23.2› (cluster version is 23.1)

Shortly after, the tenant gets an accurate view of the cluster version:

I240704 18:03:31.242505 1501 server/settingswatcher/settings_watcher.go:473 ⋮ [T2,Vmain,n1,rangefeed=‹settings-watcher›] 175 set cluster version from 23.1 to: 23.2
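The two log lines show the ordering problem directly: the setting update is validated against a stale initial version, and by the time the real version arrives, the setting has already been dropped. A minimal toy replay of that watcher behavior (illustrative only; the names and validation rule here are simplifications, not the actual settingswatcher API) looks like this:

```go
package main

import "fmt"

// event is one update delivered by the settings watcher (toy model).
type event struct{ kind, value string }

// replay applies watcher events in order, starting from an initial cluster
// version, and returns the final (version, preserve_downgrade_option) pair.
// A preserve_downgrade_option value that does not match the current version
// is refused, mirroring the warning in the log lines quoted here.
func replay(initial string, events []event) (version, preserve string) {
	version = initial
	for _, ev := range events {
		switch ev.kind {
		case "preserve_downgrade_option":
			if ev.value != version {
				// Refused: validation runs against the stale version.
				continue
			}
			preserve = ev.value
		case "version":
			version = ev.value
		}
	}
	return version, preserve
}

func main() {
	// Order observed in the logs: the setting arrives before the version bump,
	// so the setting is lost and nothing holds back the auto-upgrade.
	v, p := replay("23.1", []event{
		{"preserve_downgrade_option", "23.2"},
		{"version", "23.2"},
	})
	fmt.Printf("version=%s preserve=%q\n", v, p) // version=23.2 preserve=""

	// Reversed order: the setting sticks and the upgrade would be held back.
	v, p = replay("23.1", []event{
		{"version", "23.2"},
		{"preserve_downgrade_option", "23.2"},
	})
	fmt.Printf("version=%s preserve=%q\n", v, p) // version=23.2 preserve="23.2"
}
```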

Aside

This issue does not happen for separate process deployments because, in #121952, we introduced a makeClusterSettings function that sets MinSupported to the immediate predecessor (23.2 in this case) unless an internal environment variable is set. This function is used by both the system tenant and separate process tenants, but shared-process tenants use a different code path to create initial cluster settings.

Resolution

We can (and should) update shared-process tenant initialization to reuse (or otherwise replicate) the logic in the makeClusterSettings function referenced above; that would fix this issue for now.

However, we will face this problem again once MinSupported != PreviousRelease (i.e., when we officially support version-skip upgrades). The version setting will be initialized to MinSupported and could cause the preserve_downgrade_option update to be refused again. This would impact both shared-process and separate-process tenants.
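Concretely, under a skip-version upgrade the initial version can legitimately lag the cluster's real version, so the same mismatch reappears. A small sketch with hypothetical future version numbers (not real CockroachDB releases or APIs):

```go
package main

import "fmt"

// accepted models the validation rule: preserve_downgrade_option must match
// the active cluster version the node currently believes in.
func accepted(believedVersion, requested string) bool {
	return requested == believedVersion
}

func main() {
	// Hypothetical future binary where MinSupported (24.1) lags the
	// immediate predecessor (24.3), because skip upgrades are supported.
	minSupported := "24.1"
	actualCluster := "24.3"

	// A tenant initialized to MinSupported refuses the operator's setting
	// even though it matches the cluster's real version.
	fmt.Println(accepted(minSupported, actualCluster)) // false: refused again
}
```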

Jira issue: CRDB-40107


Labels

A-testing: Testing tools and infrastructure
C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
T-testeng: TestEng Team
branch-master: Failures and bugs on the master branch.
branch-release-24.1: Used to mark GA and release blockers, technical advisories, and bugs for 24.1
branch-release-24.2: Used to mark GA and release blockers, technical advisories, and bugs for 24.2
