server,settings: properly cascade defaults for TenantReadOnly#110758
server,settings: properly cascade defaults for TenantReadOnly#110758craig[bot] merged 6 commits intocockroachdb:masterfrom
Conversation
|
Next steps:
|
ce05159 to
d2221ad
Compare
110789: server: sync on SQL setting overrides in testserver tenant init r=yuzefovich a=knz Prerequisite for tests in #110758. Fixes #110560. Epic: CRDB-6671 Prior to this patch, if a test was running SET CLUSTER SETTING or ALTER VIRTUAL CLUSTER SET CLUSTER SETTING prior to starting the service for a virtual cluster, it wasn't guaranteed that the setting update had propagated (because it propagates via a rangefeed). This commit fixes that. Release note: None Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
d2221ad to
23b89d3
Compare
23b89d3 to
ad757c9
Compare
bc0f64c to
ed2869f
Compare
yuzefovich
left a comment
There was a problem hiding this comment.
Nice work figuring all of this out! I skipped the first commit, but 2nd through 4th look good to me, I only have a handful of comments. It seems beneficial for Steven to review too.
Reviewed 72 of 73 files at r9, 3 of 3 files at r10, 2 of 2 files at r11, 80 of 80 files at r12, 3 of 3 files at r13, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @knz and @stevendanna)
pkg/server/tenantsettingswatcher/overrides_store.go line 309 at r11 (raw file):
alternateDefaults []kvpb.TenantSetting, ) []kvpb.TenantSetting { // log.VEventf(context.TODO(), 1, "splicing alternate defaults:\nexplicitKeys=%# v\noverrides=%# v\nalternateDefaults=%# v",
nit: leftover here and throughout this function.
pkg/server/tenantsettingswatcher/overrides_store.go line 315 at r11 (raw file):
aIter, bIter := 0, 0 for aIter < len(overrides) && bIter < len(alternateDefaults) { log.VEventf(context.Background(), 1, "iter: aIter=%d bIter=%d", aIter, bIter)
nit: we should have ctx available in scope of all callsites, but also I don't think you want to check this in, right?
pkg/server/tenantsettingswatcher/overrides_store.go line 385 at r11 (raw file):
} else { numDst := 0 for ; (aIter+numDst) < len(dst) &&
I'm confused here about the conditions. Why are we checking aIter+numDst < len(dst)? Also, why are we looking at dst[aIter+numDst] instead of dst[aiter+numDst in the last condition?
We're in the case when dst[aIter].InternalKey > src[bIter].InternalKey which means that we have some elements in src that are not present in dst. My expectation is that we would simply copy those elements over by inserting them into dst before aIter. This loop is about counting the number of contiguous elements in src, so I would expect the code to look roughly like a similar pattern in spliceOverrideDefaults. What is different here?
pkg/server/tenantsettingswatcher/overrides_store_test.go line 86 at r11 (raw file):
expect(o3, "x=xx") // Verify hat overrides also work for the special "all tenants" ID.
nit: s/ hat / that /.
pkg/server/tenantsettingswatcher/overrides_store_test.go line 153 at r11 (raw file):
} func TestSpliceOverrideDefaults(t *testing.T) {
I think we also need a unit test for updateDefaults since it too has non-trivial logic.
pkg/settings/integration_tests/read_only_1/propagation_test.go line 23 at r12 (raw file):
var roS = settings.RegisterStringSetting(settings.TenantReadOnly, "tenant.read.only", "desc", "initial") func TestSettingDefaultPropagationReadOnly1(t *testing.T) {
nit: what's the rationale for splitting up each test case into a separate package? I imagine we could have had a single package with multiple separate settings registered. Is it to get more parallelism for speed? Couldn't that be accomplished by having multiple unit tests within a single package?
ed2869f to
6f60eec
Compare
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @stevendanna and @yuzefovich)
pkg/server/tenantsettingswatcher/overrides_store.go line 309 at r11 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: leftover here and throughout this function.
Removed.
pkg/server/tenantsettingswatcher/overrides_store.go line 315 at r11 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: we should have
ctxavailable in scope of all callsites, but also I don't think you want to check this in, right?
Indeed. Removed.
pkg/server/tenantsettingswatcher/overrides_store.go line 385 at r11 (raw file):
Why are we checking
aIter+numDst < len(dst)? Also, why are we looking atdst[aIter+numDst]instead ofdst[aiter+numDstin the last condition?
Good catch. This was also a bug in the splice function, which I needed to fix earlier; then I forgot to also remove it here.
This needed a unit test. 😿
Added the test too.
pkg/server/tenantsettingswatcher/overrides_store_test.go line 86 at r11 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit:
s/ hat / that /.
Done
pkg/server/tenantsettingswatcher/overrides_store_test.go line 153 at r11 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I think we also need a unit test for
updateDefaultssince it too has non-trivial logic.
Done.
pkg/settings/integration_tests/read_only_1/propagation_test.go line 23 at r12 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: what's the rationale for splitting up each test case into a separate package? I imagine we could have had a single package with multiple separate settings registered. Is it to get more parallelism for speed? Couldn't that be accomplished by having multiple unit tests within a single package?
I spent two days trying to coerce CI to accept the long execution time for the full test (~10mn in CI). I failed; then I gave up. Splitting the test in 4 packages was the only way I found it to pass. (Even just 2 packages was still not enough.)
yuzefovich
left a comment
There was a problem hiding this comment.
I only have some minor nits, (don't forget to remove the last commit) I'll leave it up to you to decide whether to wait for Steven's review or not.
Reviewed 27 of 27 files at r14, 4 of 4 files at r15, 22 of 22 files at r16, 4 of 4 files at r17, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @knz and @stevendanna)
pkg/server/settingswatcher/settings_watcher.go line 116 at r14 (raw file):
// NewWithNotifier constructs a new SettingsWatcher which notifies // an observer about changes to TenantReadOnly settings. func NewWithNotifier(
nit: perhaps rename it to NewWithRONotifier?
pkg/server/tenantsettingswatcher/overrides_store.go line 73 at r15 (raw file):
// This map is used when an alternate default is changed // asynchronously after the overrides have been loaded from the // rangefeed already, to detect which override need to be updated
nit: s/which override/which overrides/.
pkg/server/tenantsettingswatcher/overrides_store.go line 186 at r15 (raw file):
if tenantID == allTenantOverridesID { // Inherit alternate defaults. overrides = append([]kvpb.TenantSetting(nil), s.mu.alternateDefaults...)
nit: not sure how important performance here is, but I think make + copy is a little faster that append.
pkg/server/tenantsettingswatcher/overrides_store.go line 251 at r15 (raw file):
// override in system.tenant_settings. // // The input slice must be sorted by key already.
nit: do we want to assert this behind buildutil.CrdbTestBuild somewhere?
pkg/settings/integration_tests/propagation.go line 37 at r16 (raw file):
// This test currently takes >1 minute. // // TODO(multi-tenant): make it faster. Currently most of the
We have 32 configs in which we start a server and a tenant - can we not easily reuse a single server but starting a fresh tenant for each test case?
pkg/settings/integration_tests/propagation.go line 54 at r16 (raw file):
// Speed up the tests. sysDB.Exec(t, "SET CLUSTER SETTING kv.closed_timestamp.target_duration = '10ms'") sysDB.Exec(t, "ALTER TENANT ALL SET CLUSTER SETTING kv.closed_timestamp.target_duration = '10ms'")
nit: isn't this line redundant because the setting is tenant RO?
pkg/settings/integration_tests/read_only_1/propagation_test.go line 23 at r12 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
I spent two days trying to coerce bors+CI to accept the long execution time for the full test (~10mn in CI). I failed; then I gave up. Splitting the test in 4 packages was the only way I found it to pass. (Even just 2 packages was still not enough.)
Ack, thanks for the context. (Also, have you tried manually increasing shard_count to 4 within a single package? Sorry if this is obvious.)
No i have not. I only learned of |
yuzefovich
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @knz and @stevendanna)
pkg/server/testserver.go line 1545 at r17 (raw file):
for _, setting := range []settings.Setting{ sql.SecondaryTenantScatterEnabled,
nit: it looks like we could unexport these cluster settings (except for MR one) if we replace this loop to use cluster setting names directly since it's the only callsite outside of sql that access the objects directly.
pkg/settings/integration_tests/propagation.go line 37 at r16 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
We have 32 configs in which we start a server and a tenant - can we not easily reuse a single server but starting a fresh tenant for each test case?
Never mind, we already do that.
knz
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @stevendanna and @yuzefovich)
pkg/settings/integration_tests/propagation.go line 54 at r16 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: isn't this line redundant because the setting is tenant RO?
lol, yes. 🤦
I blindly copied this from another test. I'll simplify the other tests too: #111383.
(This function/method will also be deleted in a later commit.) Release note: None
knz
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @stevendanna and @yuzefovich)
pkg/server/testserver.go line 1545 at r17 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: it looks like we could unexport these cluster settings (except for MR one) if we replace this loop to use cluster setting names directly since it's the only callsite outside of
sqlthat access the objects directly.
I'm not keen on this; when we do this we lose the ability to do "find references" from the setting definition in a code editor.
pkg/server/settingswatcher/settings_watcher.go line 116 at r14 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: perhaps rename it to
NewWithRONotifier?
Given it's only called in one place and the comment explains, I'll go for the shorter (and more memorable) name for now.
pkg/server/tenantsettingswatcher/overrides_store.go line 73 at r15 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit:
s/which override/which overrides/.
Done.
pkg/server/tenantsettingswatcher/overrides_store.go line 186 at r15 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: not sure how important performance here is, but I think
make + copyis a little faster thatappend.
TIL - Done.
pkg/server/tenantsettingswatcher/overrides_store.go line 251 at r15 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: do we want to assert this behind
buildutil.CrdbTestBuildsomewhere?
Done.
|
i've modified the test to use just 1 package with sharding. |
In fact I tried it both ways. It's not so clear -- we do need to keep track of which keys in the map were observed in the existing overrides, so as to iterate over the "remaining keys" to add them if they were not seen yet. That means 2 maps with a weird iteration at the end. The complexity is still there, just in a different shape. |
|
@knz Awesome, should have known you already considered it. |
yuzefovich
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @knz)
pkg/server/testserver.go line 1545 at r17 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
I'm not keen on this; when we do this we lose the ability to do "find references" from the setting definition in a code editor.
Can you expand on "find references" a bit? Like, in GoLand it doesn't matter if a variable is exported or not - I could still search for its uses. (I'm not pushing on this change, trying to understand your rationale.)
…ed cache (For context, on each node there is a local persisted cache of cluster setting customizations. This exists to ensure that configured values can be used even before a node has fully started up and can start reading customizations from `system.settings`.) Prior to this patch, entries were never evicted from the local persisted cache: when a cluster setting was reset, any previously saved entry in the cache would remain there. This is a very old bug, which was long hidden and was recently revealed when commit 2f5d717 was merged. In a nutshell, before this recent commit the code responsible to load the entries from the cache didn't fully work and so the stale entries were never restored from the cache. That commit fixed the loader code, and so the stale entries became active, which made the old bug visible. To fix the old bug, this present commit modifies the settings watcher to preserve KV deletion events, and propagates them to the persisted cache. (There is no release note because there is no user-facing release where the bug was visible.) Release note: None Co-authored-by: Steven Danna <danna@cockroachlabs.com>
Prior to this patch, the rangefeed watcher over `system.settings` was updating the in-RAM value store before it propagated the updates to the persisted local cache. In fact, the update to the persisted local cache was lagging quite a bit behind, because the rangefeed watcher would buffer updates and only flush them after a while. As a result, the following sequence was possible: 1. client updates a cluster setting. 2. server is immediately shut down. The persisted cache has not been updated yet. 3. server is restarted. For a short while (until the settings watcher has caught up), the old version of the setting remains active. This recall of ghost values of a setting was simply a bug. This patch fixes that, by ensuring that the persisted cache is written through before the in-RAM value store. By doing this, we give up on batching updates to the persisted local store. This is deemed acceptable because cluster settings are not updated frequently. Release note: None
In a later commit, we will want to feed changes to TenantReadOnly settings in the system tenant's `system.settings` table into the connector, for use by the virtual cluster servers. This commit sets up the first building block: an extra callback function in the settings watcher that gets called when the watcher detects a change to TenantReadOnly settings. Release note: None
As an end-goal, we want that when there is no override in `system.tenant_settings`, i.e. the rangefeed over this table has nothing to stay, we pick up the current value of `TenantReadOnly` settings from `system.settings` instead to feed to virtual clusters. To achieve this, we need to merge the current `TenantReadOnly` values "underneath" the tenant overrides, i.e. use the current `TenantReadOnly` as default override value when the rangefeed over `tenant_settings` has nothing to say. This commit updates the overrides store accordingly. It does so by maintaining an `alternateDefaults` override slice, updated from changes to `TenantReadOnly` settings from `system.settings`. Then, updates to the overrides map (in `tenantOverrides`) for the special tenant ID `allTenantOverridesID` are "merged" from `alternateDefaults` with the following rule: - when a change to `alternateDefaults` is made, the current `allTenantOverridesID` override slice is updated for each setting *not* currently set from the rangefeed (those missing from the `explicitKeys` map). - when an *addition* or *modification* is observed from the rangefeed (for tenant ID `allTenantOverridesID`), the overrides slice is updated with the rangefeed value, and the rangefeed setting key is added to `explicitKeys`. - when a *deletion* is observed from the rangefeed, the key is removed from `explicitKeys` and any override from `alternateDefaults` is inserted instead. Note that entries are never deleted from `alternateDefaults`, even when a TenantReadOnly customization is removed from `system.settings` (i.e. the callback `setAlternateDefaults` is given a smaller slice than the current `alternateDefaults`). This is on purpose: to properly support mixed-version deployments, we want that SQL servers observe the storage layer's idea of what the default value of a TenantReadOnly setting should be. To achieve this, it's important that the storage server always reports an override for every TenantReadOnly setting. Release note: None
TLDR: this patch ensures that virtual cluster servers observe changes made to TenantReadOnly settings via SET CLUSTER SETTING in the system interface, even when there is no override set via ALTER VIRTUAL CLUSTER SET CLUSTER SETTING. For example, after `SET CLUSTER SETTING kv.closed_timestamp.target_duration = '10s'` in the system interface, this value will show up via `SHOW CLUSTER SETTING` in a virtual cluster SQL session. This changes the way that settings are picked up in virtual cluster, as follows: 1. if there is an override specifically for this tenant's ID (in `tenant_settings`), use that. 2. otherwise, if there is an override for the pseudo-ID 0 (in `tenant_settings` still, set via `ALTER TENANT ALL SET CLUSTER SETTING`), then use that. 3. **NEW** otherwise, if the class is TenantReadOnly and there is a custom value in `system.settings`, set via a regular `SET CLUSTER SETTING` in the system tenant, then use that. 4. otherwise, use the global default set via the setting's `Register()` call. ---- Prior to this patch, TenantReadOnly settings as observed from virtual clusters were defined as the following priority order: 1. if there is an override specifically for this tenant's ID (in `tenant_settings`), use that. 2. otherwise, if there is an override for the pseudo-ID 0 (in `tenant_settings` still, set via `ALTER TENANT ALL SET CLUSTER SETTING`), then use that. 3. otherwise, use the global default set via the setting's `Register()` call. Remarkably, this did not pick up any changes made via a plain `SET CLUSTER SETTING` statement via the system interface, which only modifies this setting's value in `system.settings` (thus not `tenant_settings`). This situation was problematic in two ways. To start, settings like `kv.closed_timestamp.target_duration` cannot be set solely in `system.tenant_settings`; they are also used in the storage layer and so must also be picked up from changes in `system.settings`. For these settings, it is common for operators to just issue the plain `SET CLUSTER SETTING` statement (to update `system.settings`) and simply forget to _also_ run `ALTER TENANT ALL SET CLUSTER SETTING`. This mistake is nearly unavoidable and would result in incoherent behavior, where the storage layer would use the customized value and virtual clusters would use the registered global default. The second problem is in mixed-version configurations, where the storage layer runs version N+1 and the SQL service runs version N of the executable. If the registered global default changes from version N to N+1, the SQL service would not properly pick up the new default defined in version N+1 of the storage layer. This patch fixes both problems as follows: - it integrates changes to TenantReadOnly settings observed in `system.settings`, to the watcher that tracks changes to `system.tenant_settings`. When a TenantReadOnly setting is present in the former but not the latter, a synthetic override is added. - it also initializes synthetic overrides for all the TenantReadOnly settings upon server initialization, from the registered global default, so that virtual cluster servers always pick up the storage layer's default as override. Release note: None
knz
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @yuzefovich)
pkg/server/testserver.go line 1545 at r17 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Can you expand on "find references" a bit? Like, in GoLand it doesn't matter if a variable is exported or not - I could still search for its uses. (I'm not pushing on this change, trying to understand your rationale.)
Do you mean that it's always possible to "search for a text string in an entire project" (i.e. grep)
Yes that works but it's not exactly precise. And also the lack of structural reference doesn't work very well when we choose to rename settings.
|
bors r=stevendanna,yuzefovich |
I meant "Find Usages" feature of GoLand. Not renaming sgtm. |
|
Build succeeded: |
Previous in sequence:
.Overrideset the value origin #111150Fixes #108677.
Fixes #85729.
Fixes #91825.
Completes the work described in the settings RFC.
Epic: CRDB-6671
TLDR: this patch ensures that virtual cluster servers observe changes
made to TenantReadOnly settings via SET CLUSTER SETTING in the system
interface, even when there is no override set via ALTER VIRTUAL
CLUSTER SET CLUSTER SETTING.
For example, after
SET CLUSTER SETTING kv.closed_timestamp.target_duration = '10s'in the system interface,this value will show up via
SHOW CLUSTER SETTINGin a virtualcluster SQL session.
This changes the way that settings are picked up in virtual cluster,
as follows:
(in
tenant_settings), use that.(in
tenant_settingsstill, set viaALTER TENANT ALL SET CLUSTER SETTING), then use that.is a custom value in
system.settings, set via a regularSET CLUSTER SETTINGin the system tenant, then use that.Register()call.Prior to this patch, TenantReadOnly settings as observed from virtual
clusters were defined as the following priority order:
(in
tenant_settings), use that.(in
tenant_settingsstill, set viaALTER TENANT ALL SET CLUSTER SETTING), then use that.Register()call.Remarkably, this did not pick up any changes made via a plain
SET CLUSTER SETTINGstatement via the system interface, which onlymodifies this setting's value in
system.settings(thus nottenant_settings).This situation was problematic in two ways.
To start, settings like
kv.closed_timestamp.target_durationcannotbe set solely in
system.tenant_settings; they are also used in thestorage layer and so must also be picked up from changes in
system.settings.For these settings, it is common for operators to just issue the plain
SET CLUSTER SETTINGstatement (to updatesystem.settings) andsimply forget to also run
ALTER TENANT ALL SET CLUSTER SETTING.This mistake is nearly unavoidable and would result in incoherent
behavior, where the storage layer would use the customized value
and virtual clusters would use the registered global default.
The second problem is in mixed-version configurations, where
the storage layer runs version N+1 and the SQL service runs version N
of the executable. If the registered global default changes from
version N to N+1, the SQL service would not properly pick up
the new default defined in version N+1 of the storage layer.
This patch fixes both problems as follows:
it integrates changes to TenantReadOnly settings observed
in
system.settings, to the watcher that tracks changesto
system.tenant_settings. When a TenantReadOnly settingis present in the former but not the latter, a synthetic
override is added.
it also initializes synthetic overrides for all the
TenantReadOnly settings upon server initialization,
from the registered global default, so that virtual
cluster servers always pick up the storage layer's
default as override.