base,server,testutils: clarify the multi-tenant test situations#106768
base,server,testutils: clarify the multi-tenant test situations#106768craig[bot] merged 3 commits intocockroachdb:masterfrom
Conversation
08f1f31 to
15e44bc
Compare
15e44bc to
cbf54da
Compare
yuzefovich
left a comment
There was a problem hiding this comment.
Reviewed 5 of 5 files at r1, 14 of 115 files at r3, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @herkolategan, @jayshrivastava, @knz, @Santamaura, @srosenberg, and @stevendanna)
pkg/sql/logictest/logic.go line 1392 at r2 (raw file):
// the implicit creation of the default test tenant. // // TODO(#76378): This conditional is naive - what does it expect to do
DefaultTestTenantOptions here comes from the logic test config, and we currently only have a binary choice "use tenant or not", so we don't ever enable the probabilistic option. Do you foresee that changing? I can see us randomly deciding to use shared-process multi-tenant instead of single tenant, but enabling separate-process multi-tenant and using it by default for non-tenant logic test configs will probably require some elbow grease to make the existing tests non-flaky (one concrete example is that in the tenant world DistSQL physical planner is not deterministic, so EXPLAIN tests might become non-deterministic in multi-node setup).
pkg/sql/logictest/logic.go line 1501 at r2 (raw file):
connsForClusterSettingChanges := []*gosql.DB{t.cluster.ServerConn(0)} if cfg.DefaultTestTenant.TestTenantAlwaysEnabled() { // TODO(#76378): What's going here? It looks like this code is
Indeed, in the logic test framework we currently don't use serverutils.StartServer and control ourselves whether we start the tenant or not.
pkg/sql/logictest/logic.go line 2993 at r2 (raw file):
// // TODO(#76378): It seems the conditional should include `|| // t.cluster.StartedDefaultTestTenant()` here.
Currently, that will always return false.
herkolategan
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @jayshrivastava, @knz, @Santamaura, @srosenberg, @stevendanna, and @yuzefovich)
pkg/sql/sqlinstance/instancestorage/instancecache_test.go line 73 at r3 (raw file):
ctx := context.Background() host, _, _ := serverutils.StartServer(t, base.TestServerArgs{DefaultTestTenant: base.TODOTestTenantDisabled})
This test like a few others here are outside of ccl and does not enable a license for testing purposes. I tested locally and the flag has no influence here due to a license check opting out of tenants altogether: https://github.com/cockroachdb/cockroach/blob/master/pkg/server/testserver.go#L548. I suspect we'll address this in another PR, or will it also be part of this one?
yuzefovich
left a comment
There was a problem hiding this comment.
Overall, I like this, so We can discuss each of the TODOs on a case by case basis (like what to do in the logic test framework, etc).
nit: update the PR description with the latest version of the enum.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @jayshrivastava, @knz, @Santamaura, @srosenberg, and @stevendanna)
knz
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @herkolategan, @jayshrivastava, @Santamaura, @srosenberg, @stevendanna, and @yuzefovich)
pkg/sql/logictest/logic.go line 1392 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
DefaultTestTenantOptionshere comes from the logic test config, and we currently only have a binary choice "use tenant or not", so we don't ever enable the probabilistic option. Do you foresee that changing? I can see us randomly deciding to use shared-process multi-tenant instead of single tenant, but enabling separate-process multi-tenant and using it by default for non-tenant logic test configs will probably require some elbow grease to make the existing tests non-flaky (one concrete example is that in the tenant world DistSQL physical planner is not deterministic, so EXPLAIN tests might become non-deterministic in multi-node setup).
I am starting to understand. At a very high level, the logic test framework neither cares nor has to care about the randomization logic we have in testutils, because we have decided that we already want to run logic tests in all configs, all the time (i.e. cross product).
So at the core, the problem to solve here is how to:
- opt out of the randomization in testutils.
- at the same time, ensure the logic test profiles that do want secondary tenants, to get the same "server config" as a regular unit test.
I think one way i'd like to move forward on this (not this PR) is to change logictestbase.TestClusterConfig to use a different go type than base.DefaultTestTenantOptions to control the use of secondary tenants. Maybe just UseSecondaryTenant bool. So logic test profiles have just a boolean. And then we can propagate it to the TestServer config with a conditional that chooses between TestTenantAlwaysEnabled and TestIsSpecificToStorageLayerAndNeedsASystemTenant.
pkg/sql/logictest/logic.go line 1501 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Indeed, in the logic test framework we currently don't use
serverutils.StartServerand control ourselves whether we start the tenant or not.
See my comment above: we should not use DefaultTestTenantOptions in the TestClusterConfig.
pkg/sql/logictest/logic.go line 2993 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Currently, that will always return
false.
Indeed, I now understand why. We can remove my comment when we do the simplification.
pkg/sql/sqlinstance/instancestorage/instancecache_test.go line 73 at r3 (raw file):
Previously, herkolategan (Herko Lategan) wrote…
This test like a few others here are outside of
ccland does not enable a license for testing purposes. I tested locally and the flag has no influence here due to a license check opting out of tenants altogether: https://github.com/cockroachdb/cockroach/blob/master/pkg/server/testserver.go#L548. I suspect we'll address this in another PR, or will it also be part of this one?
For some tests like these we could try to make them import the CCL packages. We've discussed that before: it's ok for test code in the non-CCL packages to import CCL code.
When it's not possible, we need to make all these tests at least use the non-networked tenant connector that doesn't need CCL.
See my comment on #106772:
yuzefovich
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @herkolategan, @jayshrivastava, @knz, @Santamaura, @srosenberg, and @stevendanna)
pkg/sql/logictest/logic.go line 1392 at r2 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
I am starting to understand. At a very high level, the logic test framework neither cares nor has to care about the randomization logic we have in testutils, because we have decided that we already want to run logic tests in all configs, all the time (i.e. cross product).
So at the core, the problem to solve here is how to:
- opt out of the randomization in testutils.
- at the same time, ensure the logic test profiles that do want secondary tenants, to get the same "server config" as a regular unit test.
I think one way i'd like to move forward on this (not this PR) is to change
logictestbase.TestClusterConfigto use a different go type thanbase.DefaultTestTenantOptionsto control the use of secondary tenants. Maybe justUseSecondaryTenant bool. So logic test profiles have just a boolean. And then we can propagate it to the TestServer config with a conditional that chooses betweenTestTenantAlwaysEnabledandTestIsSpecificToStorageLayerAndNeedsASystemTenant.
Makes sense and SGTM. FWIW we used to have a boolean in there initially too.
Prior to this patch, a test could choose only between the following options: - `TestTenantProbabilisticOnly` and `TestTenantProbabilistic`: whether or not the load goes to a secondary tenant is randomly selected for each test. - `TestTenantEnabled`: always use a secondary tenant. - `TestTenantDisabled`: always disable the secondary tenant. In particular the last option was problematic, as we ended up using it both in the case "the test doesn't work with secondary tenants and *we don't know why*" and the case "it doesn't work, we know why *and it's OK to keep it that way*". We actually want to distinguish these cases, because the former case requires further investigation, and we want the test code to refer to this follow-up work more clearly. To address this, this commit defines the following options: ```go // (UNCHANGED) TestTenantProbabilisticOnly // (UNCHANGED) TestTenantProbabilistic // RENAMED: was previously "TestTenantEnabled" // // TestTenantAlwaysEnabled will always start the default test tenant. This is useful // for quickly verifying that a test works with tenants enabled. // // Note: this value should not be used for checked in test code // unless there is a good reason to do so. We want the common case // to use TestTenantProbabilistic or TestTenantProbabilisticOnly. TestTenantAlwaysEnabled // RENAMED: was previously "TestTenantDisabled" // // TODOTestTenantDisabled should not be used anymore. Use the // other values instead. // TODO(cockroachdb#76378): Review existing tests and use the proper value instead. TODOTestTenantDisabled = DefaultTestTenantOptions{testBehavior: ttDisabled} // (NEW) TestIsSpecificToStorageLayerAndNeedsASystemTenant is used when // the test needs to be given access to a SQL conn to a tenant with // sufficient capabilities to access all the storage layer. // (Initially that'd be "the" system tenant.) TestIsSpecificToStorageLayerAndNeedsASystemTenant // (NEW) TestControlsTenantsExplicitly is used when the test wants to // manage its own secondary tenants and tenant servers. TestControlsTenantsExplicitly // (NEW) TestNeedsTightIntegrationBetweenAPIsAndTestingKnobs is used when // a test wants to use a single set of testing knobs for both the // storage layer and the SQL layer and for simplicity of the test // code we want to give that test a simplified environment. // // Note: it is debatable whether the gain in test code simplicity is // worth the cost of never running that test with the virtualization // layer active. TestNeedsTightIntegrationBetweenAPIsAndTestingKnobs // (NEW) TestDoesNotWorkWithSecondaryTenantsButWeDontKnowWhyYet can be used // to disable virtualization because the test doesn't appear to be compatible // with it, and we don't understand it yet. // It should link to a github issue with label C-test-failure. TestDoesNotWorkWithSecondaryTenantsButWeDontKnowWhyYet(issueNumber int) // (NEW) TestIsForStuffThatShouldWorkWithSecondaryTenantsButDoesntYet can be // used to disable virtualization because the test exercises a feature // known not to work with virtualization enabled yet but we wish it // to work eventually. // // It should link to a github issue with label C-bug // and the issue should be linked to an epic under INI-213 or INI-214. TestIsForStuffThatShouldWorkWithSecondaryTenantsButDoesntYet(issueNumber int) ``` Release note: None Co-authored-by: Steven Danna <danna@cockroachlabs.com> Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
This renames it to `TODOTestTenantDisabled` and refers to follow up work under cockroachdb#76378 to use one of the other values instead. Release note: None
cbf54da to
d469b72
Compare
Release note: None
d469b72 to
5b1a5af
Compare
knz
left a comment
There was a problem hiding this comment.
Yahor, PTAL.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @herkolategan, @jayshrivastava, @Santamaura, @srosenberg, @stevendanna, and @yuzefovich)
pkg/sql/logictest/logic.go line 1392 at r2 (raw file):
DefaultTestTenantOptionshere comes from the logic test config, and we currently only have a binary choice "use tenant or not", so we don't ever enable the probabilistic option
I have re-read the original code and I believe this is incorrect in general.
What you say is true in the specific case where a config would specify either TestTenantEnable or TestTenantDisabled explicitly.
But when it does not, the zero value of the field is TestTenantProbabilisticOnly.
Then, in the conditional here defaultTestTenant remains what is originally set (because the value is != TestTenantEnabled).
So eventually a tenant server is created randomly by testutils.
This is incidentally why below in this file we have a condition like if cfg.DefaultTestTenant == TestTenantEnabled || t.cluster.StartedDefaultTestTenant(). The second part of the condition is for the case the config didn't specify anything and a secondary tenant was started randomly.
I have added an extra commit in the PR. Do you like it?
yuzefovich
left a comment
There was a problem hiding this comment.
Reviewed 15 of 116 files at r4, 14 of 115 files at r5, 3 of 3 files at r6, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @herkolategan, @jayshrivastava, @knz, @Santamaura, @srosenberg, and @stevendanna)
pkg/sql/logictest/logic.go line 1392 at r2 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
DefaultTestTenantOptionshere comes from the logic test config, and we currently only have a binary choice "use tenant or not", so we don't ever enable the probabilistic optionI have re-read the original code and I believe this is incorrect in general.
What you say is true in the specific case where a config would specify eitherTestTenantEnableorTestTenantDisabledexplicitly.
But when it does not, the zero value of the field isTestTenantProbabilisticOnly.
Then, in the conditional heredefaultTestTenantremains what is originally set (because the value is!= TestTenantEnabled).So eventually a tenant server is created randomly by testutils.
This is incidentally why below in this file we have a condition like
if cfg.DefaultTestTenant == TestTenantEnabled || t.cluster.StartedDefaultTestTenant(). The second part of the condition is for the case the config didn't specify anything and a secondary tenant was started randomly.I have added an extra commit in the PR. Do you like it?
Yeah, you're right, I missed that. I like that change, thanks.
|
TFYR! bors r=yuzefovich |
|
Timed out. |
|
bors r=yuzefovich |
|
Build succeeded: |

Informs #76378 .
Informs #106772.
Epic: CRDB-18499
Prior to this patch, a test could choose only between the following
options:
TestTenantProbabilisticOnlyandTestTenantProbabilistic:whether or not the load goes to a secondary tenant is randomly
selected for each test.
TestTenantEnabled: always use a secondary tenant.TestTenantDisabled: always disable the secondary tenant.In particular the last option was problematic, as we ended up using it
both in the case "the test doesn't work with secondary tenants and
we don't know why" and the case "it doesn't work, we know why
and it's OK to keep it that way".
We actually want to distinguish these cases, because the former
case requires further investigation, and we want the test code
to refer to this follow-up work more clearly.
To address this, this commit defines the following options:
Release note: None