Skip to content

base,server,testutils: clarify the multi-tenant test situations#106768

Merged
craig[bot] merged 3 commits intocockroachdb:masterfrom
knz:20230713-test-tenant-framework
Jul 14, 2023
Merged

base,server,testutils: clarify the multi-tenant test situations#106768
craig[bot] merged 3 commits intocockroachdb:masterfrom
knz:20230713-test-tenant-framework

Conversation

@knz
Copy link
Copy Markdown
Contributor

@knz knz commented Jul 13, 2023

Informs #76378 .
Informs #106772.
Epic: CRDB-18499

Prior to this patch, a test could choose only between the following
options:

  • TestTenantProbabilisticOnly and TestTenantProbabilistic:
    whether or not the load goes to a secondary tenant is randomly
    selected for each test.

  • TestTenantEnabled: always use a secondary tenant.

  • TestTenantDisabled: always disable the secondary tenant.

In particular the last option was problematic, as we ended up using it
both in the case "the test doesn't work with secondary tenants and
we don't know why" and the case "it doesn't work, we know why
and it's OK to keep it that way".

We actually want to distinguish these cases, because the former
case requires further investigation, and we want the test code
to refer to this follow-up work more clearly.

To address this, this commit defines the following options:

// (UNCHANGED)
TestTenantProbabilisticOnly
// (UNCHANGED)
TestTenantProbabilistic

// RENAMED: was previously "TestTenantEnabled"
//
// TestTenantAlwaysEnabled will always start the default test tenant. This is useful
// for quickly verifying that a test works with tenants enabled.
//
// Note: this value should not be used for checked in test code
// unless there is a good reason to do so. We want the common case
// to use TestTenantProbabilistic or TestTenantProbabilisticOnly.
TestTenantAlwaysEnabled

// RENAMED: was previously "TestTenantDisabled"
//
// TODOTestTenantDisabled should not be used anymore. Use the
// other values instead.
// TODO(https://github.com/cockroachdb/cockroach/issues/76378): Review existing tests and use the proper value instead.
TODOTestTenantDisabled = DefaultTestTenantOptions{testBehavior: ttDisabled}

// (NEW) TestIsSpecificToStorageLayerAndNeedsASystemTenant is used when
// the test needs to be given access to a SQL conn to a tenant with
// sufficient capabilities to access all the storage layer.
// (Initially that'd be "the" system tenant.)
TestIsSpecificToStorageLayerAndNeedsASystemTenant

// (NEW) TestControlsTenantsExplicitly is used when the test wants to
// manage its own secondary tenants and tenant servers.
TestControlsTenantsExplicitly

// (NEW) TestNeedsTightIntegrationBetweenAPIsAndTestingKnobs is used when
// a test wants to use a single set of testing knobs for both the
// storage layer and the SQL layer and for simplicity of the test
// code we want to give that test a simplified environment.
//
// Note: it is debatable whether the gain in test code simplicity is
// worth the cost of never running that test with the virtualization
// layer active.
TestNeedsTightIntegrationBetweenAPIsAndTestingKnobs

// (NEW) TestDoesNotWorkWithSecondaryTenantsButWeDontKnowWhyYet can be used
// to disable virtualization because the test doesn't appear to be compatible
// with it, and we don't understand it yet.
// It should link to a github issue with label C-test-failure.
TestDoesNotWorkWithSecondaryTenantsButWeDontKnowWhyYet(issueNumber int)

// (NEW) TestIsForStuffThatShouldWorkWithSecondaryTenantsButDoesntYet can be
// used to disable virtualization because the test exercises a feature
// known not to work with virtualization enabled yet but we wish it
// to work eventually.
//
// It should link to a github issue with label C-bug
// and the issue should be linked to an epic under INI-213 or INI-214.
TestIsForStuffThatShouldWorkWithSecondaryTenantsButDoesntYet(issueNumber int)

Release note: None

@knz knz requested review from a team, herkolategan, stevendanna and yuzefovich July 13, 2023 17:54
@knz knz requested review from a team as code owners July 13, 2023 17:54
@knz knz requested a review from a team July 13, 2023 17:54
@knz knz requested review from a team as code owners July 13, 2023 17:54
@knz knz requested a review from a team July 13, 2023 17:54
@knz knz requested review from a team as code owners July 13, 2023 17:54
@knz knz requested review from Santamaura, jayshrivastava and srosenberg and removed request for a team July 13, 2023 17:54
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 5 files at r1, 14 of 115 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan, @jayshrivastava, @knz, @Santamaura, @srosenberg, and @stevendanna)


pkg/sql/logictest/logic.go line 1392 at r2 (raw file):

		// the implicit creation of the default test tenant.
		//
		// TODO(#76378): This conditional is naive - what does it expect to do

DefaultTestTenantOptions here comes from the logic test config, and we currently only have a binary choice "use tenant or not", so we don't ever enable the probabilistic option. Do you foresee that changing? I can see us randomly deciding to use shared-process multi-tenant instead of single tenant, but enabling separate-process multi-tenant and using it by default for non-tenant logic test configs will probably require some elbow grease to make the existing tests non-flaky (one concrete example is that in the tenant world DistSQL physical planner is not deterministic, so EXPLAIN tests might become non-deterministic in multi-node setup).


pkg/sql/logictest/logic.go line 1501 at r2 (raw file):

	connsForClusterSettingChanges := []*gosql.DB{t.cluster.ServerConn(0)}
	if cfg.DefaultTestTenant.TestTenantAlwaysEnabled() {
		// TODO(#76378): What's going here? It looks like this code is

Indeed, in the logic test framework we currently don't use serverutils.StartServer and control ourselves whether we start the tenant or not.


pkg/sql/logictest/logic.go line 2993 at r2 (raw file):

			//
			// TODO(#76378): It seems the conditional should include `||
			// t.cluster.StartedDefaultTestTenant()` here.

Currently, that will always return false.

Copy link
Copy Markdown
Collaborator

@herkolategan herkolategan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jayshrivastava, @knz, @Santamaura, @srosenberg, @stevendanna, and @yuzefovich)


pkg/sql/sqlinstance/instancestorage/instancecache_test.go line 73 at r3 (raw file):

	ctx := context.Background()
	host, _, _ := serverutils.StartServer(t, base.TestServerArgs{DefaultTestTenant: base.TODOTestTenantDisabled})

This test like a few others here are outside of ccl and does not enable a license for testing purposes. I tested locally and the flag has no influence here due to a license check opting out of tenants altogether: https://github.com/cockroachdb/cockroach/blob/master/pkg/server/testserver.go#L548. I suspect we'll address this in another PR, or will it also be part of this one?

Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I like this, so :lgtm: We can discuss each of the TODOs on a case by case basis (like what to do in the logic test framework, etc).

nit: update the PR description with the latest version of the enum.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jayshrivastava, @knz, @Santamaura, @srosenberg, and @stevendanna)

Copy link
Copy Markdown
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @herkolategan, @jayshrivastava, @Santamaura, @srosenberg, @stevendanna, and @yuzefovich)


pkg/sql/logictest/logic.go line 1392 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

DefaultTestTenantOptions here comes from the logic test config, and we currently only have a binary choice "use tenant or not", so we don't ever enable the probabilistic option. Do you foresee that changing? I can see us randomly deciding to use shared-process multi-tenant instead of single tenant, but enabling separate-process multi-tenant and using it by default for non-tenant logic test configs will probably require some elbow grease to make the existing tests non-flaky (one concrete example is that in the tenant world DistSQL physical planner is not deterministic, so EXPLAIN tests might become non-deterministic in multi-node setup).

I am starting to understand. At a very high level, the logic test framework neither cares nor has to care about the randomization logic we have in testutils, because we have decided that we already want to run logic tests in all configs, all the time (i.e. cross product).

So at the core, the problem to solve here is how to:

  1. opt out of the randomization in testutils.
  2. at the same time, ensure the logic test profiles that do want secondary tenants, to get the same "server config" as a regular unit test.

I think one way i'd like to move forward on this (not this PR) is to change logictestbase.TestClusterConfig to use a different go type than base.DefaultTestTenantOptions to control the use of secondary tenants. Maybe just UseSecondaryTenant bool. So logic test profiles have just a boolean. And then we can propagate it to the TestServer config with a conditional that chooses between TestTenantAlwaysEnabled and TestIsSpecificToStorageLayerAndNeedsASystemTenant.


pkg/sql/logictest/logic.go line 1501 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Indeed, in the logic test framework we currently don't use serverutils.StartServer and control ourselves whether we start the tenant or not.

See my comment above: we should not use DefaultTestTenantOptions in the TestClusterConfig.


pkg/sql/logictest/logic.go line 2993 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Currently, that will always return false.

Indeed, I now understand why. We can remove my comment when we do the simplification.


pkg/sql/sqlinstance/instancestorage/instancecache_test.go line 73 at r3 (raw file):

Previously, herkolategan (Herko Lategan) wrote…

This test like a few others here are outside of ccl and does not enable a license for testing purposes. I tested locally and the flag has no influence here due to a license check opting out of tenants altogether: https://github.com/cockroachdb/cockroach/blob/master/pkg/server/testserver.go#L548. I suspect we'll address this in another PR, or will it also be part of this one?

For some tests like these we could try to make them import the CCL packages. We've discussed that before: it's ok for test code in the non-CCL packages to import CCL code.

When it's not possible, we need to make all these tests at least use the non-networked tenant connector that doesn't need CCL.
See my comment on #106772:

image.png

Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @herkolategan, @jayshrivastava, @knz, @Santamaura, @srosenberg, and @stevendanna)


pkg/sql/logictest/logic.go line 1392 at r2 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

I am starting to understand. At a very high level, the logic test framework neither cares nor has to care about the randomization logic we have in testutils, because we have decided that we already want to run logic tests in all configs, all the time (i.e. cross product).

So at the core, the problem to solve here is how to:

  1. opt out of the randomization in testutils.
  2. at the same time, ensure the logic test profiles that do want secondary tenants, to get the same "server config" as a regular unit test.

I think one way i'd like to move forward on this (not this PR) is to change logictestbase.TestClusterConfig to use a different go type than base.DefaultTestTenantOptions to control the use of secondary tenants. Maybe just UseSecondaryTenant bool. So logic test profiles have just a boolean. And then we can propagate it to the TestServer config with a conditional that chooses between TestTenantAlwaysEnabled and TestIsSpecificToStorageLayerAndNeedsASystemTenant.

Makes sense and SGTM. FWIW we used to have a boolean in there initially too.

knz and others added 2 commits July 14, 2023 00:17
Prior to this patch, a test could choose only between the following
options:

- `TestTenantProbabilisticOnly` and `TestTenantProbabilistic`:
  whether or not the load goes to a secondary tenant is randomly
  selected for each test.

- `TestTenantEnabled`: always use a secondary tenant.

- `TestTenantDisabled`: always disable the secondary tenant.

In particular the last option was problematic, as we ended up using it
both in the case "the test doesn't work with secondary tenants and
*we don't know why*" and the case "it doesn't work, we know why
*and it's OK to keep it that way*".

We actually want to distinguish these cases, because the former
case requires further investigation, and we want the test code
to refer to this follow-up work more clearly.

To address this, this commit defines the following options:

```go
// (UNCHANGED)
TestTenantProbabilisticOnly
// (UNCHANGED)
TestTenantProbabilistic

// RENAMED: was previously "TestTenantEnabled"
//
// TestTenantAlwaysEnabled will always start the default test tenant. This is useful
// for quickly verifying that a test works with tenants enabled.
//
// Note: this value should not be used for checked in test code
// unless there is a good reason to do so. We want the common case
// to use TestTenantProbabilistic or TestTenantProbabilisticOnly.
TestTenantAlwaysEnabled

// RENAMED: was previously "TestTenantDisabled"
//
// TODOTestTenantDisabled should not be used anymore. Use the
// other values instead.
// TODO(cockroachdb#76378): Review existing tests and use the proper value instead.
TODOTestTenantDisabled = DefaultTestTenantOptions{testBehavior: ttDisabled}

// (NEW) TestIsSpecificToStorageLayerAndNeedsASystemTenant is used when
// the test needs to be given access to a SQL conn to a tenant with
// sufficient capabilities to access all the storage layer.
// (Initially that'd be "the" system tenant.)
TestIsSpecificToStorageLayerAndNeedsASystemTenant

// (NEW) TestControlsTenantsExplicitly is used when the test wants to
// manage its own secondary tenants and tenant servers.
TestControlsTenantsExplicitly

// (NEW) TestNeedsTightIntegrationBetweenAPIsAndTestingKnobs is used when
// a test wants to use a single set of testing knobs for both the
// storage layer and the SQL layer and for simplicity of the test
// code we want to give that test a simplified environment.
//
// Note: it is debatable whether the gain in test code simplicity is
// worth the cost of never running that test with the virtualization
// layer active.
TestNeedsTightIntegrationBetweenAPIsAndTestingKnobs

// (NEW) TestDoesNotWorkWithSecondaryTenantsButWeDontKnowWhyYet can be used
// to disable virtualization because the test doesn't appear to be compatible
// with it, and we don't understand it yet.
// It should link to a github issue with label C-test-failure.
TestDoesNotWorkWithSecondaryTenantsButWeDontKnowWhyYet(issueNumber int)

// (NEW) TestIsForStuffThatShouldWorkWithSecondaryTenantsButDoesntYet can be
// used to disable virtualization because the test exercises a feature
// known not to work with virtualization enabled yet but we wish it
// to work eventually.
//
// It should link to a github issue with label C-bug
// and the issue should be linked to an epic under INI-213 or INI-214.
TestIsForStuffThatShouldWorkWithSecondaryTenantsButDoesntYet(issueNumber int)
```

Release note: None

Co-authored-by: Steven Danna <danna@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
This renames it to `TODOTestTenantDisabled` and refers to follow up
work under cockroachdb#76378 to use one of the other values instead.

Release note: None
@knz knz force-pushed the 20230713-test-tenant-framework branch from cbf54da to d469b72 Compare July 13, 2023 22:20
@knz knz force-pushed the 20230713-test-tenant-framework branch from d469b72 to 5b1a5af Compare July 13, 2023 22:21
Copy link
Copy Markdown
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yahor, PTAL.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @herkolategan, @jayshrivastava, @Santamaura, @srosenberg, @stevendanna, and @yuzefovich)


pkg/sql/logictest/logic.go line 1392 at r2 (raw file):

DefaultTestTenantOptions here comes from the logic test config, and we currently only have a binary choice "use tenant or not", so we don't ever enable the probabilistic option

I have re-read the original code and I believe this is incorrect in general.
What you say is true in the specific case where a config would specify either TestTenantEnable or TestTenantDisabled explicitly.
But when it does not, the zero value of the field is TestTenantProbabilisticOnly.
Then, in the conditional here defaultTestTenant remains what is originally set (because the value is != TestTenantEnabled).

So eventually a tenant server is created randomly by testutils.

This is incidentally why below in this file we have a condition like if cfg.DefaultTestTenant == TestTenantEnabled || t.cluster.StartedDefaultTestTenant(). The second part of the condition is for the case the config didn't specify anything and a secondary tenant was started randomly.

I have added an extra commit in the PR. Do you like it?

@knz knz requested a review from yuzefovich July 13, 2023 22:22
Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 15 of 116 files at r4, 14 of 115 files at r5, 3 of 3 files at r6, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @herkolategan, @jayshrivastava, @knz, @Santamaura, @srosenberg, and @stevendanna)


pkg/sql/logictest/logic.go line 1392 at r2 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

DefaultTestTenantOptions here comes from the logic test config, and we currently only have a binary choice "use tenant or not", so we don't ever enable the probabilistic option

I have re-read the original code and I believe this is incorrect in general.
What you say is true in the specific case where a config would specify either TestTenantEnable or TestTenantDisabled explicitly.
But when it does not, the zero value of the field is TestTenantProbabilisticOnly.
Then, in the conditional here defaultTestTenant remains what is originally set (because the value is != TestTenantEnabled).

So eventually a tenant server is created randomly by testutils.

This is incidentally why below in this file we have a condition like if cfg.DefaultTestTenant == TestTenantEnabled || t.cluster.StartedDefaultTestTenant(). The second part of the condition is for the case the config didn't specify anything and a secondary tenant was started randomly.

I have added an extra commit in the PR. Do you like it?

Yeah, you're right, I missed that. I like that change, thanks.

@knz
Copy link
Copy Markdown
Contributor Author

knz commented Jul 14, 2023

TFYR!

bors r=yuzefovich

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jul 14, 2023

Timed out.

@knz
Copy link
Copy Markdown
Contributor Author

knz commented Jul 14, 2023

bors r=yuzefovich

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jul 14, 2023

Build succeeded:

@craig craig bot merged commit ebea3d4 into cockroachdb:master Jul 14, 2023
@knz knz deleted the 20230713-test-tenant-framework branch July 14, 2023 13:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants