multitenant: cannot query after stopping a server in a unit test #107499
Closed
Labels: A-multitenancy (Related to multi-tenancy), C-bug (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.), T-multitenant (Issues owned by the multi-tenant virtual team)
Description
In a unit test, we stop a server along with its shared-process tenant, restart the shared-process tenant on a different server, and then querying that tenant sometimes fails.
This scenario originally came up in the following C2C (cluster-to-cluster replication) test: https://github.com/cockroachdb/cockroach/blob/bcf1f9c17b38ea1bc3c2996bffdbc715980e48c9/pkg/ccl/streamingccl/streamingest/replication_stream_e2e_test.go#L463
Slack thread https://cockroachlabs.slack.com/archives/C02HWA24541/p1690223205316979
Example:
```go
func TestMultiTenantStopServer(t *testing.T) {
	defer leaktest.AfterTest(t)()
	defer log.Scope(t).Close(t)
	ctx := context.Background()

	// 1. Start a test cluster:
	serverArgs := base.TestServerArgs{
		DefaultTestTenant: base.TODOTestTenantDisabled,
	}
	c := testcluster.StartTestCluster(t, 4, base.TestClusterArgs{ServerArgs: serverArgs})
	defer c.Stopper().Stop(ctx)

	// 2. Start a tenant:
	tenantArgs := base.TestSharedProcessTenantArgs{
		TenantName: "mytenant",
		TenantID:   roachpb.MustMakeTenantID(2),
	}
	tenantServer, tenantConn := serverutils.StartSharedProcessTenant(t, c.Server(0), tenantArgs)
	testutils.SucceedsSoon(t, func() error {
		return tenantConn.Ping()
	})
	sysSQL := sqlutils.MakeSQLRunner(c.ServerConn(0))
	tenantSQL := sqlutils.MakeSQLRunner(tenantConn)

	// 3. Write stuff:
	numRanges := 50
	rowsPerRange := 20
	sysSQL.Exec(t, `ALTER TENANT mytenant SET CLUSTER SETTING sql.split_at.allow_for_secondary_tenant.enabled=true`)
	sysSQL.Exec(t, `ALTER TENANT mytenant SET CLUSTER SETTING sql.scatter.allow_for_secondary_tenant.enabled=true`)
	tenantSQL.Exec(t, "CREATE DATABASE d")
	tenantSQL.Exec(t, "CREATE TABLE d.scattered (key INT PRIMARY KEY)")
	tenantSQL.Exec(t, "INSERT INTO d.scattered (key) SELECT * FROM generate_series(1, $1)",
		numRanges*rowsPerRange)
	tenantSQL.Exec(t, "ALTER TABLE d.scattered SPLIT AT (SELECT * FROM generate_series($1::INT, $2::INT, $3::INT))",
		rowsPerRange, (numRanges-1)*rowsPerRange, rowsPerRange)
	tenantSQL.Exec(t, "ALTER TABLE d.scattered SCATTER")

	// 4. Verify we can read it:
	_ = tenantSQL.QueryStr(t, "SELECT * FROM d.scattered")

	// 5. Stop the tenant and stop node 1.
	sysSQL.Exec(t, `ALTER TENANT mytenant STOP SERVICE`)
	tenantServer.Stopper().Stop(ctx)
	c.StopServer(0)

	// 6. Start the shared process tenant again:
	_, alternateTenantConn := serverutils.StartSharedProcessTenant(t, c.Server(1),
		base.TestSharedProcessTenantArgs{
			TenantName: "mytenant",
			TenantID:   roachpb.MustMakeTenantID(2),
		})
	defer alternateTenantConn.Close()
	alternateTenantSQL := sqlutils.MakeSQLRunner(alternateTenantConn)

	// 7. Try to read again, which sometimes fails with:
	// replication_stream_e2e_test.go:518: error executing 'SELECT * FROM d.scattered': pq: failed to connect to n1 at
	// 127.0.0.1:52291: grpc: connection error: desc = "transport: error while dialing: connection interrupted (did the
	// remote node shut down or are there networking issues?)" [code 14/Unavailable]
	_ = alternateTenantSQL.QueryStr(t, "SELECT * FROM d.scattered")
}
```
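For reference, the split arithmetic in step 3 works out as follows. The standalone Go program below is a sketch, not part of the original test; the helpers `splitPoints` and `rangeSizes` are hypothetical names. It mirrors `generate_series(rowsPerRange, (numRanges-1)*rowsPerRange, rowsPerRange)` and counts how many of the inserted keys land in each resulting range, assuming each split point becomes the first key of a new range.

```go
package main

import "fmt"

// splitPoints mirrors SQL generate_series(start, stop, step) for a positive
// step: every value from start to stop, inclusive.
func splitPoints(start, stop, step int) []int {
	var pts []int
	for v := start; v <= stop; v += step {
		pts = append(pts, v)
	}
	return pts
}

// rangeSizes counts how many of the keys 1..totalRows fall into each range,
// assuming (as a simplification) that each split point starts a new range.
func rangeSizes(totalRows int, pts []int) []int {
	sizes := make([]int, 0, len(pts)+1)
	lower := 1 // first key inserted by generate_series(1, totalRows)
	for _, p := range pts {
		sizes = append(sizes, p-lower) // keys in [lower, p)
		lower = p
	}
	sizes = append(sizes, totalRows-lower+1) // keys in [lower, totalRows]
	return sizes
}

func main() {
	numRanges := 50
	rowsPerRange := 20
	totalRows := numRanges * rowsPerRange

	pts := splitPoints(rowsPerRange, (numRanges-1)*rowsPerRange, rowsPerRange)
	sizes := rangeSizes(totalRows, pts)

	fmt.Println(totalRows, len(pts), len(sizes))         // 1000 49 50
	fmt.Println(sizes[0], sizes[1], sizes[len(sizes)-1]) // 19 20 21
}
```

So the 49 split points divide the table's key span into 50 ranges: 19 rows in the first, 20 in each of the 48 middle ranges, and 21 in the last.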
Jira issue: CRDB-30082