Skip to content

ccl/server/sql: Fix failing reset index usage unit test#72358

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
lindseyjin:fix-reset-index-stats
Nov 8, 2021
Merged

ccl/server/sql: Fix failing reset index usage unit test#72358
craig[bot] merged 1 commit intocockroachdb:masterfrom
lindseyjin:fix-reset-index-stats

Conversation

@lindseyjin
Copy link
Copy Markdown
Contributor

@lindseyjin lindseyjin commented Nov 2, 2021

Resolves #72254

Previously, the unit test for resetting index usage statistics was
failing. This was happening because of two reasons:

  1. The IndexUsageStatsController was not being set properly for DistSQL
    server configs.
  2. Noise from other tests was sometimes populating the index usage
    statistics table right after it was reset, causing the check for a newly
    reset table to sometimes randomly fail.

To fix these issues, we have implemented the following:

  1. Created a test case to reproduce the first error, and correctly
    populated IndexUsageStatsController for DistSQL server configs.
  2. Updated the unit test to query for the specific test table id, to
    avoid other tables polluting the test space. The test has also been
    updated to check for a correct LastReset time.

Release note: none

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@lindseyjin lindseyjin requested a review from a team November 2, 2021 22:45
@lindseyjin lindseyjin force-pushed the fix-reset-index-stats branch from fea3193 to c1518f1 Compare November 2, 2021 22:48
Copy link
Copy Markdown
Contributor

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @lindseyjin)


pkg/ccl/serverccl/tenant_status_test.go, line 380 at r1 (raw file):

  crdb_internal.index_usage_statistics
WHERE
  table_id = ` + testTableIDStr

note: this is fine-isssh in test, and CheckQuerResults doesn't really allow you to do this. But in general we really try to avoid trivially concatenating SQL query string, this is a recipe for SQL injections which is really really bad. We usually use placeholder here for them.


pkg/ccl/serverccl/tenant_status_test.go, line 422 at r1 (raw file):

	require.NotNil(t, rows)
	for _, row := range rows {
		require.Equal(t, row[1], "0", "expected total reads for table %s to be reset, but got %s", row[0], row[1])

nit: wrap long lines at 80 characters


pkg/sql/sem/builtins/builtins_test.go, line 472 at r1 (raw file):

}

func TestResetIndexUsageStats(t *testing.T) {

nit: let's be a bit more specific on the test name, this tests is really testing execution of the builtin when it's being set up on a remote node via DistSQL. So i'm thinking something along the line of TestResetIndexUsageStatsOnRemoteSQLNode


pkg/sql/sem/builtins/builtins_test.go, line 476 at r1 (raw file):

	ctx := context.Background()

	testCluster := serverutils.StartNewTestCluster(t, 9 /* numNodes */, base.TestClusterArgs{

Let's scale this down to 3 nodes? 9 nodes feels quite wasteful for a test like this


pkg/sql/sem/builtins/builtins_test.go, line 485 at r1 (raw file):

	query := `
	CREATE TABLE rides (
    id INT PRIMARY KEY,

nit: let's unify the indentation here for better readability


pkg/sql/sem/builtins/builtins_test.go, line 493 at r1 (raw file):

		name VARCHAR NOT NULL
  );
  INSERT INTO rides (id, city, vehicle_city) VALUES (1, 'Toronto', 'Calgary');

also this tests case is slightly overkill 😛 , all you need is a query that causes a filter to be pushed down into remote node, so let's see if we can simplify this query a bit ?

@lindseyjin lindseyjin force-pushed the fix-reset-index-stats branch 3 times, most recently from d1cb5b2 to 4a6e5fd Compare November 3, 2021 16:19
Copy link
Copy Markdown
Contributor Author

@lindseyjin lindseyjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng)


pkg/ccl/serverccl/tenant_status_test.go, line 380 at r1 (raw file):

Previously, Azhng (Archer Zhang) wrote…

note: this is fine-isssh in test, and CheckQuerResults doesn't really allow you to do this. But in general we really try to avoid trivially concatenating SQL query string, this is a recipe for SQL injections which is really really bad. We usually use placeholder here for them.

Noted, thanks! 👍


pkg/sql/sem/builtins/builtins_test.go, line 472 at r1 (raw file):

Previously, Azhng (Archer Zhang) wrote…

nit: let's be a bit more specific on the test name, this tests is really testing execution of the builtin when it's being set up on a remote node via DistSQL. So i'm thinking something along the line of TestResetIndexUsageStatsOnRemoteSQLNode

Done!


pkg/sql/sem/builtins/builtins_test.go, line 476 at r1 (raw file):

Previously, Azhng (Archer Zhang) wrote…

Let's scale this down to 3 nodes? 9 nodes feels quite wasteful for a test like this

Done.


pkg/sql/sem/builtins/builtins_test.go, line 493 at r1 (raw file):

Previously, Azhng (Archer Zhang) wrote…

also this tests case is slightly overkill 😛 , all you need is a query that causes a filter to be pushed down into remote node, so let's see if we can simplify this query a bit ?

Hmm I've removed some columns, although this test case won't fail if I don't have data inserted into the tables. Or are there any existing test or internal tables I can use that are already populated?

@lindseyjin lindseyjin force-pushed the fix-reset-index-stats branch 2 times, most recently from 00ef106 to 22248b5 Compare November 3, 2021 21:19
Copy link
Copy Markdown
Contributor

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @lindseyjin)


pkg/sql/sem/builtins/builtins_test.go, line 493 at r1 (raw file):

Previously, lindseyjin (Lindsey Jin) wrote…

Hmm I've removed some columns, although this test case won't fail if I don't have data inserted into the tables. Or are there any existing test or internal tables I can use that are already populated?

Interesting, turned out it was a bit hard to do this if there's very little data 😅

Here's a reliable reprod

CREATE TABLE t (k INT);
INSERT INTO t SELECT generate_series(1, 30);

--- The following ALTER TABLE forces the range to locate on different nodes, which forces the optmizer to plan
---  the DistSQL operation that executes builtins on the remote node
ALTER TABLE t SPLIT AT VALUES (10), (20);
ALTER TABLE t EXPERIMENTAL_RELOCATE LEASE SELECT 1, 1;
ALTER TABLE t EXPERIMENTAL_RELOCATE LEASE SELECT 2, 15;
ALTER TABLE t EXPERIMENTAL_RELOCATE LEASE SELECT 3, 25;

--- the query that will crash because of the bug
SELECT count(*) FROM t WHERE crdb_internal.reset_index_usage_stats();

This results in a query plan looking like this

and reliably crashes:

root@127.0.0.1:26257/movr> SELECT count(*) FROM t WHERE crdb_internal.reset_index_usage_stats();
ERROR: internal error: crdb_internal.reset_index_usage_stats(): index usage stats controller not set
SQLSTATE: XX000
DETAIL: stack trace:
github.com/cockroachdb/cockroach/pkg/sql/sem/builtins/builtins.go:6070: func306()
github.com/cockroachdb/cockroach/pkg/sql/sem/tree/eval.go:4221: Eval()
github.com/cockroachdb/cockroach/pkg/sql/execinfrapb/expr.go:202: RunFilter()
github.com/cockroachdb/cockroach/pkg/sql/execinfrapb/expr.go:191: EvalFilter()
github.com/cockroachdb/cockroach/pkg/sql/rowexec/filterer.go:95: Next()
github.com/cockroachdb/cockroach/pkg/sql/colexec/columnarizer.go:222: Next()
github.com/cockroachdb/cockroach/pkg/sql/colexec/count.go:51: Next()
github.com/cockroachdb/cockroach/pkg/sql/colexec/colexecutils/deselector.go:63: Next()
github.com/cockroachdb/cockroach/pkg/sql/colflow/colrpc/outbox.go:285: func1()
github.com/cockroachdb/cockroach/pkg/sql/colexecerror/error.go:91: CatchVectorizedRuntimeError()
github.com/cockroachdb/cockroach/pkg/sql/colflow/colrpc/outbox.go:276: sendBatches()
github.com/cockroachdb/cockroach/pkg/sql/colflow/colrpc/outbox.go:410: runWithStream()
github.com/cockroachdb/cockroach/pkg/sql/colflow/colrpc/outbox.go:216: Run()
github.com/cockroachdb/cockroach/pkg/sql/colflow/vectorized_flow.go:737: func1()
github.com/cockroachdb/cockroach/pkg/sql/colflow/vectorized_flow.go:1331: 1()
runtime/asm_amd64.s:1371: goexit()

Copy link
Copy Markdown
Contributor

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @lindseyjin)


pkg/sql/sem/builtins/builtins_test.go, line 493 at r1 (raw file):

Previously, Azhng (Archer Zhang) wrote…

Interesting, turned out it was a bit hard to do this if there's very little data 😅

Here's a reliable reprod

CREATE TABLE t (k INT);
INSERT INTO t SELECT generate_series(1, 30);

--- The following ALTER TABLE forces the range to locate on different nodes, which forces the optmizer to plan
---  the DistSQL operation that executes builtins on the remote node
ALTER TABLE t SPLIT AT VALUES (10), (20);
ALTER TABLE t EXPERIMENTAL_RELOCATE LEASE SELECT 1, 1;
ALTER TABLE t EXPERIMENTAL_RELOCATE LEASE SELECT 2, 15;
ALTER TABLE t EXPERIMENTAL_RELOCATE LEASE SELECT 3, 25;

--- the query that will crash because of the bug
SELECT count(*) FROM t WHERE crdb_internal.reset_index_usage_stats();

This results in a query plan looking like this

and reliably crashes:

root@127.0.0.1:26257/movr> SELECT count(*) FROM t WHERE crdb_internal.reset_index_usage_stats();
ERROR: internal error: crdb_internal.reset_index_usage_stats(): index usage stats controller not set
SQLSTATE: XX000
DETAIL: stack trace:
github.com/cockroachdb/cockroach/pkg/sql/sem/builtins/builtins.go:6070: func306()
github.com/cockroachdb/cockroach/pkg/sql/sem/tree/eval.go:4221: Eval()
github.com/cockroachdb/cockroach/pkg/sql/execinfrapb/expr.go:202: RunFilter()
github.com/cockroachdb/cockroach/pkg/sql/execinfrapb/expr.go:191: EvalFilter()
github.com/cockroachdb/cockroach/pkg/sql/rowexec/filterer.go:95: Next()
github.com/cockroachdb/cockroach/pkg/sql/colexec/columnarizer.go:222: Next()
github.com/cockroachdb/cockroach/pkg/sql/colexec/count.go:51: Next()
github.com/cockroachdb/cockroach/pkg/sql/colexec/colexecutils/deselector.go:63: Next()
github.com/cockroachdb/cockroach/pkg/sql/colflow/colrpc/outbox.go:285: func1()
github.com/cockroachdb/cockroach/pkg/sql/colexecerror/error.go:91: CatchVectorizedRuntimeError()
github.com/cockroachdb/cockroach/pkg/sql/colflow/colrpc/outbox.go:276: sendBatches()
github.com/cockroachdb/cockroach/pkg/sql/colflow/colrpc/outbox.go:410: runWithStream()
github.com/cockroachdb/cockroach/pkg/sql/colflow/colrpc/outbox.go:216: Run()
github.com/cockroachdb/cockroach/pkg/sql/colflow/vectorized_flow.go:737: func1()
github.com/cockroachdb/cockroach/pkg/sql/colflow/vectorized_flow.go:1331: 1()
runtime/asm_amd64.s:1371: goexit()

This assumes we have a 3-node cluster

Copy link
Copy Markdown
Contributor Author

@lindseyjin lindseyjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng)


pkg/sql/sem/builtins/builtins_test.go, line 493 at r1 (raw file):

Previously, Azhng (Archer Zhang) wrote…

This assumes we have a 3-node cluster

Ah I see, thanks!

Dd you mean that I replace my query with this one, or was this just an answer to my question? I'm not sure if it's actually any simpler 🤔

Copy link
Copy Markdown
Contributor

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @lindseyjin)


pkg/sql/sem/builtins/builtins_test.go, line 493 at r1 (raw file):

Previously, lindseyjin (Lindsey Jin) wrote…

Ah I see, thanks!

Dd you mean that I replace my query with this one, or was this just an answer to my question? I'm not sure if it's actually any simpler 🤔

Let's replace it with the new query, because I realized the one we have right now is not a reliable reprod, it depends on the amount of data in the table and table statistics used by the optimizer. (i.e. sometimes when I run this query it won't fail)

Copy link
Copy Markdown
Contributor Author

@lindseyjin lindseyjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng)


pkg/sql/sem/builtins/builtins_test.go, line 493 at r1 (raw file):

Previously, Azhng (Archer Zhang) wrote…

Let's replace it with the new query, because I realized the one we have right now is not a reliable reprod, it depends on the amount of data in the table and table statistics used by the optimizer. (i.e. sometimes when I run this query it won't fail)

Hmm, when I replace the query, the test case doesn't fail for me even when the index usage stats controller isn't being set.

Also, when I try running this on the cli, I get the following error:

ERROR: unimplemented: operation is unsupported in multi-tenancy mode SQLSTATE: 0A000 HINT: You have attempted to use a feature that is not yet implemented. See: https://go.crdb.dev/issue-v/54254/v22.1

Copy link
Copy Markdown
Contributor

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @lindseyjin)


pkg/sql/sem/builtins/builtins_test.go, line 493 at r1 (raw file):

Previously, lindseyjin (Lindsey Jin) wrote…

Hmm, when I replace the query, the test case doesn't fail for me even when the index usage stats controller isn't being set.

Also, when I try running this on the cli, I get the following error:

ERROR: unimplemented: operation is unsupported in multi-tenancy mode SQLSTATE: 0A000 HINT: You have attempted to use a feature that is not yet implemented. See: https://go.crdb.dev/issue-v/54254/v22.1

Hmm interesting, seems like the new master changed a few things ... let me check

@lindseyjin lindseyjin force-pushed the fix-reset-index-stats branch from 22248b5 to 2fd988c Compare November 4, 2021 16:03
Copy link
Copy Markdown
Contributor

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @lindseyjin)


pkg/sql/sem/builtins/builtins_test.go, line 493 at r1 (raw file):

Previously, Azhng (Archer Zhang) wrote…

Hmm interesting, seems like the new master changed a few things ... let me check

Found the problem:

CREATE TABLE t (k INT PRIMARY KEY); -- <- `k` here should be the primary key
INSERT INTO t SELECT generate_series(1, 30);

ALTER TABLE t SPLIT AT VALUES (10), (20);
ALTER TABLE t EXPERIMENTAL_RELOCATE LEASE SELECT 1, 1;
ALTER TABLE t EXPERIMENTAL_RELOCATE LEASE SELECT 2, 15;
ALTER TABLE t EXPERIMENTAL_RELOCATE LEASE SELECT 3, 25;

SELECT count(*) FROM t WHERE crdb_internal.reset_index_usage_stats();

Actually this was caused by the fact that I didn't make the k a primary key, so the range split was effectively no-op.

Resolves cockroachdb#72254

Previously, the unit test for resetting index usage statistics was
failing. This was happening because of two reasons:
1. The IndexUsageStatsController was not being set properly for DistSQL
server configs.
2. Noise from other tests was sometimes populating the index usage
statistics table right after it was reset, causing the check for a newly
reset table to sometimes randomly fail.

To fix these issues, we have implemented the following:
1. Created a test case to reproduce the first error, and correctly
populated IndexUsageStatsController for DistSQL server configs.
2. Updated the unit test to query for the specific test table id, to
avoid other tables polluting the test space. The test has also been
updated to check for a correct LastReset time.

Release note: none
@lindseyjin lindseyjin force-pushed the fix-reset-index-stats branch from 2fd988c to 40bb260 Compare November 8, 2021 15:51
Copy link
Copy Markdown
Contributor Author

@lindseyjin lindseyjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng)


pkg/sql/sem/builtins/builtins_test.go, line 493 at r1 (raw file):

Previously, Azhng (Archer Zhang) wrote…

Found the problem:

CREATE TABLE t (k INT PRIMARY KEY); -- <- `k` here should be the primary key
INSERT INTO t SELECT generate_series(1, 30);

ALTER TABLE t SPLIT AT VALUES (10), (20);
ALTER TABLE t EXPERIMENTAL_RELOCATE LEASE SELECT 1, 1;
ALTER TABLE t EXPERIMENTAL_RELOCATE LEASE SELECT 2, 15;
ALTER TABLE t EXPERIMENTAL_RELOCATE LEASE SELECT 3, 25;

SELECT count(*) FROM t WHERE crdb_internal.reset_index_usage_stats();

Actually this was caused by the fact that I didn't make the k a primary key, so the range split was effectively no-op.

Done, thanks!

Copy link
Copy Markdown
Contributor

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @Azhng)

@lindseyjin
Copy link
Copy Markdown
Contributor Author

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Nov 8, 2021

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ccl/serverccl: TestResetIndexUsageStatsRPCForTenant failed

3 participants