sql: generate index recommendations #85343
Conversation
ajwerner
left a comment
This is a drive-by review. Consider spelling out the interface for the cache a bit more clearly up-front in the sql package and then perhaps write some testing with a simpler implementation that, for example, just has a big ol' mutex on top of it and then iterate to a more optimized version. Generally best to write some micro-benchmarks with some concurrency to ensure that the optimizations are worth it.
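That baseline could be sketched roughly as below; the type and method names are invented for illustration and are not CockroachDB's actual interface:

```go
package main

import (
	"fmt"
	"sync"
)

// simpleRecCache is the "big ol' mutex" baseline: every operation takes a
// single exclusive lock. It is trivially correct, which makes it a good
// reference implementation to test and benchmark optimized versions against.
type simpleRecCache struct {
	mu   sync.Mutex
	recs map[string][]string
}

func newSimpleRecCache() *simpleRecCache {
	return &simpleRecCache{recs: make(map[string][]string)}
}

func (c *simpleRecCache) Get(key string) ([]string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	r, ok := c.recs[key]
	return r, ok
}

func (c *simpleRecCache) Set(key string, recs []string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.recs[key] = recs
}

func main() {
	c := newSimpleRecCache()
	c.Set("fp1", []string{"creation : CREATE INDEX ON t2 (i) STORING (k);"})
	r, ok := c.Get("fp1")
	fmt.Println(ok, len(r)) // true 1
}
```

A concurrent micro-benchmark (Go's testing.B with b.RunParallel) against this baseline would then show whether an RWMutex or a sharded map is actually worth the extra complexity.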
idxRec.mu.RLock()
defer idxRec.mu.RUnlock()

ajwerner: this needs to be locked exclusively
// ClearOlderIndexRecommendationsCache clears entries that were last updated
// more than a day ago. Returns the total number of deleted entries.
func (idxRec *IndexRecCache) ClearOlderIndexRecommendationsCache() int {

ajwerner: This is a very wordy method name. Also, does it need to be exported?

maryliag: Initially I was considering also calling this function periodically from another location to help with the cleanup, but decided to keep it just here. Renamed it to be unexported and shorter.
deleted := idxRec.ClearOlderIndexRecommendationsCache()
// Abort if no entries were deleted.
if deleted == 0 {
	atomic.AddInt64(idxRec.atomic.uniqueIndexRecInfo, -int64(1))
	return
}

ajwerner: You're already doing the decrement inside the call to ClearOlderIndexRecommendationsCache.

maryliag: This is the case where the cleanup didn't delete anything, meaning we can't add the new entry, so this reverses the increment and returns from the function.
idxRec.mu.RLock()
defer idxRec.mu.RUnlock()
...
_, found := idxRec.getIndexRecommendation(key)

ajwerner: consider returning early if you found it? A way I've seen this done elsewhere is to have a getOrCreate method which attempts the get and then re-locks exclusively and attempts the create, assuming there was no race.

maryliag: I can't really return early; this is just checking whether the entry exists, and then I need to do the actual set of the new value. I did like the idea of getOrCreate, though, so I made some updates to use that (here and in the other functions too).
j82w
left a comment
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @j82w, and @maryliag)
pkg/sql/idxrecommendations/idx_recommendations_cache.go line 177 at r2 (raw file):
atomic.AddInt64(&idxRec.atomic.uniqueIndexRecInfo, int64(1))
if incrementedCount > limit {
If this hits the limit, all the calls will go into clearOldIdxRecommendations(), which takes a lock. This will likely cause lock contention. To avoid it, it might be better to track what the oldest recommendation is and skip the lock if none are old enough. The other option I can think of is to track when it last ran and only let it run every x minutes.
pkg/sql/idxrecommendations/idx_recommendations_cache.go line 179 at r2 (raw file):
if incrementedCount > limit {
	// If we have exceeded limit of unique index recommendations try to delete older data.
	deleted := idxRec.clearOldIdxRecommendations()
Should this check the count again instead of looking at deleted, to avoid a possible race condition?
Code snippet:
Thread A -> add -> check -> clearOldIdxRecommendations() returns N, so it continues to add the index.
Thread B -> add -> check -> clearOldIdxRecommendations() returns 0 because Thread A just cleaned a moment before.

pkg/sql/idxrecommendations/idx_recommendations_cache.go line 191 at r2 (raw file):
// For a new entry, we want the lastGeneratedTs to be in the past, in case we reach
// the execution count, we should generate new recommendations.
recInfo = indexRecInfo{
Would it be better to have a separate create method to avoid creating a default object?
pkg/sql/idxrecommendations/idx_recommendations_cache.go line 192 at r2 (raw file):
// the execution count, we should generate new recommendations.
recInfo = indexRecInfo{
	lastGeneratedTs: timeutil.Now().Add(-time.Hour),
Why not use the default time time.Time{}?
pkg/sql/idxrecommendations/idx_recommendations_cache.go line 196 at r2 (raw file):
	executionCount: 0,
}
idxRec.mu.idxRecommendations[key] = recInfo
I don't see a lock between the read above and this create. This allows a possible race condition where the original get returns false for the same key across multiple threads. Each thread will then come down, take the idxRec lock one by one, and replace the recInfo just set by the previous thread.
maryliag
left a comment
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @j82w)
pkg/sql/idxrecommendations/idx_recommendations_cache.go line 177 at r2 (raw file):
Previously, j82w wrote…
If this hits the limit, all the calls will go into clearOldIdxRecommendations(), which takes a lock. This will likely cause lock contention. To avoid it, it might be better to track what the oldest recommendation is and skip the lock if none are old enough. The other option I can think of is to track when it last ran and only let it run every x minutes.
Done: decided to go with the approach of last cleanup ts.
pkg/sql/idxrecommendations/idx_recommendations_cache.go line 179 at r2 (raw file):
Previously, j82w wrote…
Should this check the count again instead of looking at deleted, to avoid a possible race condition?
Done
pkg/sql/idxrecommendations/idx_recommendations_cache.go line 191 at r2 (raw file):
Previously, j82w wrote…
Would it be better to have a separate create method to avoid creating a default object?
Sorry, I don't follow... this is the method to create. Do you mean one that just returns:
indexRecInfo{
lastGeneratedTs: timeutil.Now().Add(-time.Hour),
recommendations: []string{},
executionCount: 0,
}
pkg/sql/idxrecommendations/idx_recommendations_cache.go line 192 at r2 (raw file):
Previously, j82w wrote…
Why not use the default time, time.Time{}?
Imagine I have a query that runs hourly; this means on hour 5 I will get a recommendation for it. But if on hour 3 we reach the cache limit, I'll try to delete data older than 24h, which means:
with this approach: this entry will continue in the cache and will eventually get the recommendation
with the approach of using time.Time{}: this entry will get deleted and I won't get recommendations for it
pkg/sql/idxrecommendations/idx_recommendations_cache.go line 196 at r2 (raw file):
Previously, j82w wrote…
I don't see a lock between the read above and this create. This allows a possible race condition where the original get returns false for the same key across multiple threads. They each then will come down and take the idxRec lock one by one and replace the recInfo just returned by the previous thread.
Currently I do an RLock to get the value; if it returns false, I do a Lock to create the value, which could have a race condition. The alternative would be taking a full lock, getting the value, creating the new entry if it wasn't found, and then unlocking. The problem with that approach is that, because we have a lot more gets than creates, a full lock for all gets could create more contention. If the alternative is to maybe have a race condition, I think it's okay for this case, since the first few executions wouldn't generate a recommendation anyway.
j82w
left a comment
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @j82w, and @maryliag)
pkg/sql/idxrecommendations/idx_recommendations_cache.go line 191 at r2 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
Sorry, I don't follow... this is the method to create. Do you mean one that just returns:
indexRecInfo{
	lastGeneratedTs: timeutil.Now().Add(-time.Hour),
	recommendations: []string{},
	executionCount:  0,
}
I was trying to think of a solution to avoid the ambiguity of it returning a new indexRecInfo vs. one that already exists with a recommendation. Looking it over again, I think it is fine.
pkg/sql/idxrecommendations/idx_recommendations_cache.go line 192 at r2 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
Imagine I have a query that runs hourly, this means on hour 5 I will get a recommendation for it, but if on hour 3 we reached the cache limit I'll try to delete data older than 24h, this means:
with this approach: this entry will continue on the cache and will eventually get the recommendation
with the approach of using time.Time{} this entry will get deleted and I won't get recommendations for it
Good point, I didn't consider the cleanup task. My only concern is that if the cleanup window gets lowered to 1h for some reason, this will break.
pkg/sql/idxrecommendations/idx_recommendations_cache.go line 196 at r2 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
Currently I do a RLock to get the value, if it returns false, I do a Lock to create a value, which could have a race condition, but the alternative would be starting a full lock, getting the value and if false creating the new entry, and then unlocking. The problem with that approach is that because we have a lot more gets than create, you could create more contention if we had to do a full lock for all gets. If the alternative is to maybe have a race condition, I think it's okay for this case, since the first few executions wouldn't generate recommendation anyway.
This is the solution I've seen implemented, where it does another check after taking the full lock to verify another thread did not update it.
Code snippet:
recInfo, found := idxRec.getIndexRecommendation(key)
if found {
	return recInfo, true
}
...
idxRec.mu.Lock()
defer idxRec.mu.Unlock()
recInfo, found = idxRec.mu.idxRecommendations[key]
if found {
	return recInfo, true
}
recInfo = indexRecInfo{...}
return recInfo, true
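Fleshed out into a self-contained sketch (type and field names invented for illustration), the double-checked getOrCreate pattern is:

```go
package main

import (
	"fmt"
	"sync"
)

// recCache demonstrates the getOrCreate pattern from the review: an optimistic
// read under the shared lock, then a re-check under the exclusive lock so a
// racing writer's entry is never overwritten.
type recCache struct {
	mu   sync.RWMutex
	recs map[string][]string
}

func newRecCache() *recCache {
	return &recCache{recs: make(map[string][]string)}
}

func (c *recCache) getOrCreate(key string, create func() []string) []string {
	// Fast path: shared lock, since gets vastly outnumber creates.
	c.mu.RLock()
	r, ok := c.recs[key]
	c.mu.RUnlock()
	if ok {
		return r
	}
	// Slow path: take the exclusive lock and re-check, in case another
	// goroutine created the entry between our RUnlock and Lock.
	c.mu.Lock()
	defer c.mu.Unlock()
	if r, ok := c.recs[key]; ok {
		return r
	}
	r = create()
	c.recs[key] = r
	return r
}

func main() {
	c := newRecCache()
	calls := 0
	create := func() []string {
		calls++
		return []string{"creation : CREATE INDEX ON t2 (i) STORING (k);"}
	}
	c.getOrCreate("fp1", create)
	c.getOrCreate("fp1", create) // second call hits the fast path
	fmt.Println(calls)           // 1
}
```

The re-check under the exclusive lock closes the race window without forcing every read to take the full lock, matching the contention argument made in the thread above.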
j82w
left a comment
Reviewed 13 of 16 files at r2, 3 of 3 files at r5, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @maryliag)
yuzefovich
left a comment
Changes in sql package LGTM, just a few nits.
Reviewed 4 of 16 files at r2, 1 of 3 files at r5.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @maryliag)
pkg/roachpb/app_stats.go line 158 at r5 (raw file):
s.Nodes = util.CombineUniqueInt64(s.Nodes, other.Nodes)
s.PlanGists = util.CombineUniqueString(s.PlanGists, other.PlanGists)
s.IndexRecommendations = util.CombineUniqueString(s.IndexRecommendations, other.IndexRecommendations)
Why did this change?
pkg/sql/instrumentation.go line 669 at r5 (raw file):
}

// SetIndexRecommendations check if we should generate a new index recommendation.
nit: s/check/checks/ s/use the value/uses the value/ s/and update/and updates/.
pkg/sql/instrumentation.go line 686 at r5 (raw file):
) {
	f := opc.optimizer.Factory()
	// EvalContext() has the context of the already closed span, so we need to update with the current context.
nit: s/of the/with the, also wrap at 80 characters.
TFTR!

Merge conflict.
We want to collect index recommendations
per statement and save them to the statement_statistics table.
To accomplish this, a new cache map was created.
The key for the map is a combination of the
statement fingerprint, the database and the plan hash,
since those are the parameters that would affect
an index recommendation (from the parameters we use
as keys for the statement fingerprint ID at least).
This cache has a limit of unique recommendations,
limited by a new cluster setting called
`sql.metrics.statement_details.max_mem_reported_idx_recommendations`
with a default value of 100k.
If this limit is reached, we delete entries that
had their last update made more than 24h ago.
A new recommendation is generated if it has been more
than 1h since the last recommendation (saved in the cache)
and the statement has at least a few executions (so we handle
the case where we have just one execution of several different
fingerprints, and it is not worth calculating recommendations
for them).
The recommendations are saved as a list, with each
recommendation being "type : sql command", e.g.
`{"creation : CREATE INDEX ON t2 (i) STORING (k);"}`
and
`{"replacement : CREATE UNIQUE INDEX ON t1 (i) STORING (k);
DROP INDEX t1@existing_t1_i;"}`
Closes cockroachdb#83782
Release note (sql change): Index recommendations are generated
for statements. New recommendations are generated every hour and
available on `system.statement_statistics` and
`crdb_internal.statement_statistics`.
New cluster setting with default value of 100k
sql.metrics.statement_details.max_mem_reported_idx_recommendations
bors r+

Build succeeded:
yuzefovich
left a comment
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/roachpb/app_stats.go line 158 at r5 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Why did this change?
@maryliag independently from the comment from two months ago I arrived at this code line again, and I'm still curious - why did this line change in this PR? :)
maryliag
left a comment
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/roachpb/app_stats.go line 158 at r5 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
@maryliag independently from the comment from two months ago I arrived at this code line again, and I'm still curious - why did this line change in this PR? :)
(I opened this just to find that I did write the reply, but it was still in draft and never got sent 🤦 sorry about that)
When I added this line in a previous PR I hadn't quite decided the format the index recommendation would come in from the cache, so it was just preparation for persisting it, and I used the default of combining, like we have for everything else.
Now I always want to update my stats with the latest recommendations I have. If I use the combine I might get some weird combination that no longer makes sense.
Imagine you run a statement and get 2 recommendations; the format of IndexRecommendations is a single string with all recommendations, e.g. "rec1; rec2;". Then you run it again, but this time you get "rec3; rec4;". I only care about "rec3; rec4;", so I overwrite what I had from the previous run; combining would give me "rec1; rec2; rec3; rec4;". Maybe the confusion comes from imagining this combines rec1 and rec2, which is not the case.
Does that help?
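A minimal illustration of overwrite vs. combine; combineUnique below is a stand-in written for this example, not the actual util.CombineUniqueString:

```go
package main

import (
	"fmt"
	"sort"
)

// combineUnique mimics the generic merge used for other statistics fields:
// the sorted union of both lists.
func combineUnique(a, b []string) []string {
	seen := map[string]struct{}{}
	for _, s := range append(append([]string{}, a...), b...) {
		seen[s] = struct{}{}
	}
	out := make([]string, 0, len(seen))
	for s := range seen {
		out = append(out, s)
	}
	sort.Strings(out)
	return out
}

func main() {
	old := []string{"rec1", "rec2"}
	latest := []string{"rec3", "rec4"}

	// Combining keeps stale recommendations around:
	fmt.Println(combineUnique(old, latest)) // [rec1 rec2 rec3 rec4]

	// Overwriting keeps only the latest set, which is what the PR switched to:
	fmt.Println(latest) // [rec3 rec4]
}
```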
Yes, makes sense, thanks for explaining!
117433: sql: do not re-run optbuild before collecting index recommendations r=DrewKimball a=michae2

In #85343 for 22.2 we added automatic collection of index recommendations for DML statements. This collection occurred after execution, and initially re-ran optbuild for the query, before doing the usual index recommendation generation steps of (1) detaching the memo, (2) optimizing with hypothetical indexes, and then (3) restoring the detached memo.

In #99081 for 23.2 we moved collection of index recommendations from after execution to between planning and execution, which simplified some things. But this revealed that some queries refer to the memo (and its metadata) during execution, and re-running optbuild can change the contents of the memo (and its metadata) from how they were after planning, especially if the original memo was cached and re-used. I think this initial optbuild step was added to ensure we always have a root expression in the memo.

This commit changes index recommendation collection to only run optbuild if the memo is empty, and otherwise use the memo that comes out of planning.

Fixes: #117307

Release note (bug fix): Fix a bug introduced in 23.2 that causes internal errors and panics when certain queries run with automatic index recommendation collection enabled.

Co-authored-by: Michael Erickson <michae2@cockroachlabs.com>
Different approaches tested:
Min Execution Count is the minimum number of executions a statement must have before we decide to generate a recommendation.
Running TPC C on all branches, results:
The most significant change is when there is no cache, confirming the cache is necessary.
The decision on keeping or removing the execution count check wasn't clear, so I decided to keep the count with a value of 5 and run another workload.
Running YCSB A:
Keeping the execution count has lower performance when there is more contention, but since we have customers that generate several similar fingerprints by using different column orders on selects, etc., we want to avoid creating recommendations for those cases, since it would be almost the same as having no cache.
So I created another option (branch idx-rec-cache), where we do have a count, but once the count reaches the min count, we don't need to update it anymore (helping with contention on the cache itself).
Running workloads with those options again:
TPC C
YCSB A