Skip to content

crdb_internal.statement_statistics returning wrong value #85958

@maryliag

Description

@maryliag

When there is the same fingerprint with more than one index recommendations the final result on the combined (memory and persisted) is double counting. Example:

CREATE TABLE t1 (k INT, i INT, f FLOAT, s STRING);
CREATE TABLE t2 (k INT, i INT, s STRING);
SELECT t1.k FROM t1 JOIN t2 ON t1.k = t2.k WHERE t1.i > 3 AND t2.i > 3; (executed 6 times, flush and 3 more times)
> SELECT                                                                                                                          
aggregated_ts,                                                                                                                                     
fingerprint_id,                                                                                                                                    
transaction_fingerprint_id,                                                                                                                       
 plan_hash,                                                                                                                                        
 app_name,                                                                                                                                          
metadata,                                                                                                                                          
statistics,                                                                                                                                        
sampled_plan,                                                                                                                                      
aggregation_interval,                                                                                                                             
 index_recommendations                                                                                                                           
 FROM                                                                                                                                                
crdb_internal.cluster_statement_statistics WHERE fingerprint_id = '\x23f7891854867892';

-[ RECORD 1 ]
aggregated_ts              | 2022-08-10 20:00:00+00
fingerprint_id             | \x23f7891854867892
transaction_fingerprint_id | \x8c943454d287cf4d
plan_hash                  | \xf0d8ed4a1e2397f3
app_name                   | $ cockroach demo
metadata                   | {"db": "movr", "distsql": false, "failed": false, "fullScan": true, "implicitTxn": true, "query": "SELECT t1.k FROM t1 JOIN t2 ON t1.k = t2.k WHERE (t1.i > _) AND (t2.i > _)", "querySummary": "SELECT t1.k FROM t1 JOIN t2 ON t1.k = t2.k", "stmtTyp": "TypeDML", "vec": true}
statistics                 | {"execution_statistics": {"cnt": 1, "contentionTime": {"mean": 0, "sqDiff": 0}, "maxDiskUsage": {"mean": 0, "sqDiff": 0}, "maxMemUsage": {"mean": 9.216E+4, "sqDiff": 0}, "networkBytes": {"mean": 0, "sqDiff": 0}, "networkMsgs": {"mean": 0, "sqDiff": 0}}, "index_recommendations": ["creation : CREATE INDEX ON t1 (i) STORING (k);", "creation : CREATE INDEX ON t2 (i) STORING (k);"], 
"statistics": {"bytesRead": {"mean": 0, "sqDiff": 0}, "cnt": 3, "firstAttemptCnt": 3, "lastExecAt": "2022-08-10T20:17:04.247211Z", "maxRetries": 0, "nodes": [], "numRows": {"mean": 0, "sqDiff": 0}, "ovhLat": {"mean": 0.000005239000000000018, "sqDiff": 2.2394060000000434E-12}, "parseLat": {"mean": 0.00006279233333333333, "sqDiff": 2.724717286666665E-10}, "planGists": ["AgHgAQIAAwAAAAMB4gECAAMAAAADCQACAgAABQIGAg=="], "planLat": {"mean": 0.0002552356666666667, "sqDiff": 1.0722118544266666E-7}, "rowsRead": {"mean": 0, "sqDiff": 0}, "rowsWritten": {"mean": 0, "sqDiff": 0}, "runLat": {"mean": 0.0004819166666666667, "sqDiff": 3.793698434666664E-9}, "svcLat": {"mean": 0.0008051836666666668, "sqDiff": 1.5446868870866666E-7}}}
sampled_plan               | {"Children": [{"Children": [{"Children": [], "Estimated Row Count": "1 (100% of the table; stats collected 12 minutes ago)", "Name": "scan", "Spans": "FULL SCAN", "Table": "t1@t1_pkey"}], "Estimated Row Count": "1", "Filter": "i > _", "Name": "filter"}, {"Children": [{"Children": [], "Estimated Row Count": "1 (100% of the table; stats collected 12 minutes ago)", "Name": "scan", "Spans": "FULL SCAN", "Table": "t2@t2_pkey"}], "Estimated Row Count": "1", "Filter": "i > _", "Name": "filter"}], "Equality": "(k) = (k)", "Estimated Row Count": "1", "Name": "hash join"}
aggregation_interval       | 01:00:00
index_recommendations      | {"creation : CREATE INDEX ON t1 (i) STORING (k);","creation : CREATE INDEX ON t2 (i) STORING (k);"}

This has statistics ->> 'cnt' = 3

Checking the same fingerprint on the persisted table

> SELECT                                                                                                                            
aggregated_ts,                                                                                                                                    
 fingerprint_id,                                                                                                                                    
transaction_fingerprint_id,                                                                                                                        
plan_hash,                                                                                                                                         
app_name,                                                                                                                                          
metadata,                                                                                                                                          
statistics,                                                                                                                                        
plan,                                                                                                                                              
agg_interval,                                                                                                                                      
index_recommendations                                                                                                                            
FROM                                                                                                                                                 
system.statement_statistics WHERE fingerprint_id = '\x23f7891854867892';
-[ RECORD 1 ]
aggregated_ts              | 2022-08-10 20:00:00+00
fingerprint_id             | \x23f7891854867892
transaction_fingerprint_id | \x8c943454d287cf4d
plan_hash                  | \xf0d8ed4a1e2397f3
app_name                   | $ cockroach demo
metadata                   | {"db": "movr", "distsql": true, "failed": false, "fullScan": true, "implicitTxn": true, "query": "SELECT t1.k FROM t1 JOIN t2 ON t1.k = t2.k WHERE (t1.i > _) AND (t2.i > _)", "querySummary": "SELECT t1.k FROM t1 JOIN t2 ON t1.k = t2.k", "stmtTyp": "TypeDML", "vec": true}
statistics                 | {"execution_statistics": {"cnt": 1, "contentionTime": {"mean": 0, "sqDiff": 0}, "maxDiskUsage": {"mean": 0, "sqDiff": 0}, "maxMemUsage": {"mean": 9.216E+4, "sqDiff": 0}, "networkBytes": {"mean": 0, "sqDiff": 0}, "networkMsgs": {"mean": 0, "sqDiff": 0}}, "index_recommendations": ["creation : CREATE INDEX ON t1 (i) STORING (k);", "creation : CREATE INDEX ON t2 (i) STORING (k);"], 
"statistics": {"bytesRead": {"mean": 0, "sqDiff": 0}, "cnt": 6, "firstAttemptCnt": 6, "lastExecAt": "2022-08-10T20:04:41.55328Z", "maxRetries": 0, "nodes": [], "numRows": {"mean": 0, "sqDiff": 0}, "ovhLat": {"mean": 0.000005479666666666607, "sqDiff": 1.0600553333329841E-12}, "parseLat": {"mean": 0.00007301849999999999, "sqDiff": 3.789622235000001E-10}, "planGists": ["AgHgAQIAAwAAAAMB4gECAAMAAAADCQACAgAABQIGAg=="], "planLat": {"mean": 0.0010753486666666666, "sqDiff": 0.000027087948344031332}, "rowsRead": {"mean": 0, "sqDiff": 0}, "rowsWritten": {"mean": 0, "sqDiff": 0}, "runLat": {"mean": 0.0003941843333333333, "sqDiff": 5.4381690773333294E-9}, "svcLat": {"mean": 0.0015480311666666666, "sqDiff": 0.000026748804043984833}}}
plan                       | {"Children": [{"Children": [{"Children": [], "Missing Stats": "", "Name": "scan", "Spans": "FULL SCAN", "Table": "t1@t1_pkey"}], "Filter": "i > _", "Name": "filter"}, {"Children": [{"Children": [], "Missing Stats": "", "Name": "scan", "Spans": "FULL SCAN", "Table": "t2@t2_pkey"}], "Filter": "i > _", "Name": "filter"}], "Equality": "(k) = (k)", "Name": "hash join"}
agg_interval               | 01:00:00
index_recommendations      | {"creation : CREATE INDEX ON t1 (i) STORING (k);","creation : CREATE INDEX ON t2 (i) STORING (k);"}

It has statistics ->> 'cnt' = 6

BUT when I do a group by to join those two into one thing (this is the statement being used on crdb_internal.statement_statistics):

> SELECT
aggregated_ts,
fingerprint_id, 
transaction_fingerprint_id,   
plan_hash,
app_name, 
max(metadata) AS metadata,
crdb_internal.merge_statement_stats(array_agg(statistics)),  
max(sampled_plan), 
aggregation_interval,  
array_remove(array_agg(index_rec), NULL) AS index_recommendations 
  FROM   (  
SELECT  
aggregated_ts,   
fingerprint_id, 
transaction_fingerprint_id,
 plan_hash, 
app_name,  
metadata, 
statistics,  
sampled_plan,     
aggregation_interval, 
index_recommendations 
 FROM crdb_internal.cluster_statement_statistics WHERE fingerprint_id = '\x23f7891854867892' 
UNION ALL   
 SELECT 
aggregated_ts,           
 fingerprint_id,      
transaction_fingerprint_id,   
plan_hash,   
app_name, 
metadata,  
statistics,                                                                                                                                        
plan,                                                                                                                                             
 agg_interval,                                                                                                                                     
 index_recommendations                                                                                                                           
 FROM                                                                                                                                                 
system.statement_statistics WHERE fingerprint_id = '\x23f7891854867892'                                                                      
)                                                                                                                                                  
LEFT JOIN LATERAL unnest(index_recommendations) AS index_rec ON true                                                                             
GROUP BY                                                                                                                                             
aggregated_ts,                                                                                                                                     
fingerprint_id,                                                                                                                                    
transaction_fingerprint_id,                                                                                                                        
plan_hash,                                                                                                                                        
 app_name,                                                                                                                                          
aggregation_interval;
-[ RECORD 1 ]
aggregated_ts                       | 2022-08-10 20:00:00+00
fingerprint_id                      | \x23f7891854867892
transaction_fingerprint_id          | \x8c943454d287cf4d
plan_hash                           | \xf0d8ed4a1e2397f3
app_name                            | $ cockroach demo
metadata                            | {"db": "movr", "distsql": true, "failed": false, "fullScan": true, "implicitTxn": true, "query": "SELECT t1.k FROM t1 JOIN t2 ON t1.k = t2.k WHERE (t1.i > _) AND (t2.i > _)", "querySummary": "SELECT t1.k FROM t1 JOIN t2 ON t1.k = t2.k", "stmtTyp": "TypeDML", "vec": true}
crdb_internal.merge_statement_stats | {"execution_statistics": {"cnt": 4, "contentionTime": {"mean": 0, "sqDiff": 0}, "maxDiskUsage": {"mean": 0, "sqDiff": 0}, "maxMemUsage": {"mean": 9.216E+4, "sqDiff": 0}, "networkBytes": {"mean": 0, "sqDiff": 0}, "networkMsgs": {"mean": 0, "sqDiff": 0}}, "index_recommendations": ["creation : CREATE INDEX ON t1 (i) STORING (k);", "creation : CREATE INDEX ON t2 (i) STORING (k);"], 
"statistics": {"bytesRead": {"mean": 0, "sqDiff": 0}, "cnt": 18, "firstAttemptCnt": 18, "lastExecAt": "2022-08-10T20:17:04.247211Z", "maxRetries": 0, "nodes": [], "numRows": {"mean": 0, "sqDiff": 0}, "ovhLat": {"mean": 0.0000053994444444444104, "sqDiff": 6.830604444443682E-12}, "parseLat": {"mean": 0.00006960977777777777, "sqDiff": 1.721165843111111E-9}, "planGists": ["AgHgAQIAAwAAAAMB4gECAAMAAAADCQACAgAABQIGAg=="], "planLat": {"mean": 0.0008019776666666666, "sqDiff": 0.000057080680390024}, "rowsRead": {"mean": 0, "sqDiff": 0}, "rowsWritten": {"mean": 0, "sqDiff": 0}, "runLat": {"mean": 0.00042342844444444446, "sqDiff": 4.9251584272444445E-8}, "svcLat": {"mean": 0.0013004153333333332, "sqDiff": 0.000056013835098412006}}}
max                                 | {"Children": [{"Children": [{"Children": [], "Missing Stats": "", "Name": "scan", "Spans": "FULL SCAN", "Table": "t1@t1_pkey"}], "Filter": "i > _", "Name": "filter"}, {"Children": [{"Children": [], "Missing Stats": "", "Name": "scan", "Spans": "FULL SCAN", "Table": "t2@t2_pkey"}], "Filter": "i > _", "Name": "filter"}], "Equality": "(k) = (k)", "Name": "hash join"}
aggregation_interval                | 01:00:00
index_recommendations               | {"creation : CREATE INDEX ON t1 (i) STORING (k);","creation : CREATE INDEX ON t2 (i) STORING (k);"}

I’m getting statistics ->> 'cnt' = 18 instead of the expected 9.

We need a different join that doesn't use the LEFT JOIN LATERAL unnest(index_recommendations) AS index_rec ON true because is joining on itself and double counting.

Jira issue: CRDB-18505

Metadata

Metadata

Assignees

Labels

A-sql-observabilityRelated to observability of the SQL layerC-bugCode not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.GA-blockerbranch-release-22.2Used to mark GA and release blockers, technical advisories, and bugs for 22.2

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions