Skip to content

opt: histogram upper bounds should not be outside of min/max values for a type #76887

@mgartner

Description

@mgartner

I've observed INT2 and INT4 column statistics in a statement bundle with histogram upper bounds that were greater than the max INT2 or INT4, and less than the minimum INT2 or INT4. For example:

{
    "columns": [
        "a"
    ],
    "created_at": "2022-01-01 12:34:56.000000",
    "distinct_count": 18,
    "histo_buckets": [
        {
            "distinct_range": 0,
            "num_eq": 0,
            "num_range": 0,
            "upper_bound": "-9223372036854775808"
        },
        {
            "distinct_range": 4.5,
            "num_eq": 14212895,
            "num_range": 4,
            "upper_bound": "0"
        },
        {
            "distinct_range": 7,
            "num_eq": 6249,
            "num_range": 17024855,
            "upper_bound": "8"
        },
        {
            "distinct_range": 4.5,
            "num_eq": 0,
            "num_range": 4,
            "upper_bound": "9223372036854775807"
        }
    ],
    "histo_col_type": "INT2",
    "histo_version": 1,
    "name": "__auto__",
    "null_count": 0,
    "row_count": 31244007
},

This is problematic because it may lead to incorrect row count estimates, and injecting these stats manually results in an error:

defaultdb> CREATE TABLE t (a INT2);
CREATE TABLE

defaultdb> ALTER TABLE t INJECT STATISTICS '[
{
    "columns": [
        "a"
    ],
    "created_at": "2022-01-01 12:34:56.000000",
    "distinct_count": 18,
    "histo_buckets": [
        {
            "distinct_range": 0,
            "num_eq": 0,
            "num_range": 0,
            "upper_bound": "-9223372036854775808"
        },
        {
            "distinct_range": 4.5,
            "num_eq": 14212895,
            "num_range": 4,
            "upper_bound": "0"
        },
        {
            "distinct_range": 7,
            "num_eq": 6249,
            "num_range": 17024855,
            "upper_bound": "8"
        },
        {
            "distinct_range": 4.5,
            "num_eq": 0,
            "num_range": 4,
            "upper_bound": "9223372036854775807"
        }
    ],
    "histo_col_type": "INT2",
    "histo_version": 1,
    "name": "__auto__",
    "null_count": 0,
    "row_count": 31244007
}]';
ERROR: integer out of range for type int2
SQLSTATE: 22003

I'm not sure what caused this and I've been unable to reproduce it so far. The statement bundle was from CRDB v21.2.5.

Jira issue: CRDB-13311

Epic CRDB-16930

Metadata

Metadata

Assignees

Labels

C-bugCode not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.C-cleanupTech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior.T-sql-queriesSQL Queries Team

Type

No type

Projects

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions