Skip to content

sql: fix INTERVAL type representation and encoding #84078

@mgartner

Description

@mgartner

There are a number of problems with our INTERVAL type and how it is encoded.

1. Broken key encoding allows duplicate values in primary and unique secondary indexes

In Postgres, the SQL snippet below will result in a duplicate key violation on the second insert. In CockroachDB, both inserts succeed because our key encoding of INTERVAL is not correct.

CREATE TABLE t (i INTERVAL PRIMARY KEY);
INSERT INTO t VALUES ('1 month');
INSERT INTO t VALUES ('30 days');

This is allowed because we incorrectly encode INTERVAL keys:

b = EncodeVarintAscending(b, months)
b = EncodeVarintAscending(b, days)

Instead of normalizing the months and days of an interval, we encode the int64s representing them separately. So, the values '1 month' and '30 days', which are equivalent, have different key encodings.

2. Broken key encoding causes overflow errors

The snippet below will error with the message "overflow during Encode" in CRDB, but is successful in Postgres because 5000 years is within the bounds of min/max INTERVAL values of [-178000000 years, 178000000 years].

CREATE TABLE t (i INTERVAL PRIMARY KEY);
INSERT INTO t VALUES ('5000 years');

Our key encoding tries to convert the months and years of a Duration to nanoseconds, but this calculation can overflow an int64 with INTERVAL values well within the upper and lower bounds of the type.

This error can occur when adding an index on an INTERVAL column too:

CREATE TABLE t0 (c0 INTERVAL);
INSERT INTO t0 (c0) VALUES((INTERVAL '100000000 year'));
CREATE INDEX ON t0(c0); --ERROR: failed to construct index entries during backfill: overflow during Encode;

3. INTERVAL has nanosecond precision, even though Postgres only supports microsecond precision

There is no need for our Duration type to work with nanosecond-level precision. The INTERVAL type only needs microsecond-level precision. Notice the comments left on the Duration struct:

//
// Although the Nanos field is a number of nanoseconds, all operations
// round to the nearest microsecond. Any setting of this field should avoid
// setting with precision below microseconds. The only exceptions are the
// encode/decode operations.
//
// TODO(dan): Until the overflow and underflow handling is fixed, this is only
// useful for durations of < 292 years.
type Duration struct {
Months int64
Days int64
// nanos is an unexported field so that it cannot be misused by other
// packages. It should almost always be rounded to the nearest microsecond.
nanos int64
}

Postgres represents an INTERVAL in 16 bytes: two int32s for month and day, and one int64 for microsecond offset: https://github.com/postgres/postgres/blob/d498e052b4b84ae21b3b68d5b3fda6ead65d1d4d/src/include/datatype/timestamp.h#L21-L53

Related INTERVAL Issues

Jira issue: CRDB-17449

Epic: CRDB-20062

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-sql-datatypesSQL column types usable in table descriptors.C-bugCode not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.T-sql-queriesSQL Queries Teambranch-masterFailures and bugs on the master branch.

    Type

    No type

    Projects

    Status

    Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions