There are a number of problems with our INTERVAL type and how it is encoded.
1. Broken key encoding allows duplicate values in primary and unique secondary indexes
In Postgres, the SQL snippet below will result in a duplicate key violation on the second insert. In CockroachDB, both inserts succeed because our key encoding of INTERVAL is not correct.
CREATE TABLE t (i INTERVAL PRIMARY KEY);
INSERT INTO t VALUES ('1 month');
INSERT INTO t VALUES ('30 days');
This is allowed because we incorrectly encode INTERVAL keys:
|
b = EncodeVarintAscending(b, months) |
|
b = EncodeVarintAscending(b, days) |
Instead of normalizing the months and days of an interval, we encode the int64s representing them separately. So, the values '1 month' and '30 days', which are equivalent, have different key encodings.
2. Broken key encoding causes overflow errors
The snippet below will error with the message "overflow during Encode" in CRDB, but is successful in Postgres because 5000 years is within the bounds of min/max INTERVAL values of [-178000000 years, 178000000 years].
CREATE TABLE t (i INTERVAL PRIMARY KEY);
INSERT INTO t VALUES ('5000 years');
Our key encoding tries to convert the months and years of a Duration to nanoseconds, but this calculation can overflow an int64 with INTERVAL values well within the upper and lower bounds of the type.
This error can occur when adding an index on an INTERVAL column too:
CREATE TABLE t0 (c0 INTERVAL);
INSERT INTO t0 (c0) VALUES((INTERVAL '100000000 year'));
CREATE INDEX ON t0(c0); --ERROR: failed to construct index entries during backfill: overflow during Encode;
3. INTERVAL has nanosecond precision, even though Postgres only supports microsecond precision
There is no need for our Duration type to work with nanosecond-level precision. The INTERVAL type only needs microsecond-level precision. Notice the comments left on the Duration struct:
|
// |
|
// Although the Nanos field is a number of nanoseconds, all operations |
|
// round to the nearest microsecond. Any setting of this field should avoid |
|
// setting with precision below microseconds. The only exceptions are the |
|
// encode/decode operations. |
|
// |
|
// TODO(dan): Until the overflow and underflow handling is fixed, this is only |
|
// useful for durations of < 292 years. |
|
type Duration struct { |
|
Months int64 |
|
Days int64 |
|
// nanos is an unexported field so that it cannot be misused by other |
|
// packages. It should almost always be rounded to the nearest microsecond. |
|
nanos int64 |
|
} |
Postgres represents an INTERVAL in 16 bytes: two int32s for month and day, and one int64 for microsecond offset: https://github.com/postgres/postgres/blob/d498e052b4b84ae21b3b68d5b3fda6ead65d1d4d/src/include/datatype/timestamp.h#L21-L53
Related INTERVAL Issues
Jira issue: CRDB-17449
Epic: CRDB-20062
There are a number of problems with our
INTERVALtype and how it is encoded.1. Broken key encoding allows duplicate values in primary and unique secondary indexes
In Postgres, the SQL snippet below will result in a duplicate key violation on the second insert. In CockroachDB, both inserts succeed because our key encoding of
INTERVALis not correct.This is allowed because we incorrectly encode
INTERVALkeys:cockroach/pkg/util/encoding/encoding.go
Lines 1384 to 1385 in 3ef1dbd
Instead of normalizing the months and days of an interval, we encode the
int64s representing them separately. So, the values'1 month'and'30 days', which are equivalent, have different key encodings.2. Broken key encoding causes overflow errors
The snippet below will error with the message "overflow during Encode" in CRDB, but is successful in Postgres because
5000 yearsis within the bounds of min/maxINTERVALvalues of[-178000000 years, 178000000 years].Our key encoding tries to convert the months and years of a
Durationto nanoseconds, but this calculation can overflow anint64withINTERVALvalues well within the upper and lower bounds of the type.This error can occur when adding an index on an
INTERVALcolumn too:3.
INTERVALhas nanosecond precision, even though Postgres only supports microsecond precisionThere is no need for our
Durationtype to work with nanosecond-level precision. TheINTERVALtype only needs microsecond-level precision. Notice the comments left on theDurationstruct:cockroach/pkg/util/duration/duration.go
Lines 83 to 97 in de8f055
Postgres represents an
INTERVALin 16 bytes: twoint32s for month and day, and oneint64for microsecond offset: https://github.com/postgres/postgres/blob/d498e052b4b84ae21b3b68d5b3fda6ead65d1d4d/src/include/datatype/timestamp.h#L21-L53Related
INTERVALIssuesJira issue: CRDB-17449
Epic: CRDB-20062