builtins: add timestamp to crdb_internal.scan/list_sql_keys_in_range#90956
builtins: add timestamp to crdb_internal.scan/list_sql_keys_in_range#90956craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
stevendanna
left a comment
There was a problem hiding this comment.
You probably need to alter the rangeKeyIterator code. I think you probably need to change this:
// Values implements the tree.ValueGenerator interface.
func (rk *rangeKeyIterator) Values() (tree.Datums, error) {
kv := rk.kvs[rk.index]
rk.buf[0] = tree.NewDString(kv.Key.String())
rk.buf[1] = tree.NewDString(kv.Value.PrettyPrint())
return rk.buf[:], nil
}to make sure it only returns the first two elements of the buffer. (or change the return type of that function as well.
This change adds a third return column to the builtins that corresponds to the timestamp at which the value for a particular key was written. Release note (sql change): `crdb_internal.scan` and `crdb_internal.list_sql_keys_in_range` return the timestamp at which the value for a key was written, in addition to the raw key and value.
f3e22e6 to
0af0ea9
Compare
heh, you're too quick! Decided to add the ts to that builtin too, let me know if you'd rather I go the other way. |
crdb_internal.scan
stevendanna
left a comment
There was a problem hiding this comment.
I reviewed the uses of sql_keys_in_range in Slack and didn't see anyone using it in a way that would be seriously impacted by the new column, so I think adding this new column there is OK by me.
Left one question re the return type that I'll leave to your judgement.
| // prettified versions of the key and value. | ||
| []*types.T{types.String, types.String}, | ||
| []string{"key", "value"}, | ||
| []*types.T{types.String, types.String, types.String}, |
There was a problem hiding this comment.
I wonder: Should we make this new column types.Decimal? That is the type of crdb_internal_mvcc_timestamp column is. I could go either way..
There was a problem hiding this comment.
In https://github.com/cockroachdb/cockroach/pull/90848/files#diff-ed2b0d994e60d34e9f853343fd43ee6e14ca234027786774a143e5b73b5a2698R111 im currently hashing the hlc.Timestamp.String() representation of an MVCC timestamp. As discussed offline I wanted uniformity between the output of this builtin and what we hash so that in the future we can write statements such as:
WITH sfp AS (SELECT xor_agg(fnv64(key, ts::BYTES, value)) AS scanfp from crdb_internal.scan(crdb_internal.table_span('t'::regclass::int)))
SELECT count(*) FROM crdb_internal.fingerprint(crdb_internal.table_span('t'::regclass::int),'$before_insert'::TIMESTAMPTZ, false) AS efp, sfp WHERE efp = sfp.scanfp;
to test that our fingerprinting is working as expected. We could go either way here:
- hash the decimal representation of the timestamp and change scan to return a decimal instead
- hash the string representation of the timestamp and keep hashing behaviour what it is at the moment
In the interest of moving things ahead, I'll leave this as a string for now but while reviewing #90848 if we decide decimal is a better representation in both places, I'll make the switch there.
|
TFTR! bors r=stevendanna |
|
Build succeeded: |
This change adds a third return column to the builtins
that corresponds to the timestamp at which the value for
a particular key was written.
Release note (sql change):
crdb_internal.scanandcrdb_internal.list_sql_keys_in_rangereturnthe timestamp at which the value for a key was written, in
addition to the raw key and value.