-
Notifications
You must be signed in to change notification settings - Fork 4.1k
sql: expose MVCC timestamps to SQL #50102
Description
Is your feature request related to a problem? Please describe.
Cockroach associates each row in its KV store with an HLC timestamp for the purpose of MVCC. Currently this information is not accessible to client applications. Making this data accessible would be valuable to allow client applications to detect when data changed. We know for a fact that this has been a customer request on more than one occasion.
It also loosely relates to #39275 as using these timestamp might make detecting differences between rows easier. It seems like this task would be something of a requirement to implementing #19749.
Describe the solution you'd like
I'd like to see cockroach expose a system column that a sql query can ask for to retrieve the MVCC timestamp of a row. Postgres offers such system columns (see https://www.postgresql.org/docs/current/ddl-system-columns.html) that provide introspection into its concurrency control.
One wrinkle in all of this is that different column families in a row can have different MVCC timestamps. Perhaps the name of the column will need to somehow refer to the name of the column family. Perhaps then we'd need some way to refer to the upper bound of all of the column families.
Another related problem has to do with retrieving this column from indexes. Only in some cases will we know that the write timestamp on an index will be the same as the timestamp on a secondary index. Figuring out when the timestamp off of the secondary index can be used vs. needing to go back and consult the primary index may also be something of a complex problem. I suspect for many cases it's not a big deal -- the need to do an index join back on the primary index when using this feature would likely be acceptable for v0.
Describe alternatives you've considered
Not much. Perhaps some sort of auto-updating column that updates for every write to the row?
Triggers could relieve the need for this in some cases but it's not ideal.
Additional context
The changes required for this should be pretty isolated. The row-fetcher already always retrieves the MVCC timestamps in get and scan requests. It's a matter of telling the row-fetcher to decode it and use it and to plan that retrieval of these columns. This feels like a squarely SQL features project and might even make for a good intern project.
The remaining complication as far as I'm concerned is that today schema change backfills will update rows and their MVCC timestamps. The solution to this will be to construct new indexes and swapping over (#47989). This issue shouldn't necessarily precede the work to expose MVCC columns but it has major bearing on their semantics. I suspect the work should proceed in parallel.