You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
It's very common in real-world database schemas to have reference tables - infrequently-updated tables, often relatively small, that are used as foreign key tables in applications. Some examples:
a table of zipcodes. zipcodes are very infrequently deleted or added
the items table from tpcc. items in tpcc are never updated
a table of historical closing prices in a financial database. this table is updated once a day, but not during the day
These kinds of tables present some issues for CockroachDB. In CockroachDB, doing a read from a table requires going to the table's leaseholder, to make sure that the reads are consistent. In an application that plays to CockroachDB's strengths that has many, globally distributed datacenters, reference tables like this will cause poor performance in several situations:
inserts into tables that have foreign keys to a reference table will be at least as slow as a read to the leaseholder of the reference table, which might live across the globe
joins from tables to a reference table will require going to the leaseholder, which might live across the globe
This may be unacceptable for customers who desire low-latency reads against reference tables and are willing to sacrifice write performance against those tables.
We can support this in CockroachDB by providing users the ability to mark a table as immutable, and using that information to provide fast, transactional reads to any replica.
Proposed solution for CockroachDB: Immutable tables
Immutable tables are tables that cannot be modified. Tables can be set as immutable, and reset back to mutable, by performing a schema change with an ordinary ALTER TABLE command.
When a table is marked as immutable, it's possible to drastically improve read performance from it by altering the read semantics. There are (at least) two ways of doing this:
Using inconsistent reads against the table. That's safe because we know that the table is immutable, implying that all of its replicas are up-to-date.
Using follower reads (read from a follower with timestamp bound #16593) against the table, with a read timestamp somewhere in the past, divorced from the active transaction timestamp. That's safe because follower reads guarantee that follower replicas are up-to-date as of some interval of time in the past.
The first solution is harder to implement, because of the semantics of leaving the immutable state - we'd have to invent a way to wait until data has fully replicated to all replicas. It also might be unsafe altogether - see @nvanbenschoten's comment on this issue.
The second solution is easier, and should be possible as soon as follower reads are available, using the schema change infrastructure and an intermediate state similar to the intermediate WRITE_ONLY state used for index backfills. The rest of this issue proposes to use the second solution.
Details on immutable/mutable states and state change process
Table descriptors will be extended with a new field, mutability, that has three possible values:
mutable
Reads from a mutable table behave as normal.
Writes to a mutable table are permitted.
immutable-must-read-transactional
Reads from an immutable-must-read-transactional table behave as normal.
Writes to an immutable-must-read-transactional table are not permitted.
immutable
Reads from an immutable table are performed at a historical timestamp, t seconds in the past (where t is defined by the follower read safe past interval), and get served by a follower read
Writes to an immutable table are not permitted
By default, mutability is set to mutable. To transition a table from mutable to immutable, a user issues an ALTER TABLE t SET IMMUTABLE command (or similar - syntax to be decided later), which performs the following actions:
Set table descriptor to immutable-must-read-transactional and wait until the cluster converges on the new descriptor.
Wait t seconds to ensure that, once transitioned to immutable, all reads from t seconds ago occur before the last possible write to the reference table.
Set table descriptor to immutable.
To transition a table from immutable to mutable, a user issues an ALTER TABLE t SET MUTABLE command, which performs the following actions:
Set table descriptor to immutable-must-read-transactional and wait until the cluster converges on the new descriptor.
Set table descriptor to mutable. (note we don't have to wait t seconds here, since we're moving in a "safe" direction, causing all future reads to be ordinary transactional reads.)
Background
It's very common in real-world database schemas to have reference tables - infrequently-updated tables, often relatively small, that are used as foreign key tables in applications. Some examples:
itemstable fromtpcc. items in tpcc are never updatedThese kinds of tables present some issues for CockroachDB. In CockroachDB, doing a read from a table requires going to the table's leaseholder, to make sure that the reads are consistent. In an application that plays to CockroachDB's strengths that has many, globally distributed datacenters, reference tables like this will cause poor performance in several situations:
This may be unacceptable for customers who desire low-latency reads against reference tables and are willing to sacrifice write performance against those tables.
We can support this in CockroachDB by providing users the ability to mark a table as immutable, and using that information to provide fast, transactional reads to any replica.
Proposed solution for CockroachDB: Immutable tables
Immutable tables are tables that cannot be modified. Tables can be set as immutable, and reset back to mutable, by performing a schema change with an ordinary
ALTER TABLEcommand.When a table is marked as immutable, it's possible to drastically improve read performance from it by altering the read semantics. There are (at least) two ways of doing this:
That's safe because we know that the table is immutable, implying that all of its replicas are up-to-date.The first solution is harder to implement, because of the semantics of leaving the immutable state - we'd have to invent a way to wait until data has fully replicated to all replicas. It also might be unsafe altogether - see @nvanbenschoten's comment on this issue.
The second solution is easier, and should be possible as soon as follower reads are available, using the schema change infrastructure and an intermediate state similar to the intermediate WRITE_ONLY state used for index backfills. The rest of this issue proposes to use the second solution.
Details on immutable/mutable states and state change process
Table descriptors will be extended with a new field,
mutability, that has three possible values:mutablemutabletable behave as normal.mutabletable are permitted.immutable-must-read-transactionalimmutable-must-read-transactionaltable behave as normal.immutable-must-read-transactionaltable are not permitted.immutableimmutabletable are performed at a historical timestamp,tseconds in the past (wheretis defined by the follower read safe past interval), and get served by a follower readimmutabletable are not permittedBy default,
mutabilityis set tomutable. To transition a table frommutabletoimmutable, a user issues anALTER TABLE t SET IMMUTABLEcommand (or similar - syntax to be decided later), which performs the following actions:immutable-must-read-transactionaland wait until the cluster converges on the new descriptor.tseconds to ensure that, once transitioned toimmutable, all reads fromtseconds ago occur before the last possible write to the reference table.immutable.To transition a table from
immutabletomutable, a user issues anALTER TABLE t SET MUTABLEcommand, which performs the following actions:immutable-must-read-transactionaland wait until the cluster converges on the new descriptor.mutable. (note we don't have to waittseconds here, since we're moving in a "safe" direction, causing all future reads to be ordinary transactional reads.)