Skip to content

sql: implement immutable tables that automatically use follower reads #26301

@jordanlewis

Description

@jordanlewis

Background

It's very common in real-world database schemas to have reference tables - infrequently-updated tables, often relatively small, that are used as foreign key tables in applications. Some examples:

  • a table of zipcodes. zipcodes are very infrequently deleted or added
  • the items table from tpcc. items in tpcc are never updated
  • a table of historical closing prices in a financial database. this table is updated once a day, but not during the day

These kinds of tables present some issues for CockroachDB. In CockroachDB, doing a read from a table requires going to the table's leaseholder, to make sure that the reads are consistent. In an application that plays to CockroachDB's strengths that has many, globally distributed datacenters, reference tables like this will cause poor performance in several situations:

  • inserts into tables that have foreign keys to a reference table will be at least as slow as a read to the leaseholder of the reference table, which might live across the globe
  • joins from tables to a reference table will require going to the leaseholder, which might live across the globe

This may be unacceptable for customers who desire low-latency reads against reference tables and are willing to sacrifice write performance against those tables.

We can support this in CockroachDB by providing users the ability to mark a table as immutable, and using that information to provide fast, transactional reads to any replica.

Proposed solution for CockroachDB: Immutable tables

Immutable tables are tables that cannot be modified. Tables can be set as immutable, and reset back to mutable, by performing a schema change with an ordinary ALTER TABLE command.

When a table is marked as immutable, it's possible to drastically improve read performance from it by altering the read semantics. There are (at least) two ways of doing this:

  1. Using inconsistent reads against the table. That's safe because we know that the table is immutable, implying that all of its replicas are up-to-date.
  2. Using follower reads (read from a follower with timestamp bound #16593) against the table, with a read timestamp somewhere in the past, divorced from the active transaction timestamp. That's safe because follower reads guarantee that follower replicas are up-to-date as of some interval of time in the past.

The first solution is harder to implement, because of the semantics of leaving the immutable state - we'd have to invent a way to wait until data has fully replicated to all replicas. It also might be unsafe altogether - see @nvanbenschoten's comment on this issue.

The second solution is easier, and should be possible as soon as follower reads are available, using the schema change infrastructure and an intermediate state similar to the intermediate WRITE_ONLY state used for index backfills. The rest of this issue proposes to use the second solution.

Details on immutable/mutable states and state change process

Table descriptors will be extended with a new field, mutability, that has three possible values:

  • mutable
    • Reads from a mutable table behave as normal.
    • Writes to a mutable table are permitted.
  • immutable-must-read-transactional
    • Reads from an immutable-must-read-transactional table behave as normal.
    • Writes to an immutable-must-read-transactional table are not permitted.
  • immutable
    • Reads from an immutable table are performed at a historical timestamp, t seconds in the past (where t is defined by the follower read safe past interval), and get served by a follower read
    • Writes to an immutable table are not permitted

By default, mutability is set to mutable. To transition a table from mutable to immutable, a user issues an ALTER TABLE t SET IMMUTABLE command (or similar - syntax to be decided later), which performs the following actions:

  1. Set table descriptor to immutable-must-read-transactional and wait until the cluster converges on the new descriptor.
  2. Wait t seconds to ensure that, once transitioned to immutable, all reads from t seconds ago occur before the last possible write to the reference table.
  3. Set table descriptor to immutable.

To transition a table from immutable to mutable, a user issues an ALTER TABLE t SET MUTABLE command, which performs the following actions:

  1. Set table descriptor to immutable-must-read-transactional and wait until the cluster converges on the new descriptor.
  2. Set table descriptor to mutable. (note we don't have to wait t seconds here, since we're moving in a "safe" direction, causing all future reads to be ordinary transactional reads.)

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-sql-mutationsMutation statements: UPDATE/INSERT/UPSERT/DELETE.C-performancePerf of queries or internals. Solution not expected to change functional behavior.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions