Why we use unique partial indexes
We use a lot of unique partial indexes, like this one:
CREATE UNIQUE INDEX ON omicron.public.silo (
name
) WHERE
time_deleted IS NULL;
The index is unique because we don't want to allow two Silos to have the same name. The index is a partial index because of soft deletes. The index should only include non-deleted Silos (time_deleted IS NULL). This is important because otherwise if someone deleted a Silo, the index would (erroneously) prevent us from ever using its name again. RFD 192 talks about all this.
ON CONFLICT
There are times when you want to INSERT a row into a table, and if the INSERT would violate a uniqueness constraint, then you want to ignore the error. For example, consider this index on silo_group:
CREATE UNIQUE INDEX ON omicron.public.silo_group (
silo_id,
external_id
) WHERE
time_deleted IS NULL;
When a user successfully logs into the console using an identity provider (IdP), the IdP may provide us a list of groups that the user is in. That's the first time we may learn about these groups. We want to create records in silo_group for any groups that do not already exist. A good way to do this is INSERT INTO silo_group (...) VALUES (...) ON CONFLICT ... DO NOTHING. This causes the database to create the record if it doesn't exist, but not fail the query (or, more importantly in this case, the surrounding transaction) if the record already exists.
(This is much better than, say, trying to SELECT the silo_group first and then writing an INSERT only if you don't find it. If you don't do this in one transaction, there's a race: you might not find the silo_group, but it's present when you go to insert it. Then your query fails spuriously sometimes. If you do do this in a transaction, then it has all the problems of interactive transactions in RFD 192, plus we're doing two queries instead of one to do the same work. It's clearly much cleaner and more efficient to just issue one INSERT ... ON CONFLICT ... DO NOTHING.)
CockroachDB support for ON CONFLICT
While looking into this, I found a few Cockroachdb issues and PRs related to this:
- support for ON CONFLICT with unique partial indexes: issue, pr. Introduced in 22.1 (newer than what we're using right now). This requires the syntax
ON CONFLICT (columns...) WHERE partial_index_where_clause....
- support for ON CONFLICT ON CONSTRAINT: issue, pr. This turns out not to work for partial indexes.
CockroachDB appears to support:
UPSERT, but "UPSERT does not let you specify the column(s) with the unique constraint; it always uses the column(s) from the primary key" which is not what we want here.
INSERT ... ON CONFLICT DO NOTHING: seems like it will do the right thing
INSERT ... ON CONFLICT (columns) WHERE ...: works fine
INSERT ... ON CONFLICT (columns): does not support partial index
INSERT ... ON CONFLICT ON CONSTRAINT constraint_name: empirically does not support partial indexes
Here are the failure modes for the last two of these (thanks @jmpesp):
Details
root@:26257/omicron> EXPLAIN INSERT INTO "silo_group" ("id", "time_created", "time_modified", "silo_id", "external_id") VALUES ('afb83fac-003b-4eb9-8341-2a644ae8a54f', '2022-08-04T16:09:08.310525287Z', '2022-08-04T16:09:08.310525287Z', 'b50a7760-dc41-48f8-95c8-6e75196ef21c', 'a-group') ON CONFLICT ("silo_id", "external_id") DO NOTHING RETURNING "silo_group"."id", "silo_group"."time_created", "silo_group"."time_modified", "silo_group"."silo_id", "silo_group"."external_id";
ERROR: there is no unique or exclusion constraint matching the ON CONFLICT specification
That's despite having this index:
CREATE UNIQUE INDEX ON omicron.public.silo_group (
silo_id,
external_id
) WHERE
time_deleted IS NULL;
I used SHOW CONSTRAINTS FROM silo_group to find the constraint name implied by the unique partial index and @jmpesp tried it on 22.1 but got:
root@:26257/omicron> EXPLAIN INSERT INTO "silo_group" ("id", "time_created", "time_modified", "silo_id", "external_id") VALUES ('afb83fac-003b-4eb9-8341-2a644ae8a54f', '2022-08-04T16:09:08.310525287Z', '2022-08-04T16:09:08.310525287Z', 'b50a7760-dc41-48f8-95c8-6e75196ef21c', 'a-group') ON CONFLICT ON CONSTRAINT "silo_group_silo_id_external_id_key" DO NOTHING RETURNING "silo_group"."id", "silo_group"."time_created", "silo_group"."time_modified", "silo_group"."silo_id", "silo_group"."external_id";
ERROR: unique constraint "silo_group_silo_id_external_id_key" for table "silo_group" is partial, so it cannot be used as an arbiter via the ON CONSTRAINT syntax
SQLSTATE: 42809
HINT: use the ON CONFLICT (columns...) WHERE <predicate> form to select this partial unique constraint as an arbiter
Diesel support for ON CONFLICT
Diesel supports:
Summary of options
| SQL syntax |
Does what we want? |
CockroachDB support? |
Supports partial indexes? |
Diesel support? |
UPSERT |
No - PK only |
Yes |
No - PK only |
No* |
INSERT ... ON CONFLICT DO NOTHING |
Yes? |
Yes |
Yes? |
Yes |
INSERT ... ON CONFLICT (columns) DO NOTHING |
Yes? |
Yes |
No |
Yes |
| `INSERT ... ON CONFLICT (columns) WHERE ... DO NOTHING |
Yes |
Yes |
Yes |
No* |
| `INSERT ... ON CONFLICT ON CONSTRAINT constraint_name DO NOTHING |
Yes? |
Yes |
No |
Yes |
*: you can always write raw SQL with Diesel
This is a little disappointing: the intersection of syntaxes that you can build with Diesel (without raw SQL), that CockroachDB supports with partial indexes, and that does approximately what we want is just ON CONFLICT DO NOTHING. This unfortunately overrides all conflicts, which is less precise than ideal. (For example, when we insert a new Silo into the silo table, we may be okay ignoring a conflict on id (if we generated that id and we're retrying an operation that may have already succeeded), but we wouldn't want to ignore a conflict on name without an error.)
Why we use unique partial indexes
We use a lot of unique partial indexes, like this one:
The index is unique because we don't want to allow two Silos to have the same name. The index is a partial index because of soft deletes. The index should only include non-deleted Silos (
time_deleted IS NULL). This is important because otherwise if someone deleted a Silo, the index would (erroneously) prevent us from ever using its name again. RFD 192 talks about all this.ON CONFLICTThere are times when you want to INSERT a row into a table, and if the INSERT would violate a uniqueness constraint, then you want to ignore the error. For example, consider this index on
silo_group:When a user successfully logs into the console using an identity provider (IdP), the IdP may provide us a list of groups that the user is in. That's the first time we may learn about these groups. We want to create records in
silo_groupfor any groups that do not already exist. A good way to do this isINSERT INTO silo_group (...) VALUES (...) ON CONFLICT ... DO NOTHING. This causes the database to create the record if it doesn't exist, but not fail the query (or, more importantly in this case, the surrounding transaction) if the record already exists.(This is much better than, say, trying to SELECT the silo_group first and then writing an INSERT only if you don't find it. If you don't do this in one transaction, there's a race: you might not find the silo_group, but it's present when you go to insert it. Then your query fails spuriously sometimes. If you do do this in a transaction, then it has all the problems of interactive transactions in RFD 192, plus we're doing two queries instead of one to do the same work. It's clearly much cleaner and more efficient to just issue one
INSERT ... ON CONFLICT ... DO NOTHING.)CockroachDB support for
ON CONFLICTWhile looking into this, I found a few Cockroachdb issues and PRs related to this:
ON CONFLICT (columns...) WHERE partial_index_where_clause....CockroachDB appears to support:
UPSERT, but "UPSERT does not let you specify the column(s) with the unique constraint; it always uses the column(s) from the primary key" which is not what we want here.INSERT ... ON CONFLICT DO NOTHING: seems like it will do the right thingINSERT ... ON CONFLICT (columns) WHERE ...: works fineINSERT ... ON CONFLICT (columns): does not support partial indexINSERT ... ON CONFLICT ON CONSTRAINT constraint_name: empirically does not support partial indexesHere are the failure modes for the last two of these (thanks @jmpesp):
Details
That's despite having this index:
I used
SHOW CONSTRAINTS FROM silo_groupto find the constraint name implied by the unique partial index and @jmpesp tried it on 22.1 but got:Diesel support for
ON CONFLICTDiesel supports:
on_conflictwith a list of column nameson_conflictwithon_constraint(constraint_name)on_conflict_do_nothingSummary of options
UPSERTINSERT ... ON CONFLICT DO NOTHINGINSERT ... ON CONFLICT (columns) DO NOTHING*: you can always write raw SQL with DieselThis is a little disappointing: the intersection of syntaxes that you can build with Diesel (without raw SQL), that CockroachDB supports with partial indexes, and that does approximately what we want is just
ON CONFLICT DO NOTHING. This unfortunately overrides all conflicts, which is less precise than ideal. (For example, when we insert a new Silo into thesilotable, we may be okay ignoring a conflict onid(if we generated that id and we're retrying an operation that may have already succeeded), but we wouldn't want to ignore a conflict onnamewithout an error.)