Skip to content

sql: UPSERT behavior change in optimizer #37880

@jordanlewis

Description

@jordanlewis

With the CBO on, UPSERT produces different results than the HP when there are tuples with duplicate values in a column with a constraint.

It seems that the CBO will reject such upserts, but the HP accepted them and preferred the later tuples. I'm not sure if this is intended, but it's causing errors in the kv/contention workload, which is showing up here: #36089. (The kv benchmark produces duplicate tuples in its upserts, which should be fixed regardless of how we resolve this issue.)

Note that Postgres outright bans this case with their UPSERT equivalent:

jordan=# insert into a values(1,2),(1,3) on conflict(a) do update set b=excluded.b;
ERROR:  ON CONFLICT DO UPDATE command cannot affect row a second time
HINT:  Ensure that no rows proposed for insertion within the same command have duplicate constrained values.

Expected, pre-CBO behavior:

root@127.0.0.1:59593/defaultdb> create table a (a int primary key, b int, index(b));
CREATE TABLE

Time: 2.238ms

root@127.0.0.1:59593/defaultdb> set optimizer=off;
SET

Time: 236µs

root@127.0.0.1:59662/defaultdb> \set auto_trace=on,kv
root@127.0.0.1:59662/defaultdb> upsert into a values(1,2),(1,3);
INSERT 2

Time: 1.411ms

             timestamp             |       age       |                                  message                                  |                        tag                         | location |      operation      | span
+----------------------------------+-----------------+---------------------------------------------------------------------------+----------------------------------------------------+----------+---------------------+------+
  2019-05-28 04:19:13.572524+00:00 | 00:00:00.000466 | Scan /Table/52/1/1{-/#}                                                   | [n1,client=127.0.0.1:59664,user=root]              |          | flow                |    6
  2019-05-28 04:19:13.57255+00:00  | 00:00:00.000492 | querying next range at /Table/52/1/1                                      | [n1,client=127.0.0.1:59664,user=root,txn=6052b0ab] |          | dist sender send    |    8
  2019-05-28 04:19:13.572573+00:00 | 00:00:00.000515 | r21: sending batch 1 Scan to (n1,s1):1                                    | [n1,client=127.0.0.1:59664,user=root,txn=6052b0ab] |          | dist sender send    |    8
  2019-05-28 04:19:13.572801+00:00 | 00:00:00.000743 | CPut /Table/52/1/1/0 -> /TUPLE/2:2:Int/2                                  | [n1,client=127.0.0.1:59664,user=root]              |          | flow                |    6
  2019-05-28 04:19:13.572809+00:00 | 00:00:00.000751 | InitPut /Table/52/2/2/1/0 -> /BYTES/                                      | [n1,client=127.0.0.1:59664,user=root]              |          | flow                |    6
  2019-05-28 04:19:13.572831+00:00 | 00:00:00.000773 | Put /Table/52/1/1/0 -> /TUPLE/2:2:Int/3                                   | [n1,client=127.0.0.1:59664,user=root]              |          | flow                |    6
  2019-05-28 04:19:13.572839+00:00 | 00:00:00.000781 | Del /Table/52/2/2/1/0                                                     | [n1,client=127.0.0.1:59664,user=root]              |          | flow                |    6
  2019-05-28 04:19:13.572843+00:00 | 00:00:00.000785 | CPut /Table/52/2/3/1/0 -> /BYTES/ (expecting does not exist)              | [n1,client=127.0.0.1:59664,user=root]              |          | flow                |    6
  2019-05-28 04:19:13.572875+00:00 | 00:00:00.000817 | querying next range at /Table/52/1/1/0                                    | [n1,client=127.0.0.1:59664,user=root,txn=6052b0ab] |          | dist sender send    |   11
  2019-05-28 04:19:13.572888+00:00 | 00:00:00.00083  | r21: sending batch 1 Put, 2 CPut, 1 Del, 1 EndTxn, 1 InitPut to (n1,s1):1 | [n1,client=127.0.0.1:59664,user=root,txn=6052b0ab] |          | dist sender send    |   11
  2019-05-28 04:19:13.573423+00:00 | 00:00:00.001365 | fast path completed                                                       | [n1,client=127.0.0.1:59664,user=root]              |          | flow                |    6
  2019-05-28 04:19:13.57345+00:00  | 00:00:00.001392 | rows affected: 2                                                          | [n1,client=127.0.0.1:59664,user=root]              |          | exec cmd: exec stmt |    4
(12 rows)

Time: 2.338ms

Actual CBO behavior:

root@127.0.0.1:59607/defaultdb> create table a (a int primary key, b int, index(b));
CREATE TABLE

Time: 2.624ms

root@127.0.0.1:59628/defaultdb> \set auto_trace=on,kv
root@127.0.0.1:59607/defaultdb> upsert into a values(1,2),(1,3);
pq: duplicate key value (a)=(1) violates unique constraint "primary"
             timestamp             |       age       |                                             message                                             |                        tag                         | location |      operation      | span
+----------------------------------+-----------------+-------------------------------------------------------------------------------------------------+----------------------------------------------------+----------+---------------------+------+
  2019-05-28 04:18:07.110666+00:00 | 00:00:00.000596 | Scan /Table/52/1/1{-/#}                                                                         | [n1,client=127.0.0.1:59630,user=root]              |          | join reader         |   16
  2019-05-28 04:18:07.11069+00:00  | 00:00:00.00062  | querying next range at /Table/52/1/1                                                            | [n1,client=127.0.0.1:59630,user=root,txn=7fc2e5fc] |          | dist sender send    |   18
  2019-05-28 04:18:07.110712+00:00 | 00:00:00.000642 | r21: sending batch 1 Scan to (n1,s1):1                                                          | [n1,client=127.0.0.1:59630,user=root,txn=7fc2e5fc] |          | dist sender send    |   18
  2019-05-28 04:18:07.110835+00:00 | 00:00:00.000765 | CPut /Table/52/1/1/0 -> /TUPLE/2:2:Int/2                                                        | [n1,client=127.0.0.1:59630,user=root]              |          | flow                |    6
  2019-05-28 04:18:07.110844+00:00 | 00:00:00.000774 | InitPut /Table/52/2/2/1/0 -> /BYTES/                                                            | [n1,client=127.0.0.1:59630,user=root]              |          | flow                |    6
  2019-05-28 04:18:07.110855+00:00 | 00:00:00.000785 | CPut /Table/52/1/1/0 -> /TUPLE/2:2:Int/3                                                        | [n1,client=127.0.0.1:59630,user=root]              |          | flow                |    6
  2019-05-28 04:18:07.110859+00:00 | 00:00:00.000789 | InitPut /Table/52/2/3/1/0 -> /BYTES/                                                            | [n1,client=127.0.0.1:59630,user=root]              |          | flow                |    6
  2019-05-28 04:18:07.110924+00:00 | 00:00:00.000854 | querying next range at /Table/52/1/1/0                                                          | [n1,client=127.0.0.1:59630,user=root,txn=7fc2e5fc] |          | dist sender send    |    8
  2019-05-28 04:18:07.110937+00:00 | 00:00:00.000867 | r21: sending batch 2 CPut, 1 EndTxn, 2 InitPut to (n1,s1):1                                     | [n1,client=127.0.0.1:59630,user=root,txn=7fc2e5fc] |          | dist sender send    |    8
  2019-05-28 04:18:07.111263+00:00 | 00:00:00.001193 | execution failed after 0 rows: duplicate key value (a)=(1) violates unique constraint "primary" | [n1,client=127.0.0.1:59630,user=root]              |          | exec cmd: exec stmt |    4
  2019-05-28 04:18:07.111288+00:00 | 00:00:00.001218 | querying next range at /Table/52/1/1/0                                                          | [n1,client=127.0.0.1:59630,user=root,txn=7fc2e5fc] |          | dist sender send    |   11
  2019-05-28 04:18:07.111297+00:00 | 00:00:00.001227 | r21: sending batch 1 EndTxn to (n1,s1):1                                                        | [n1,client=127.0.0.1:59630,user=root,txn=7fc2e5fc] |          | dist sender send    |   11
(12 rows)

Assigning @andy-kimball for initial remarks. Did you come across this case while porting mutations tot he CBO?

Metadata

Metadata

Labels

C-investigationFurther steps needed to qualify. C-label will change.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions