-
Notifications
You must be signed in to change notification settings - Fork 4.1k
storage: UPSERT in SNAPSHOT isolation can write incorrect index records #23176
Description
BUG REPORT
% cockroach version cockroach
Build Tag: v1.1.5
Build Time: 2018/02/05 17:40:57
Distribution: CCL
Platform: linux amd64
Go Version: go1.8.3
C Compiler: gcc 6.3.0
Build SHA-1: ec52ded
Build Type: release-gnu
- Please describe the issue you observed:
We have a strange issue where an index is out of sync and is violating a unique constraint and won't let us easily, manually fix the issue.
root@cockroach:26257/ecs> SELECT * FROM task@task_by_location_synced WHERE ((location_id = 'xxx') AND (synchronized = false)) AND (initiation_time = '2018-02-27 15:19:24.281+00:00');
+-----------------------------------------------------+------------+----------------------+--------+------------------------------+-----------------------------------------------------------------------------------+--------------------+-------------------------------+---------+-----------------+-----------------------+--------------------------------------------------------------------------+------------------+----------+------------------------------+---------+--------------+------------------------------------------------------+-----------+----------------+----------+
| uuid | crm | location_id | active | end_time | entity_uuid | initiated_from_ecs | initiation_time | message | operation | operation_description | org_uuid | parent_task_uuid | progress | start_time | status | synchronized | task_id | task_type | user_full_name | username |
+-----------------------------------------------------+------------+----------------------+--------+------------------------------+-----------------------------------------------------------------------------------+--------------------+-------------------------------+---------+-----------------+-----------------------+--------------------------------------------------------------------------+------------------+----------+------------------------------+---------+--------------+------------------------------------------------------+-----------+----------------+----------+
| urn:yyy:task:2a11a3e9-a071-4624-94a7-5fcbd4fb0657 | 0000 | xxx | false | 2018-02-25 20:28:40.28+00:00 | xxx:urn:fake:entity:abcd | false | 2018-02-27 15:19:24.281+00:00 | NULL | fakeType | fakeType | xxx:urn:fake:parent:abcd | NULL | 100 | 2018-02-25 20:05:12.83+00:00 | success | false | urn:yyy:task:d2e4018b-ebfc-4f26-ac21-cfc713f359c1 | test | System | tester |
+-----------------------------------------------------+------------+----------------------+--------+------------------------------+-----------------------------------------------------------------------------------+--------------------+-------------------------------+---------+-----------------+-----------------------+--------------------------------------------------------------------------+------------------+----------+------------------------------+---------+--------------+------------------------------------------------------+-----------+----------------+----------+
(1 row)
Time: 128.560204ms
Everything looks fine in that above return. But if you change synchronized in the query to "true", we get:
root@cockroach:26257/ecs> SELECT * FROM task@task_by_location_synced WHERE ((location_id = 'xxx') AND (synchronized = true)) AND (initiation_time = '2018-02-27 15:19:24.281+00:00');
+-----------------------------------------------------+------------+----------------------+--------+------------------------------+-----------------------------------------------------------------------------------+--------------------+-------------------------------+---------+-----------------+-----------------------+--------------------------------------------------------------------------+------------------+----------+------------------------------+---------+--------------+------------------------------------------------------+-----------+----------------+----------+
| uuid | crm | location_id | active | end_time | entity_uuid | initiated_from_ecs | initiation_time | message | operation | operation_description | org_uuid | parent_task_uuid | progress | start_time | status | synchronized | task_id | task_type | user_full_name | username |
+-----------------------------------------------------+------------+----------------------+--------+------------------------------+-----------------------------------------------------------------------------------+--------------------+-------------------------------+---------+-----------------+-----------------------+--------------------------------------------------------------------------+------------------+----------+------------------------------+---------+--------------+------------------------------------------------------+-----------+----------------+----------+
| urn:yyy:task:2a11a3e9-a071-4624-94a7-5fcbd4fb0657 | 0000 | xxx | false | 2018-02-25 20:28:40.28+00:00 | xxx:urn:fake:entity:abc | false | 2018-02-27 15:19:24.281+00:00 | NULL | fakeType | fakeType | xxx:urn:fake:parent:abcd | NULL | 100 | 2018-02-25 20:05:12.83+00:00 | success | false | urn:yyy:task:d2e4018b-ebfc-4f26-ac21-cfc713f359c1 | test | System | tester |
+-----------------------------------------------------+------------+----------------------+--------+------------------------------+-----------------------------------------------------------------------------------+--------------------+-------------------------------+---------+-----------------+-----------------------+--------------------------------------------------------------------------+------------------+----------+------------------------------+---------+--------------+------------------------------------------------------+-----------+----------------+----------+
(1 row)
Time: 120.803796ms
As you can see even though we query for synchronized=true directly the resulting data shows that synchronized is still false.
If I query it in the following manner the data comes back like so with it being correct in terms of what I queried for:
root@cockroach:26257/ecs> SELECT synchronized FROM task@task_by_location_synced WHERE ((location_id = 'xxx') AND (synchronized = false)) AND (initiation_time = '2018-02-27 15:19:24.281+00:00');
+--------------+
| synchronized |
+--------------+
| false |
+--------------+
(1 row)
Time: 37.420711ms
root@cockroach:26257/ecs> SELECT synchronized FROM task@task_by_location_synced WHERE ((location_id = 'xxx') AND (synchronized = true)) AND (initiation_time = '2018-02-27 15:19:24.281+00:00');
+--------------+
| synchronized |
+--------------+
| true |
+--------------+
(1 row)
Time: 41.600935ms
If I query just the base table on the PRIMARY key I get the following:
root@cockroach:26257/ecs> SELECT * FROM task WHERE "uuid" = 'urn:xxx:task:2a11a3e9-a071-4624-94a7-5fcbd4fb0657';
+-----------------------------------------------------+------------+----------------------+--------+------------------------------+-----------------------------------------------------------------------------------+--------------------+-------------------------------+---------+-----------------+-----------------------+--------------------------------------------------------------------------+------------------+----------+------------------------------+---------+--------------+------------------------------------------------------+-----------+----------------+----------+
| uuid | crm | location_id | active | end_time | entity_uuid | initiated_from_ecs | initiation_time | message | operation | operation_description | org_uuid | parent_task_uuid | progress | start_time | status | synchronized | task_id | task_type | user_full_name | username |
+-----------------------------------------------------+------------+----------------------+--------+------------------------------+-----------------------------------------------------------------------------------+--------------------+-------------------------------+---------+-----------------+-----------------------+--------------------------------------------------------------------------+------------------+----------+------------------------------+---------+--------------+------------------------------------------------------+-----------+----------------+----------+
| urn:xxx:task:2a11a3e9-a071-4624-94a7-5fcbd4fb0657 | 0000 | xxx | false | 2018-02-25 20:28:40.28+00:00 | xxx:urn:fake:entity:abc | false | 2018-02-27 15:19:24.281+00:00 | NULL | fakeType | fakeType | xxx:urn:fake:parent:abcd | NULL | 100 | 2018-02-25 20:05:12.83+00:00 | success | false | urn:yyy:task:d2e4018b-ebfc-4f26-ac21-cfc713f359c1 | test | System | tester |
+-----------------------------------------------------+------------+----------------------+--------+------------------------------+-----------------------------------------------------------------------------------+--------------------+-------------------------------+---------+-----------------+-----------------------+--------------------------------------------------------------------------+------------------+----------+------------------------------+---------+--------------+------------------------------------------------------+-----------+----------------+----------+
(1 row)
Time: 134.96595ms
If I try to manually update the base row to show synchronized = false, I get the following error:
root@cockroach:26257/ecs> UPDATE task SET synchronized = true WHERE "uuid" = 'urn:xxx:task:2a11a3e9-a071-4624-94a7-5fcbd4fb0657';
pq: duplicate key value (location_id,synchronized,initiation_time)=('urn:xxx:task:2a11a3e9-a071-4624-94a7-5fcbd4fb0657',true,'2018-02-27 15:19:24.281+00:00') violates unique constraint "task_by_location_synced"
Thus, our row is stuck in a bad state where that field cannot be updated.
The schema is as follows:
+-----------------------+-----------+-------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field | Type | Null | Default | Indices |
+-----------------------+-----------+-------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| uuid | STRING | false | NULL | {"primary","task_by_native_id","task_by_location_synced","task_by_entity_synced","task_by_org_synced","task_by_org","task_by_org_type","task_by_entity","task_by_org_user","continuity_task_by_org"} |
| crm | STRING | true | NULL | {} |
| location_id | STRING | true | NULL | {"task_by_native_id","task_by_location_synced"} |
| active | BOOL | true | NULL | {} |
| end_time | TIMESTAMP | true | NULL | {} |
| entity_uuid | STRING | true | NULL | {"task_by_entity_synced","task_by_entity"} |
| initiated_from_ecs | BOOL | true | NULL | {} |
| initiation_time | TIMESTAMP | true | NULL | {"task_by_native_id","task_by_location_synced","task_by_entity_synced","task_by_org_synced","task_by_org","task_by_org_type","task_by_entity","task_by_org_user","continuity_task_by_org"} |
| message | STRING | true | NULL | {} |
| operation | STRING | true | NULL | {"continuity_task_by_org"} |
| operation_description | STRING | true | NULL | {} |
| org_uuid | STRING | true | NULL | {"task_by_org_synced","task_by_org","task_by_org_type","task_by_org_user","continuity_task_by_org"} |
| parent_task_uuid | STRING | true | NULL | {"continuity_task_by_org"} |
| progress | INT | true | NULL | {} |
| start_time | TIMESTAMP | true | NULL | {} |
| status | STRING | true | NULL | {} |
| synchronized | BOOL | true | NULL | {"task_by_location_synced","task_by_entity_synced","task_by_org_synced","continuity_task_by_org"} |
| task_id | STRING | true | NULL | {"task_by_native_id"} |
| task_type | STRING | true | NULL | {"task_by_org_type"} |
| user_full_name | STRING | true | NULL | {} |
| username | STRING | true | NULL | {"task_by_org_user"} |
+-----------------------+-----------+-------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
root@cockroach:26257/ecs> show indexes from task;
+-------+-------------------------+--------+-----+------------------+-----------+---------+----------+
| Table | Name | Unique | Seq | Column | Direction | Storing | Implicit |
+-------+-------------------------+--------+-----+------------------+-----------+---------+----------+
| task | primary | true | 1 | uuid | ASC | false | false |
| task | task_by_native_id | false | 1 | location_id | ASC | false | false |
| task | task_by_native_id | false | 2 | task_id | ASC | false | false |
| task | task_by_native_id | false | 3 | initiation_time | DESC | false | false |
| task | task_by_native_id | false | 4 | uuid | ASC | false | true |
| task | task_by_location_synced | false | 1 | location_id | ASC | false | false |
| task | task_by_location_synced | false | 2 | synchronized | ASC | false | false |
| task | task_by_location_synced | false | 3 | initiation_time | DESC | false | false |
| task | task_by_location_synced | false | 4 | uuid | ASC | false | true |
| task | task_by_entity_synced | false | 1 | entity_uuid | ASC | false | false |
| task | task_by_entity_synced | false | 2 | synchronized | ASC | false | false |
| task | task_by_entity_synced | false | 3 | initiation_time | DESC | false | false |
| task | task_by_entity_synced | false | 4 | uuid | ASC | false | true |
| task | task_by_org_synced | false | 1 | org_uuid | ASC | false | false |
| task | task_by_org_synced | false | 2 | synchronized | ASC | false | false |
| task | task_by_org_synced | false | 3 | initiation_time | DESC | false | false |
| task | task_by_org_synced | false | 4 | uuid | ASC | false | true |
| task | task_by_org | false | 1 | org_uuid | ASC | false | false |
| task | task_by_org | false | 2 | initiation_time | DESC | false | false |
| task | task_by_org | false | 3 | uuid | ASC | false | true |
| task | task_by_org_type | false | 1 | org_uuid | ASC | false | false |
| task | task_by_org_type | false | 2 | task_type | ASC | false | false |
| task | task_by_org_type | false | 3 | initiation_time | DESC | false | false |
| task | task_by_org_type | false | 4 | uuid | ASC | false | true |
| task | task_by_entity | false | 1 | entity_uuid | ASC | false | false |
| task | task_by_entity | false | 2 | initiation_time | DESC | false | false |
| task | task_by_entity | false | 3 | uuid | ASC | false | true |
| task | task_by_org_user | false | 1 | org_uuid | ASC | false | false |
| task | task_by_org_user | false | 2 | username | ASC | false | false |
| task | task_by_org_user | false | 3 | initiation_time | DESC | false | false |
| task | task_by_org_user | false | 4 | uuid | ASC | false | true |
| task | continuity_task_by_org | false | 1 | org_uuid | ASC | false | false |
| task | continuity_task_by_org | false | 2 | parent_task_uuid | ASC | false | false |
| task | continuity_task_by_org | false | 3 | synchronized | ASC | false | false |
| task | continuity_task_by_org | false | 4 | operation | ASC | false | false |
| task | continuity_task_by_org | false | 5 | initiation_time | DESC | false | false |
| task | continuity_task_by_org | false | 6 | uuid | ASC | false | true |
+-------+-------------------------+--------+-----+------------------+-----------+---------+----------+
(37 rows)
Let me know if there is anything else I can provide to help get to the bottom of this issue. I am willing to screen share if necessary, etc.
Thanks!