sql: unexpected duplicate key violations #7604

@tamird

(originally reported by @aquarat in #6053; I've deleted the original comment because it doesn't seem related to the original issue).

What version of CockroachDB are you using (cockroach version)?

$ cockroach version
Build Tag:   beta-20160519
Build Time:  2016/05/19 21:52:10
Platform:    linux amd64
Go Version:  go1.6
C Compiler:  gcc 4.9.2
(both nodes - using the same executable)

What operating system and processor architecture are you using?

Both nodes : Ubuntu 15.10 "Wily"

Node 1 : (local) (received most if not all writes)
Linux sqlrat 3.19.0-43-generic #49-Ubuntu SMP Sun Dec 27 19:43:07 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Node 2 : (remote) (saw few/no writes)
Linux parirat 4.2.0-25-generic #30-Ubuntu SMP Mon Jan 18 12:31:50 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

What flags/environment variables did you pass to cockroach start?

Node 1
$ ./cockroach start --insecure --port=25267 --join=127.0.0.1:25268 --alsologtostderr
Node 2
$ ./cockroach start --join=127.0.0.1:25267 --port=25268 --insecure --alsologtostderr --http-port=65530

Nodes are linked via SSH tunnelling (they're on different machines; one in France, the other in South Africa). Both machines are NTP synchronised.

ssh -NCL 25268:127.0.0.1:25268 -R 25267:127.0.0.1:25267 user@host -p 60022 -i mykey

Please describe the issue you observed:

2016/05/23 19:57:57 pq: context deadline exceeded
2016/05/23 20:00:24 pq: duplicate key value (rowid)=(144035022173011969) violates unique constraint "primary"
2016/05/23 20:00:24 pq: duplicate key value (rowid)=(144035021942947841) violates unique constraint "primary"

Initially the message "context deadline exceeded" was repeated many hundreds of times (a failure
rate of around 1% across ~1 million records). Towards the end of the "copy" process the second
type of message was repeated hundreds of times.

What did you do?

I created and ran a Go application with the intention of using it to duplicate a table from
a Postgres database to a local CockroachDB node by selecting from the Postgres instance 
and inserting received records into the CockroachDB node. Initially this application was 
single-threaded but CockroachDB responded slowly to inserts, so I modified the 
application to execute inserts concurrently using a set of goroutines. I increased the number
of goroutine workers until I started to see more than 20% CPU usage on the CockroachDB
instance. My code is linked below... it was a very quick and dirty creation, apologies in advance. 
CockroachDB indicated 107 connections during the "copy" process. During this time no
write transactions were executed on the second node, but the occasional 
SELECT was executed, usually SELECT COUNT and SELECT * ... LIMIT 1;

Application performing the copy : https://play.golang.org/p/hhGzhNtmHL
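
For readers who do not want to open the link, the concurrent-insert pattern described above can be sketched roughly as follows. This is not the linked program: Row, copyRows, errSimulated and the insert callback are hypothetical names, and the callback stands in for whatever statement a real worker would run against CockroachDB (no database driver is used here).

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

// Row is a hypothetical, reduced stand-in for one record of the log table.
type Row struct {
	ID      int64
	Data    string
	Channel string
}

// errSimulated is a placeholder error used only by this sketch.
var errSimulated = errors.New("simulated insert failure")

// copyRows fans rows out to a fixed number of worker goroutines and
// returns how many inserts reported an error.
func copyRows(rows []Row, workers int, insert func(Row) error) int64 {
	jobs := make(chan Row)
	var failed int64
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each worker pulls rows until the channel is closed.
			for r := range jobs {
				if err := insert(r); err != nil {
					atomic.AddInt64(&failed, 1)
				}
			}
		}()
	}

	for _, r := range rows {
		jobs <- r
	}
	close(jobs)
	wg.Wait()
	return failed
}

func main() {
	rows := []Row{{1, "a", "x"}, {2, "b", "x"}, {3, "c", "y"}}
	// Assumption: a real insert would execute an INSERT against the
	// CockroachDB node; here one row is made to fail on purpose.
	insert := func(r Row) error {
		if r.ID == 2 {
			return errSimulated
		}
		return nil
	}
	fmt.Println("failed inserts:", copyRows(rows, 2, insert))
}
```

Raising the workers argument corresponds to the step of increasing the number of goroutine workers described above; the sketch only counts failures and does not retry them.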

The table structure in CockroachDB :
CREATE TABLE log (id INT, data STRING, entered TIMESTAMP, channel STRING);
(implying the creation of the rowid column)

What did you expect to see?

Limited or no error messages; I was copying an existing table and the table structure
in CockroachDB I was inserting into had a managed primary key (hidden rowid), so I
wasn't expecting any key conflicts.

What did you see instead?

Initially the odd "context deadline exceeded" and then towards the end of the process
(after about 600k rows were inserted) errors indicating a duplicate key violation on the
rowid column. Both messages were repeated many times (in excess of 1000).

Labels: S-1-stability (Severe stability issues that can be fixed by upgrading, but usually don’t resolve by restarting)
