sql: block_writer tries to violate uniqueness constraint when its gateway is killed and a load balancer is used #6053

@tamird

Description

Please answer these questions for each node in your cluster:

  1. What version of CockroachDB are you using (cockroach version)?

    $ cockroach version
    Build Tag:   beta-20160407-175-g3710172
    Build Time:  2016/04/13 22:38:31
    Platform:    darwin amd64
    Go Version:  go1.6
    C Compiler:  4.2.1 Compatible Apple LLVM 7.3.0 (clang-703.0.29)
    
  2. What operating system and processor architecture are you using?

    $ uname -a
    Darwin Tamirs-MacBook-Pro.local 15.4.0 Darwin Kernel Version 15.4.0: Fri Feb 26 22:08:05 PST 2016; root:xnu-3248.40.184~3/RELEASE_X86_64 x86_64
    
  3. What flags/environment variables did you pass to cockroach start?

    $ roachdemo
    2016/04/13 18:30:30 process 47868 started: ./cockroach start --insecure --port=26257 --http-port=26258 --store=cockroach-data/1
    2016/04/13 18:30:30 process 47869 started: ./cockroach start --insecure --port=26259 --http-port=26260 --store=cockroach-data/2 --join=localhost:26257
    2016/04/13 18:30:30 process 47870 started: ./cockroach start --insecure --port=26261 --http-port=26262 --store=cockroach-data/3 --join=localhost:26257
    

Please describe the issue you observed:

  1. What did you do?
    I was running nginx with the following config:

    stream {
        upstream cockroachRPC {
            server localhost:26257;
            server localhost:26259;
            server localhost:26261;
        }
        upstream cockroachHTTP {
            server localhost:26258;
            server localhost:26260;
            server localhost:26262;
        }
    
        server {
            listen 4040;
            proxy_pass cockroachRPC;
        }
        server {
            listen 8080;
            proxy_pass cockroachHTTP;
        }
    }
    

    Then I ran block_writer through nginx and killed node 1 via roachdemo's UI:

    $ block_writer postgresql://root@localhost:4040?sslmode=disable
    1s:  869.2/sec
    2s:  602.0/sec
    3s:  551.0/sec
    4s:  482.2/sec
    5s:  279.1/sec
    6s:  379.3/sec
    7s:  308.0/sec
    8s:  448.3/sec
    9s:  414.3/sec
    10s:  327.9/sec
    11s:  363.3/sec
    2016/04/13 18:32:36 error running blockwriter 7da09e4e-d4b9-470c-9452-cf0c0bc41c63: pq: duplicate key value (block_id,writer_id,block_num)=(2014691597753416701,'7da09e4e-d4b9-470c-9452-cf0c0bc41c63',1658) violates unique constraint "primary"
    
  2. What did you expect to see?
    block_writer continuing to run normally.

  3. What did you see instead?
    block_writer fails with a unique constraint violation on its primary key (block_id, writer_id, block_num), suggesting that an insert that had already been applied was retried after node 1 was killed. This is bad!
