sql: simple bulk loading in Cockroach very slow (compared to PG) #5981

@tbg

Description

Bulk loading is slow to the point where you wonder how you're ever going to get data into the database. I ran the following (on my MacBook):

go install github.com/tschottdorf/goplay/tblgen
rm -rf cockroach-data && cockroach start &
docker run -p 5432:5432 postgres

time (tblgen 10000 | cockroach sql)

real 0m16.712s
user 0m35.817s
sys 0m5.158s

time (tblgen 10000 | psql -h $(docker-machine ip default) -U postgres)

real 0m0.306s
user 0m0.023s
sys 0m0.024s

There ought to be some very low-hanging fruit here: Postgres finishes the same workload over 50× faster. tblgen has a constant that tunes how much we batch, but by default it inserts 640 entries per batch, which shouldn't be an unreasonable batch size.

The typical trace (on the slower end) looks like this:

11:35:14.461839  .     3    ... node 1
11:35:14.461847  .     8    ... read has no clock uncertainty
11:35:14.462042  .   195    ... executing 2562 requests
11:35:14.462500  .   458    ... read-write path
11:35:14.462531  .    31    ... command queue
11:35:14.475034  . 12502    ... raft
11:35:14.480098  .  5064    ... applying batch

Some requests, which need to renew the leader lease, obviously take a little longer. I ran with a (horrible) hack that simply bypasses Raft, which didn't help at all:

real 0m17.260s
user 0m36.141s
sys 0m5.156s

But that's still slow. Might be time to push #5255 to completion for easier diagnosis here.

For testing sustained throughput, probably want

./cockroach zone set .default 'range_max_bytes: 99999999999'
./cockroach zone set .default 'replicas:
- attrs: []
'

since once a range splits, these inserts are no longer 1PC transactions and things get really bad (on the other hand, background queues could become problematic with a very large replica).
