Skip to content

slow test suite and schema operations on Mac #53

@davepacheco

Description

@davepacheco

On my 2020 Macbook Pro running Catalina (10.15.7), using the official CockroachDB v20.2.5 build, CockroachDB schema updates are incredibly slow. I've started omicron_dev db-run in the background (which itself took tens of seconds to populate the database with an empty schema). Then, running either db-wipe or db-populate takes many seconds each (except wiping an empty database):

$ time cargo run --bin=omicron_dev db-wipe --database-url=postgresql://root@127.0.0.1:64502?sslmode=disable
    Finished dev [unoptimized + debuginfo] target(s) in 0.12s
     Running `target/debug/omicron_dev db-wipe '--database-url=postgresql://root@127.0.0.1:64502?sslmode=disable'`
omicron_dev: wiping any existing database
omicron_dev: wiped

real	0m7.205s
user	0m0.199s
sys	0m0.067s
$ time cargo run --bin=omicron_dev db-wipe --database-url=postgresql://root@127.0.0.1:64502?sslmode=disable
    Finished dev [unoptimized + debuginfo] target(s) in 0.12s
     Running `target/debug/omicron_dev db-wipe '--database-url=postgresql://root@127.0.0.1:64502?sslmode=disable'`
omicron_dev: wiping any existing database
omicron_dev: wiped

real	0m0.361s
user	0m0.203s
sys	0m0.067s
$ time cargo run --bin=omicron_dev db-wipe --database-url=postgresql://root@127.0.0.1:64502?sslmode=disable
    Finished dev [unoptimized + debuginfo] target(s) in 0.12s
     Running `target/debug/omicron_dev db-wipe '--database-url=postgresql://root@127.0.0.1:64502?sslmode=disable'`
omicron_dev: wiping any existing database
omicron_dev: wiped

real	0m0.300s
user	0m0.201s
sys	0m0.066s
$ time cargo run --bin=omicron_dev db-populate --database-url=postgresql://root@127.0.0.1:64502?sslmode=disable
    Finished dev [unoptimized + debuginfo] target(s) in 0.13s
     Running `target/debug/omicron_dev db-populate '--database-url=postgresql://root@127.0.0.1:64502?sslmode=disable'`
omicron_dev: populating database
omicron_dev: populated database

real	0m19.214s
user	0m0.201s
sys	0m0.068s
$ time cargo run --bin=omicron_dev db-wipe --database-url=postgresql://root@127.0.0.1:64502?sslmode=disable
    Finished dev [unoptimized + debuginfo] target(s) in 0.23s
     Running `target/debug/omicron_dev db-wipe '--database-url=postgresql://root@127.0.0.1:64502?sslmode=disable'`
omicron_dev: wiping any existing database
omicron_dev: wiped

real	0m6.726s
user	0m0.207s
sys	0m0.085s
$ time cargo run --bin=omicron_dev db-populate --database-url=postgresql://root@127.0.0.1:64502?sslmode=disable
    Finished dev [unoptimized + debuginfo] target(s) in 0.13s
     Running `target/debug/omicron_dev db-populate '--database-url=postgresql://root@127.0.0.1:64502?sslmode=disable'`
omicron_dev: populating database
omicron_dev: populated database

real	0m18.989s
user	0m0.196s
sys	0m0.068s
$

cargo test does a bunch of similar operations, so it winds up taking 2m40s altogether.

By contrast, on illumos (omnios-r151034-0d278a0cc5) running in AWS (t2.2xlarge, 8 cores, 32 GiB DRAM, one 200 GiB EBS volume), this is almost 20x faster:

$ time cargo run --bin=omicron_dev -- db-wipe --database-url=postgresql://root@127.0.0.1:54148?sslmode=disable
    Finished dev [unoptimized + debuginfo] target(s) in 0.15s
     Running `target/debug/omicron_dev db-wipe '--database-url=postgresql://root@127.0.0.1:54148?sslmode=disable'`
omicron_dev: wiping any existing database
omicron_dev: wiped

real    0m0.305s
user    0m0.118s
sys     0m0.080s
$ time cargo run --bin=omicron_dev -- db-wipe --database-url=postgresql://root@127.0.0.1:54148?sslmode=disable
    Finished dev [unoptimized + debuginfo] target(s) in 0.15s
     Running `target/debug/omicron_dev db-wipe '--database-url=postgresql://root@127.0.0.1:54148?sslmode=disable'`
omicron_dev: wiping any existing database
omicron_dev: wiped

real    0m0.198s
user    0m0.116s
sys     0m0.077s
$ time cargo run --bin=omicron_dev -- db-populate --database-url=postgresql://root@127.0.0.1:54148?sslmode=disable
    Finished dev [unoptimized + debuginfo] target(s) in 0.15s
     Running `target/debug/omicron_dev db-populate '--database-url=postgresql://root@127.0.0.1:54148?sslmode=disable'`
omicron_dev: populating database
omicron_dev: populated database

real    0m0.585s
user    0m0.117s
sys     0m0.078s
$ time cargo run --bin=omicron_dev -- db-wipe --database-url=postgresql://root@127.0.0.1:54148?sslmode=disable
    Finished dev [unoptimized + debuginfo] target(s) in 0.15s
     Running `target/debug/omicron_dev db-wipe '--database-url=postgresql://root@127.0.0.1:54148?sslmode=disable'`
omicron_dev: wiping any existing database
omicron_dev: wiped

real    0m0.301s
user    0m0.117s
sys     0m0.079s
$ time cargo run --bin=omicron_dev -- db-populate --database-url=postgresql://root@127.0.0.1:54148?sslmode=disable
    Finished dev [unoptimized + debuginfo] target(s) in 0.15s
     Running `target/debug/omicron_dev db-populate '--database-url=postgresql://root@127.0.0.1:54148?sslmode=disable'`
omicron_dev: populating database
omicron_dev: populated database

real    0m0.592s
user    0m0.119s
sys     0m0.080s

and the test suite takes only 8s.

Similarly, on illumos (helios-1.0.20336) running under VMware Fusion 11.5.7 on the same Mac, these operations are fast:

$ time cargo run --bin=omicron_dev -- db-wipe --database-url=postgresql://root@127.0.0.1:51212?sslmode=disable
    Finished dev [unoptimized + debuginfo] target(s) in 0.28s
     Running `target/debug/omicron_dev db-wipe '--database-url=postgresql://root@127.0.0.1:51212?sslmode=disable'`
omicron_dev: wiping any existing database
omicron_dev: wiped

real	0m0.739s
user	0m0.208s
sys	0m0.384s
$ time cargo run --bin=omicron_dev -- db-populate --database-url=postgresql://root@127.0.0.1:51212?sslmode=disable
    Finished dev [unoptimized + debuginfo] target(s) in 0.28s
     Running `target/debug/omicron_dev db-populate '--database-url=postgresql://root@127.0.0.1:51212?sslmode=disable'`
omicron_dev: populating database
omicron_dev: populated database

real	0m1.011s
user	0m0.200s
sys	0m0.374s
$ time cargo run --bin=omicron_dev -- db-wipe --database-url=postgresql://root@127.0.0.1:51212?sslmode=disable
    Finished dev [unoptimized + debuginfo] target(s) in 0.27s
     Running `target/debug/omicron_dev db-wipe '--database-url=postgresql://root@127.0.0.1:51212?sslmode=disable'`
omicron_dev: wiping any existing database
omicron_dev: wiped

real	0m0.663s
user	0m0.193s
sys	0m0.368s
$ time cargo run --bin=omicron_dev -- db-populate --database-url=postgresql://root@127.0.0.1:51212?sslmode=disable
    Finished dev [unoptimized + debuginfo] target(s) in 0.27s
     Running `target/debug/omicron_dev db-populate '--database-url=postgresql://root@127.0.0.1:51212?sslmode=disable'`
omicron_dev: populating database
omicron_dev: populated database

real	0m1.020s
user	0m0.194s
sys	0m0.368s
$ time cargo run --bin=omicron_dev -- db-wipe --database-url=postgresql://root@127.0.0.1:51212?sslmode=disable
    Finished dev [unoptimized + debuginfo] target(s) in 0.27s
     Running `target/debug/omicron_dev db-wipe '--database-url=postgresql://root@127.0.0.1:51212?sslmode=disable'`
omicron_dev: wiping any existing database
omicron_dev: wiped

real	0m0.672s
user	0m0.197s
sys	0m0.367s

and cargo test takes about 17s on this system.

Summarizing:

Hardware Virtualization OS, FS CockroachDB Time to populate Test suite time
2020 Macbook (8 HW threads, 32 GiB DRAM, local NVME SSD) None MacOS Catalina 10.15.7, APFS v20.2.5 (official build) 18s 2m40s
2020 Macbook (same as above) VMWare Fusion 11.5.7 (VM has 4 HW threads, 8 GiB memory) illumos (helios-1.0.20336) v20.2.5 (illumos OSS build (unofficial)) 1s 17s
AWS EC2 t2.2xlarge (8 HW threads, 32 GiB DRAM, 200 GiB EBS (network) volume) AWS EC2 illumos (omnios-r151034-0d278a0cc5) v20.2.5 (illumos OSS build (unofficial)) <1s 8s

This is all on commit a0ce0e4.

This looks like a CockroachDB issue at this point. I can reproduce the slowness using this statement from the schema file:

CREATE UNIQUE INDEX ON omicron.public.Project (
    name
) WHERE
    time_deleted IS NOT NULL;

This takes several hundred milliseconds on an empty table.

Here are some avenues to explore:

  • DTrace sync write latency (fsync and sync write syscalls) and I/O latency on the Mac
  • Dig deeper into what CockroachDB is doing during this time. Maybe using statemaps?
  • Dig into the details that CockroachDB can provide for slow queries. I did collect a stmt-bundle-641813260336332801.zip that includes a bunch of information, including a Jaeger-format tracing file. The comment at the top of this file shows how to get up and running with Jaeger quickly using Docker.

All that said: I'm not planning to dig into this right now. I just wanted to record this in case we decide to dig deeper in the future.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions