On my 2020 Macbook Pro running Catalina (10.15.7), using the official CockroachDB v20.2.5 build, CockroachDB schema updates are incredibly slow. I've started omicron_dev db-run in the background (which itself took tens of seconds to populate the database with an empty schema). Then, running either db-wipe or db-populate takes many seconds each (except wiping an empty database):
$ time cargo run --bin=omicron_dev db-wipe --database-url=postgresql://root@127.0.0.1:64502?sslmode=disable
Finished dev [unoptimized + debuginfo] target(s) in 0.12s
Running `target/debug/omicron_dev db-wipe '--database-url=postgresql://root@127.0.0.1:64502?sslmode=disable'`
omicron_dev: wiping any existing database
omicron_dev: wiped
real 0m7.205s
user 0m0.199s
sys 0m0.067s
$ time cargo run --bin=omicron_dev db-wipe --database-url=postgresql://root@127.0.0.1:64502?sslmode=disable
Finished dev [unoptimized + debuginfo] target(s) in 0.12s
Running `target/debug/omicron_dev db-wipe '--database-url=postgresql://root@127.0.0.1:64502?sslmode=disable'`
omicron_dev: wiping any existing database
omicron_dev: wiped
real 0m0.361s
user 0m0.203s
sys 0m0.067s
$ time cargo run --bin=omicron_dev db-wipe --database-url=postgresql://root@127.0.0.1:64502?sslmode=disable
Finished dev [unoptimized + debuginfo] target(s) in 0.12s
Running `target/debug/omicron_dev db-wipe '--database-url=postgresql://root@127.0.0.1:64502?sslmode=disable'`
omicron_dev: wiping any existing database
omicron_dev: wiped
real 0m0.300s
user 0m0.201s
sys 0m0.066s
$ time cargo run --bin=omicron_dev db-populate --database-url=postgresql://root@127.0.0.1:64502?sslmode=disable
Finished dev [unoptimized + debuginfo] target(s) in 0.13s
Running `target/debug/omicron_dev db-populate '--database-url=postgresql://root@127.0.0.1:64502?sslmode=disable'`
omicron_dev: populating database
omicron_dev: populated database
real 0m19.214s
user 0m0.201s
sys 0m0.068s
$ time cargo run --bin=omicron_dev db-wipe --database-url=postgresql://root@127.0.0.1:64502?sslmode=disable
Finished dev [unoptimized + debuginfo] target(s) in 0.23s
Running `target/debug/omicron_dev db-wipe '--database-url=postgresql://root@127.0.0.1:64502?sslmode=disable'`
omicron_dev: wiping any existing database
omicron_dev: wiped
real 0m6.726s
user 0m0.207s
sys 0m0.085s
$ time cargo run --bin=omicron_dev db-populate --database-url=postgresql://root@127.0.0.1:64502?sslmode=disable
Finished dev [unoptimized + debuginfo] target(s) in 0.13s
Running `target/debug/omicron_dev db-populate '--database-url=postgresql://root@127.0.0.1:64502?sslmode=disable'`
omicron_dev: populating database
omicron_dev: populated database
real 0m18.989s
user 0m0.196s
sys 0m0.068s
$
cargo test does a bunch of similar operations, so it winds up taking 2m40s altogether.
By contrast, on illumos (omnios-r151034-0d278a0cc5) running in AWS (t2.2xlarge, 8 cores, 32 GiB DRAM, one 200 GiB EBS volume), this is almost 20x faster:
$ time cargo run --bin=omicron_dev -- db-wipe --database-url=postgresql://root@127.0.0.1:54148?sslmode=disable
Finished dev [unoptimized + debuginfo] target(s) in 0.15s
Running `target/debug/omicron_dev db-wipe '--database-url=postgresql://root@127.0.0.1:54148?sslmode=disable'`
omicron_dev: wiping any existing database
omicron_dev: wiped
real 0m0.305s
user 0m0.118s
sys 0m0.080s
$ time cargo run --bin=omicron_dev -- db-wipe --database-url=postgresql://root@127.0.0.1:54148?sslmode=disable
Finished dev [unoptimized + debuginfo] target(s) in 0.15s
Running `target/debug/omicron_dev db-wipe '--database-url=postgresql://root@127.0.0.1:54148?sslmode=disable'`
omicron_dev: wiping any existing database
omicron_dev: wiped
real 0m0.198s
user 0m0.116s
sys 0m0.077s
$ time cargo run --bin=omicron_dev -- db-populate --database-url=postgresql://root@127.0.0.1:54148?sslmode=disable
Finished dev [unoptimized + debuginfo] target(s) in 0.15s
Running `target/debug/omicron_dev db-populate '--database-url=postgresql://root@127.0.0.1:54148?sslmode=disable'`
omicron_dev: populating database
omicron_dev: populated database
real 0m0.585s
user 0m0.117s
sys 0m0.078s
$ time cargo run --bin=omicron_dev -- db-wipe --database-url=postgresql://root@127.0.0.1:54148?sslmode=disable
Finished dev [unoptimized + debuginfo] target(s) in 0.15s
Running `target/debug/omicron_dev db-wipe '--database-url=postgresql://root@127.0.0.1:54148?sslmode=disable'`
omicron_dev: wiping any existing database
omicron_dev: wiped
real 0m0.301s
user 0m0.117s
sys 0m0.079s
$ time cargo run --bin=omicron_dev -- db-populate --database-url=postgresql://root@127.0.0.1:54148?sslmode=disable
Finished dev [unoptimized + debuginfo] target(s) in 0.15s
Running `target/debug/omicron_dev db-populate '--database-url=postgresql://root@127.0.0.1:54148?sslmode=disable'`
omicron_dev: populating database
omicron_dev: populated database
real 0m0.592s
user 0m0.119s
sys 0m0.080s
and the test suite takes only 8s.
Similarly, on illumos (helios-1.0.20336) running under VMware Fusion 11.5.7 on the same Mac, these operations are fast:
$ time cargo run --bin=omicron_dev -- db-wipe --database-url=postgresql://root@127.0.0.1:51212?sslmode=disable
Finished dev [unoptimized + debuginfo] target(s) in 0.28s
Running `target/debug/omicron_dev db-wipe '--database-url=postgresql://root@127.0.0.1:51212?sslmode=disable'`
omicron_dev: wiping any existing database
omicron_dev: wiped
real 0m0.739s
user 0m0.208s
sys 0m0.384s
$ time cargo run --bin=omicron_dev -- db-populate --database-url=postgresql://root@127.0.0.1:51212?sslmode=disable
Finished dev [unoptimized + debuginfo] target(s) in 0.28s
Running `target/debug/omicron_dev db-populate '--database-url=postgresql://root@127.0.0.1:51212?sslmode=disable'`
omicron_dev: populating database
omicron_dev: populated database
real 0m1.011s
user 0m0.200s
sys 0m0.374s
$ time cargo run --bin=omicron_dev -- db-wipe --database-url=postgresql://root@127.0.0.1:51212?sslmode=disable
Finished dev [unoptimized + debuginfo] target(s) in 0.27s
Running `target/debug/omicron_dev db-wipe '--database-url=postgresql://root@127.0.0.1:51212?sslmode=disable'`
omicron_dev: wiping any existing database
omicron_dev: wiped
real 0m0.663s
user 0m0.193s
sys 0m0.368s
$ time cargo run --bin=omicron_dev -- db-populate --database-url=postgresql://root@127.0.0.1:51212?sslmode=disable
Finished dev [unoptimized + debuginfo] target(s) in 0.27s
Running `target/debug/omicron_dev db-populate '--database-url=postgresql://root@127.0.0.1:51212?sslmode=disable'`
omicron_dev: populating database
omicron_dev: populated database
real 0m1.020s
user 0m0.194s
sys 0m0.368s
$ time cargo run --bin=omicron_dev -- db-wipe --database-url=postgresql://root@127.0.0.1:51212?sslmode=disable
Finished dev [unoptimized + debuginfo] target(s) in 0.27s
Running `target/debug/omicron_dev db-wipe '--database-url=postgresql://root@127.0.0.1:51212?sslmode=disable'`
omicron_dev: wiping any existing database
omicron_dev: wiped
real 0m0.672s
user 0m0.197s
sys 0m0.367s
and cargo test takes about 17s on this system.
Summarizing:
| Hardware |
Virtualization |
OS, FS |
CockroachDB |
Time to populate |
Test suite time |
| 2020 Macbook (8 HW threads, 32 GiB DRAM, local NVME SSD) |
None |
MacOS Catalina 10.15.7, APFS |
v20.2.5 (official build) |
18s |
2m40s |
| 2020 Macbook (same as above) |
VMWare Fusion 11.5.7 (VM has 4 HW threads, 8 GiB memory) |
illumos (helios-1.0.20336) |
v20.2.5 (illumos OSS build (unofficial)) |
1s |
17s |
| AWS EC2 t2.2xlarge (8 HW threads, 32 GiB DRAM, 200 GiB EBS (network) volume) |
AWS EC2 |
illumos (omnios-r151034-0d278a0cc5) |
v20.2.5 (illumos OSS build (unofficial)) |
<1s |
8s |
This is all on commit a0ce0e4.
This looks like a CockroachDB issue at this point. I can reproduce the slowness using this statement from the schema file:
CREATE UNIQUE INDEX ON omicron.public.Project (
name
) WHERE
time_deleted IS NOT NULL;
This takes several hundred milliseconds on an empty table.
Here are some avenues to explore:
- DTrace sync write latency (fsync and sync write syscalls) and I/O latency on the Mac
- Dig deeper into what CockroachDB is doing during this time. Maybe using statemaps?
- Dig into the details that CockroachDB can provide for slow queries. I did collect a stmt-bundle-641813260336332801.zip that includes a bunch of information, including a Jaeger-format tracing file. The comment at the top of this file shows how to get up and running with Jaeger quickly using Docker.
All that said: I'm not planning to dig into this right now. I just wanted to record this in case we decide to dig deeper in the future.
On my 2020 Macbook Pro running Catalina (10.15.7), using the official CockroachDB v20.2.5 build, CockroachDB schema updates are incredibly slow. I've started
omicron_dev db-runin the background (which itself took tens of seconds to populate the database with an empty schema). Then, running eitherdb-wipeordb-populatetakes many seconds each (except wiping an empty database):cargo testdoes a bunch of similar operations, so it winds up taking 2m40s altogether.By contrast, on illumos (omnios-r151034-0d278a0cc5) running in AWS (t2.2xlarge, 8 cores, 32 GiB DRAM, one 200 GiB EBS volume), this is almost 20x faster:
and the test suite takes only 8s.
Similarly, on illumos (helios-1.0.20336) running under VMware Fusion 11.5.7 on the same Mac, these operations are fast:
and
cargo testtakes about 17s on this system.Summarizing:
This is all on commit a0ce0e4.
This looks like a CockroachDB issue at this point. I can reproduce the slowness using this statement from the schema file:
This takes several hundred milliseconds on an empty table.
Here are some avenues to explore:
All that said: I'm not planning to dig into this right now. I just wanted to record this in case we decide to dig deeper in the future.