Here's where we saw it:
https://github.com/oxidecomputer/omicron/runs/6133927246?check_suite_focus=true#step:11:1956
error: failed to run custom build command for `nexus-test-utils v0.1.0 (/Users/runner/work/omicron/omicron/nexus/test-utils)`
Caused by:
process didn't exit successfully: `/Users/runner/work/omicron/omicron/target/debug/build/nexus-test-utils-08a85905696fbfd8/build-script-build` (exit status: 101)
--- stdout
cargo:rerun-if-changed=build.rs
cargo:rerun-if-changed=../../common/src/sql/dbinit.sql
cargo:rerun-if-changed=../../tools/cockroachdb_checksums
cargo:rerun-if-changed=../../tools/cockroachdb_version
--- stderr
Apr 22 19:42:29.902 INFO cockroach temporary directory: /tmp/omicron_tmp/.tmpCY01Eg
Apr 22 19:42:29.902 INFO cockroach command line: cockroach start-single-node --insecure --http-addr=:0 --store /Users/runner/work/omicron/omicron/target/debug/build/nexus-test-utils-4b01e832e8f96d40/out/crdb-base --listen-addr 127.0.0.1:0 --listening-url-file /tmp/omicron_tmp/.tmpCY01Eg/listen-url
Apr 22 19:42:42.034 INFO cockroach pid: 7411
Apr 22 19:42:42.034 INFO cockroach listen URL: postgresql://root@127.0.0.1:49430/omicron?sslmode=disable
Apr 22 19:42:42.034 INFO cockroach: populating
thread 'main' panicked at 'failed to populate database: populate
Caused by:
0: populating Omicron database
1: db error: ERROR: polling for queued jobs to complete: poll-show-jobs: remote wall time is too far ahead (1.732092s) to be trustworthy
2: ERROR: polling for queued jobs to complete: poll-show-jobs: remote wall time is too far ahead (1.732092s) to be trustworthy', /Users/runner/work/omicron/omicron/test-utils/src/dev/mod.rs:150:35
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
WARN: dropped CockroachInstance without cleaning it up first (there may still be a child process running and a temporary directory leaked)
WARN: temporary directory leaked: /tmp/omicron_tmp/.tmpCY01Eg
warning: build failed, waiting for other jobs to finish...
error: build failed
Error: Process completed with exit code 101.
This message appears to come from CockroachDB, which has an internal operation called `poll-show-jobs` and a component that emits this error message. What's weird is that this is a single-node CockroachDB cluster, so there's only one system (and one clock) involved. My first thought was that the system clock jumped during the test, but that seems unlikely. I wonder if this is instead a symptom of CPU starvation on the GitHub Actions workers. That is: maybe CockroachDB gets a timestamp on the client, gets one from the server, compares them, and reports this error when they're too far apart. Even on a single system, you could fail that check if you were stuck off-CPU for a while between the two calls that read the timestamps.
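To make the hypothesis concrete, here is a minimal sketch of how a stall alone could trip such a check. This is not CockroachDB's code: `MAX_OFFSET`, `check_remote_wall_time`, and `simulate` are hypothetical names, the 500 ms tolerance is an assumed value for illustration, and the stall is modeled with plain durations instead of real clock reads.

```rust
use std::time::Duration;

/// Assumed tolerance, for illustration only.
const MAX_OFFSET: Duration = Duration::from_millis(500);

/// Hypothetical check: reject a "remote" wall time that is ahead of the
/// "local" reading by more than MAX_OFFSET. On failure, returns how far
/// ahead the remote reading was.
fn check_remote_wall_time(local: Duration, remote: Duration) -> Result<(), Duration> {
    match remote.checked_sub(local) {
        Some(ahead) if ahead > MAX_OFFSET => Err(ahead),
        // Remote is behind local, or ahead by an acceptable amount.
        _ => Ok(()),
    }
}

/// Models one machine with one clock. The "local" timestamp is read at t0,
/// the thread is then kept off-CPU for `stall`, and only afterwards is the
/// "remote" timestamp read from the same clock.
fn simulate(stall: Duration) -> Result<(), Duration> {
    let t0 = Duration::from_secs(1_000); // arbitrary starting wall time
    let local_reading = t0;
    let remote_reading = t0 + stall;
    check_remote_wall_time(local_reading, remote_reading)
}
```

Under these assumptions, `simulate(Duration::from_millis(5))` returns `Ok(())`, while `simulate(Duration::from_micros(1_732_092))` returns `Err` carrying the 1.732092s stall, even though the clock itself never jumped.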