cli,server: disable latency jump detection with start-single-node and/or when using Docker on macOS #98066
Description
Describe the problem
When running inside a Docker container on macOS, the TCP stack has very irregular latencies. This causes spurious log messages for folks who are exploring or testing CockroachDB with cockroach start-single-node.
For example:
2023-03-06 10:56:36 W230306 15:56:34.770789 623 2@rpc/clock_offset.go:226 ⋮ [n1,rnode=1,raddr=‹b17287e17171:26257›,class=default,heartbeat] 40 latency jump (prev avg 64.54ms, current 108.40ms)
2023-03-06 10:56:45 W230306 15:56:44.095660 623 2@rpc/clock_offset.go:226 ⋮ [n1,rnode=1,raddr=‹b17287e17171:26257›,class=default,heartbeat] 41 latency jump (prev avg 70.86ms, current 131.61ms)
2023-03-06 10:57:01 W230306 15:56:59.634686 623 2@rpc/clock_offset.go:226 ⋮ [n1,rnode=1,raddr=‹b17287e17171:26257›,class=default,heartbeat] 42 latency jump (prev avg 72.80ms, current 242.88ms)
To Reproduce
Run cockroach start-single-node on macOS using the Docker image.
Expected behavior
We should avoid the spurious warnings in that case.
Note that this logging mechanism is otherwise desirable in production clusters. A latency jump from, say, 70ms to 242ms between nodes, or on a node connecting to itself via the loopback interface, is a MAJOR operational event and should be reported.
The anomalous behavior here is caused by macOS, so any improvement for that case must not come at the expense of proper network observability for everyone else.
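One possible shape for such a policy, sketched under assumptions: keep latency jump warnings on by default, turn them off by default only under start-single-node, and let an explicit operator setting override either default. The function latencyJumpWarningsEnabled and the environment variable EXAMPLE_LATENCY_JUMP_WARNINGS are hypothetical names, not existing CockroachDB flags or settings. Gating on single-node mode (rather than on the host OS) is deliberate here: inside a Linux container running under Docker on macOS, Go's runtime.GOOS reports "linux", so the process cannot easily tell from that alone that the host is macOS.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// latencyJumpWarningsEnabled is a hypothetical policy (illustration only).
// singleNode reports whether the node was started with start-single-node;
// override is an optional operator setting such as "true" or "false".
func latencyJumpWarningsEnabled(singleNode bool, override string) bool {
	// An explicit, parseable operator setting always wins.
	if v, err := strconv.ParseBool(override); err == nil {
		return v
	}
	// Default: warn in multi-node clusters, stay quiet for single-node demos.
	return !singleNode
}

func main() {
	// Hypothetical environment variable, for illustration only.
	override := os.Getenv("EXAMPLE_LATENCY_JUMP_WARNINGS")
	fmt.Println("multi-node cluster:", latencyJumpWarningsEnabled(false, override))
	fmt.Println("start-single-node: ", latencyJumpWarningsEnabled(true, override))
}
```

Because production clusters are never started with start-single-node, this default leaves their network observability unchanged, while the override still lets someone testing on a single node opt back in.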
Jira issue: CRDB-25060