roachtestutil/mixedversion: inconsistent node IDs #113384
Description
The test in #113222, using the mixedversion framework, has inconsistent node IDs. This is confusing to debug, and also breaks assertions that rely on the node attribute matching the node ID (typically used for zone configs).
In the cluster I'm currently looking at, nodes are running with these command line arguments:
erik 26259 13.4 0.6 2471300 454604 pts/0 Sl 09:06 3:28 /home/erik/local/3/v22.2.13/cockroach start --certs-dir /home/erik/local/3/certs --log file-defaults: {dir: '/home/erik/local/3/logs', exit-on-error: false} --listen-addr=:29000 --http-addr=:29001 --join=127.0.0.1:29004 --store path=/home/erik/local/3/data,attrs=store1:node3:node3store1 --cache=6% --locality=cloud=local,region=local,zone=local --max-sql-memory=6% --pid-file /home/erik/local/3/logs/cockroach.pid
erik 26260 11.7 0.6 1971700 415572 pts/0 Sl 09:06 3:02 /home/erik/local/1/v22.2.13/cockroach start --certs-dir /home/erik/local/1/certs --log file-defaults: {dir: '/home/erik/local/1/logs', exit-on-error: false} --listen-addr=:29004 --http-addr=:29005 --join=127.0.0.1:29004 --store path=/home/erik/local/1/data,attrs=store1:node1:node1store1 --cache=6% --locality=cloud=local,region=local,zone=local --max-sql-memory=6% --pid-file /home/erik/local/1/logs/cockroach.pid
erik 26276 11.3 0.6 2592180 448480 pts/0 Sl 09:06 2:55 /home/erik/local/4/v22.2.13/cockroach start --certs-dir /home/erik/local/4/certs --log file-defaults: {dir: '/home/erik/local/4/logs', exit-on-error: false} --listen-addr=:29006 --http-addr=:29007 --join=127.0.0.1:29004 --store path=/home/erik/local/4/data,attrs=store1:node4:node4store1 --cache=6% --locality=cloud=local,region=local,zone=local --max-sql-memory=6% --pid-file /home/erik/local/4/logs/cockroach.pid
erik 27056 10.3 0.7 1954836 488868 pts/0 Sl 09:07 2:38 /home/erik/local/2/v23.1.1/cockroach start --certs-dir /home/erik/local/2/certs --log file-defaults: {dir: '/home/erik/local/2/logs', exit-on-error: false} --listen-addr=:29002 --http-addr=:29003 --join=127.0.0.1:29004 --store path=/home/erik/local/2/data,attrs=store1:node2:node2store1 --cache=25% --locality=cloud=local,region=local,zone=local --max-sql-memory=25% --pid-file /home/erik/local/2/logs/cockroach.pid
From the DB console, I can map port numbers to node IDs:
- n1: port 29004
- n2: port 29000
- n3: port 29002
- n4: port 29006
Cross-referencing with the command-line args, we find two issues:
- Node attributes and ~/local/<nodeid> directories do not correspond with the actual node IDs. For example, the real n3 is located in ~/local/2 and has node attribute node2.
- All nodes have attribute store1, even though the actual store IDs correspond with the real node ID.
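The cross-referencing above can be reproduced mechanically. The following is a minimal sketch in Go that hard-codes the directory, listen port, and node attribute from the process list, and the port-to-node-ID mapping read from the DB console. The `process` type and the `observed` and `mismatches` helpers are hypothetical names introduced for illustration; they are not part of roachtest or the mixedversion framework.

```go
package main

import (
	"fmt"
	"sort"
)

// process describes one cockroach process from the ps output in this issue.
type process struct {
	dir        int    // N in ~/local/N
	listenPort int    // port from --listen-addr
	nodeAttr   string // node attribute from --store attrs
}

// observed returns the data transcribed from this issue: the running
// processes and the port -> node ID mapping from the DB console.
func observed() ([]process, map[int]int) {
	procs := []process{
		{dir: 3, listenPort: 29000, nodeAttr: "node3"},
		{dir: 1, listenPort: 29004, nodeAttr: "node1"},
		{dir: 4, listenPort: 29006, nodeAttr: "node4"},
		{dir: 2, listenPort: 29002, nodeAttr: "node2"},
	}
	portToNodeID := map[int]int{29004: 1, 29000: 2, 29002: 3, 29006: 4}
	return procs, portToNodeID
}

// mismatches reports every process whose directory number differs from
// the node ID the cluster actually assigned to it.
func mismatches(procs []process, portToNodeID map[int]int) []string {
	var out []string
	for _, p := range procs {
		id, ok := portToNodeID[p.listenPort]
		if !ok {
			continue // port not listed in the DB console mapping
		}
		if id != p.dir {
			out = append(out, fmt.Sprintf(
				"n%d runs from ~/local/%d with attribute %s", id, p.dir, p.nodeAttr))
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	procs, ports := observed()
	for _, m := range mismatches(procs, ports) {
		fmt.Println(m)
	}
}
```

With the data from this cluster, the sketch flags n2 and n3 as swapped, while n1 and n4 happen to line up.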
I think this happens because the framework explicitly sets Sequential: false, which means all nodes will join the cluster concurrently and race for node ID allocation. The store1 attribute may be a separate bug.
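To illustrate the suspected race, here is a toy model in Go. It assumes node IDs are handed out in arrival order from a single counter. That is a simplification for illustration only, not CockroachDB's actual allocation code, and `join`, `startSequential`, and `startConcurrent` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// join models a node joining the cluster: it receives the next free ID.
func join(counter *int64) int {
	return int(atomic.AddInt64(counter, 1))
}

// startSequential starts nodes one at a time, so the node at index i
// always receives ID i+1.
func startSequential(n int) []int {
	var counter int64
	ids := make([]int, n)
	for i := range ids {
		ids[i] = join(&counter)
	}
	return ids
}

// startConcurrent starts all nodes at once. Every node still gets a
// unique ID, but which node gets which ID depends on scheduling.
func startConcurrent(n int) []int {
	var counter int64
	ids := make([]int, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			ids[i] = join(&counter)
		}(i)
	}
	wg.Wait()
	return ids
}

func main() {
	fmt.Println("sequential:", startSequential(4))
	fmt.Println("concurrent:", startConcurrent(4))
}
```

In this model the sequential start always yields IDs that match the start order, whereas the concurrent start only guarantees that the IDs form some permutation of 1 through 4, which matches the swapped n2 and n3 seen here.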
Jira issue: CRDB-32918