cli: TLS certs generated via connect cause one-way connectivity problem #61624
Description
Here is what I used:
- on machine 192.168.2.10 I ran the following command:
  ./cockroach connect --num-expected-initial-nodes 2 --init-token abc --listen-addr=192.168.2.10 --join=192.168.2.19:26258
- on machine 192.168.2.19 I ran the following:
  ./cockroach connect --num-expected-initial-nodes 2 --init-token abc --join=192.168.2.10:26257 --listen-addr=192.168.2.19:26258 --http-addr=:8081
This made the connect command complete successfully.
Note: beware of mentioning the port number in --join (because of issue #61620) and explicit IP addresses in --listen-addr (because of issues #61619 and #61616)
- Then, as recommended by the connect command, I ran the following, which worked:
  ./cockroach cert create-client root --ca-key=~/.cockroach-certs/ca-client.key
Then I started my CockroachDB nodes:
- on machine 192.168.2.10:
  ./cockroach start --join=192.168.2.19:26258 --listen-addr=192.168.2.10
- on machine 192.168.2.19:
  ./cockroach start --join=192.168.2.10:26257 --listen-addr=192.168.2.19:26258 --http-addr=:8081
- Here I start observing something unexpected / undesirable in the logs (first symptom of the problem):
  - on 192.168.2.10 logs are fine:
    I210308 15:50:28.243256 206 server/init.go:420 ⋮ [n?] 28 ‹192.168.2.19:26258› is itself waiting for init, will retry
    This indicates that this server is able to establish an outgoing RPC conn to the other one.
  - on 192.168.2.19, we see the problem:
    W210308 16:05:46.896090 150 server/init.go:422 ⋮ [n?] 41 outgoing join rpc to ‹192.168.2.10:26257› unsuccessful: ‹rpc error: code = Unauthenticated desc = TLSInfo is not available in request context›
    This indicates that this server is unable to establish its outgoing RPC conn to the other one.
- At this point I suspected that the init RPC might be special and use a different TLS configuration than an already-initialized server. So I ran the following, which worked without errors:
  - on 192.168.2.10:
    ./cockroach start --join=192.168.2.19:26258 --listen-addr=192.168.2.10 --insecure
  - on 192.168.2.19:
    ./cockroach start --join=192.168.2.10:26257 --listen-addr=192.168.2.19:26258 --http-addr=:8081 --insecure
  - anywhere:
    ./cockroach init --host=... --port=... --insecure
  This initializes the cluster and assigns node IDs without connectivity errors.
- Then I restart the already-initialized servers, using the same commands as previously. I then see:
  - on 192.168.2.10 I see the following errors:
    E210308 16:09:11.689777 446 kv/kvserver/consistency_queue.go:191 ⋮ [n1,consistencyChecker,s1,r22/1:‹/Table/2{6-7}›] 123 computing own checksum: could not dial node ID 1: unable to dial n1: ‹breaker open›
    E210308 16:09:11.689856 446 kv/kvserver/queue.go:1093 ⋮ [n1,consistencyChecker,s1,r22/1:‹/Table/2{6-7}›] 124 computing own checksum: could not dial node ID 1: unable to dial n1: ‹breaker open›
    E210308 16:09:13.688958 1331 kv/kvserver/consistency_queue.go:191 ⋮ [n1,consistencyChecker,s1,r7/1:‹/Table/1{1-2}›] 125 computing own checksum: could not dial node ID 1: failed to connect to n1 at ‹192.168.2.10:26257›: ‹initial connection heartbeat failed›: ‹rpc error: code = Unauthenticated desc = TLSInfo is not available in request context›
  - on 192.168.2.19 I see the following errors:
    W210308 16:08:49.464792 132 vendor/google.golang.org/grpc/internal/channelz/logging.go:73 ⋮ [-] 55 ‹grpc: addrConn.createTransport failed to connect to {192.168.2.10:26257 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing cannot reuse client connection". Reconnecting...›
    I210308 16:08:51.678239 257 1@vendor/github.com/cockroachdb/circuitbreaker/circuitbreaker.go:322 ⋮ [n2] 56 circuitbreaker: ‹rpc 192.168.2.19:26258 [n1]› tripped: failed to connect to n1 at ‹192.168.2.10:26257›: ‹initial connection heartbeat failed›: ‹rpc error: code = Unauthenticated desc = TLSInfo is not available in request context›
    I210308 16:08:51.678323 257 1@vendor/github.com/cockroachdb/circuitbreaker/circuitbreaker.go:447 ⋮ [n2] 57 circuitbreaker: ‹rpc 192.168.2.19:26258 [n1]› event: ‹BreakerTripped›
    I210308 16:08:51.678351 257 2@rpc/nodedialer/nodedialer.go:160 ⋮ [ct-client] 58 unable to connect to n1: failed to connect to n1 at ‹192.168.2.10:26257›: ‹initial connection heartbeat failed›: ‹rpc error: code = Unauthenticated desc = TLSInfo is not available in request context›
    I210308 16:08:51.766654 76 1@vendor/github.com/cockroachdb/circuitbreaker/circuitbreaker.go:322 ⋮ [n2] 59 circuitbreaker: ‹rpc 192.168.2.19:26258 [n1]› tripped: failed to connect to n1 at ‹192.168.2.10:26257›: ‹initial connection heartbeat failed›: ‹rpc error: code = Unauthenticated desc = TLSInfo is not available in request context›
    I210308 16:08:51.766737 76 1@vendor/github.com/cockroachdb/circuitbreaker/circuitbreaker.go:447 ⋮ [n2] 60 circuitbreaker: ‹rpc 192.168.2.19:26258 [n1]› event: ‹BreakerTripped›
    I210308 16:08:51.766759 76 2@rpc/nodedialer/nodedialer.go:160 ⋮ [n2,ts-poll,range-lookup=‹/Meta2/System/tsd/cr.node.build.timestamp/2/10s/2021-03-08T16:00:00Z›] 61 unable to connect to n1: failed to connect to n1 at ‹192.168.2.10:26257›: ‹initial connection heartbeat failed›: ‹rpc error: code = Unauthenticated desc = TLSInfo is not available in request context›
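To make the one-way pattern in these logs easier to see, here is a rough sketch of a shell helper that tallies which address each "failed to connect to nX at ..." entry was dialing. The function name `tally_failed_targets` is made up for illustration and is not part of the cockroach CLI. It assumes unwrapped log lines on stdin and IPv4 `host:port` targets like the ones in this report.

```shell
# Hypothetical helper (not part of cockroach): tally the dial targets named in
# "failed to connect to nX at <addr>" log entries read from stdin.
# Assumes IPv4 host:port targets, optionally wrapped in redaction markers.
tally_failed_targets() {
  # Keep only the matching fragment of each line, up to the end of host:port.
  grep -oE 'failed to connect to n[0-9]+ at [^0-9]*[0-9.]+:[0-9]+' \
    | sed -E 's/.* at [^0-9]*//' \
    | sort \
    | uniq -c \
    | awk '{print $1, $2}'   # normalize to "count address"
}
```

It could be run against each node's log file, for example `tally_failed_targets < cockroach.log` (the file name here is a placeholder). In the excerpts above, every such entry, on both nodes, names 192.168.2.10:26257 as the target.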
Related to #60632