
cli: TLS certs generated via connect cause one-way connectivity problem #61624

@knz


Here is what I used:

  1. on machine 192.168.2.10 I ran the following command:

    ./cockroach connect --num-expected-initial-nodes 2 --init-token abc --listen-addr=192.168.2.10 --join=192.168.2.19:26258

  2. on machine 192.168.2.19 I ran the following:

    ./cockroach connect --num-expected-initial-nodes 2 --init-token abc --join=192.168.2.10:26257 --listen-addr=192.168.2.19:26258 --http-addr=:8081

This made the connect command complete successfully.

Note: be careful when specifying the port number in --join (because of issue #61620) and explicit IP addresses in --listen-addr (because of issues #61619 and #61616).

  3. Then, as recommended by the connect command, I ran the following, which worked:

    ./cockroach cert create-client root --ca-key=~/.cockroach-certs/ca-client.key

Then I started my CockroachDB nodes:

  1. on machine 192.168.2.10:

    ./cockroach start --join=192.168.2.19:26258 --listen-addr=192.168.2.10

  2. on machine 192.168.2.19:

    ./cockroach start --join=192.168.2.10:26257 --listen-addr=192.168.2.19:26258 --http-addr=:8081

  3. Here I started observing something unexpected/undesirable in the logs (the first symptom of the problem):

    • on 192.168.2.10, the logs are fine:

      I210308 15:50:28.243256 206 server/init.go:420 ⋮ [n?] 28 ‹192.168.2.19:26258› is itself waiting for init, will retry

      This indicates that this server is able to establish an outgoing RPC conn to the other one.

    • on 192.168.2.19, we see the problem:

      W210308 16:05:46.896090 150 server/init.go:422 ⋮ [n?] 41 outgoing join rpc to ‹192.168.2.10:26257› unsuccessful: ‹rpc error: code = Unauthenticated desc = TLSInfo is not available in request context›

      This indicates that this server is unable to establish its outgoing RPC conn to the other one.

  4. At this point I suspected that the init RPC might be special and use a different TLS configuration than an already-initialized server. So I ran the following, which worked without errors:

    • on 192.168.2.10: ./cockroach start --join=192.168.2.19:26258 --listen-addr=192.168.2.10 --insecure
    • on 192.168.2.19: ./cockroach start --join=192.168.2.10:26257 --listen-addr=192.168.2.19:26258 --http-addr=:8081 --insecure
    • anywhere: ./cockroach init --host=... --port=... --insecure

    This initialized the cluster and assigned node IDs without any connectivity errors.

  5. Then I restarted the already-initialized servers, using the same commands as before. This is what I see:

    • on 192.168.2.10, I see the following errors:

      E210308 16:09:11.689777 446 kv/kvserver/consistency_queue.go:191 ⋮ [n1,consistencyChecker,s1,r22/1:‹/Table/2{6-7}›] 123  computing own checksum: could not dial node ID 1: unable to dial n1: ‹breaker open›
      E210308 16:09:11.689856 446 kv/kvserver/queue.go:1093 ⋮ [n1,consistencyChecker,s1,r22/1:‹/Table/2{6-7}›] 124  computing own checksum: could not dial node ID 1: unable to dial n1: ‹breaker open›
      E210308 16:09:13.688958 1331 kv/kvserver/consistency_queue.go:191 ⋮ [n1,consistencyChecker,s1,r7/1:‹/Table/1{1-2}›] 125  computing own checksum: could not dial node ID 1: failed to connect to n1 at ‹192.168.2.10:26257›: ‹initial connection heartbeat failed›: ‹rpc error: code = Unauthenticated desc = TLSInfo is not available in request context›
    • on 192.168.2.19, I see the following errors:

      W210308 16:08:49.464792 132 vendor/google.golang.org/grpc/internal/channelz/logging.go:73 ⋮ [-] 55  ‹grpc: addrConn.createTransport failed to connect to {192.168.2.10:26257  <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing cannot reuse client connection". Reconnecting...›
      I210308 16:08:51.678239 257 1@vendor/github.com/cockroachdb/circuitbreaker/circuitbreaker.go:322 ⋮ [n2] 56  circuitbreaker: ‹rpc 192.168.2.19:26258 [n1]› tripped: failed to connect to n1 at ‹192.168.2.10:26257›: ‹initial connection heartbeat failed›: ‹rpc error: code = Unauthenticated desc = TLSInfo is not available in request context›
      I210308 16:08:51.678323 257 1@vendor/github.com/cockroachdb/circuitbreaker/circuitbreaker.go:447 ⋮ [n2] 57  circuitbreaker: ‹rpc 192.168.2.19:26258 [n1]› event: ‹BreakerTripped›
      I210308 16:08:51.678351 257 2@rpc/nodedialer/nodedialer.go:160 ⋮ [ct-client] 58  unable to connect to n1: failed to connect to n1 at ‹192.168.2.10:26257›: ‹initial connection heartbeat failed›: ‹rpc error: code = Unauthenticated desc = TLSInfo is not available in request context›
      I210308 16:08:51.766654 76 1@vendor/github.com/cockroachdb/circuitbreaker/circuitbreaker.go:322 ⋮ [n2] 59  circuitbreaker: ‹rpc 192.168.2.19:26258 [n1]› tripped: failed to connect to n1 at ‹192.168.2.10:26257›: ‹initial connection heartbeat failed›: ‹rpc error: code = Unauthenticated desc = TLSInfo is not available in request context›
      I210308 16:08:51.766737 76 1@vendor/github.com/cockroachdb/circuitbreaker/circuitbreaker.go:447 ⋮ [n2] 60  circuitbreaker: ‹rpc 192.168.2.19:26258 [n1]› event: ‹BreakerTripped›
      I210308 16:08:51.766759 76 2@rpc/nodedialer/nodedialer.go:160 ⋮ [n2,ts-poll,range-lookup=‹/Meta2/System/tsd/cr.node.build.timestamp/2/10s/2021-03-08T16:00:00Z›] 61  unable to connect to n1: failed to connect to n1 at ‹192.168.2.10:26257›: ‹initial connection heartbeat failed›: ‹rpc error: code = Unauthenticated desc = TLSInfo is not available in request context›

Related to #60632

cc @aaron-crl @itsbilal

Labels: A-cli-admin, A-security, C-bug