
rpc: use the loopback conn also for GRPCDialOptions#103764

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
knz:20230523-rpc-fix
May 23, 2023

@knz
Contributor

@knz knz commented May 23, 2023

Fixes #103762.
Fixes #99261.
Fixes #103692.
Epic: CRDB-28893

For context, rpc.GRPCDialOptions is used in two cases:

  • when connecting to other nodes as specified by the --join flag.
  • in the grpc-gateway code, to route incoming HTTP requests to the RPC subsystem.

The first nearly always targets remote nodes. The second always targets the local node (it's a loopback connection).

Prior to this patch, both callers of rpc.GRPCDialOptions were unconditionally served the regular "remote network conn" dial options, including the backoff, the only-once dialer and other parameters suited to connecting to remote nodes.

This choice suits the --join logic, but not the grpc-gateway loopback conn. For that conn we want to bypass all the network-oriented machinery, especially the only-once dialer and the circuit breaker.

This patch ensures that the grpc-gateway conn receives the loopback dial options.
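
The selection can be pictured as a single decision on the dial target. The sketch below assumes the rule is "target equals the node's own advertised address"; the type, constants and function name are hypothetical and only illustrate the idea, not the exact code in the patch.

```go
package main

// transport is a hypothetical label for the two families of dial options.
type transport int

const (
	// tcpTransport: remote dial options (backoff, only-once dialer,
	// circuit breaker), suited to --join targets.
	tcpTransport transport = iota
	// loopbackTransport: in-process conn with none of the network logic,
	// suited to the grpc-gateway.
	loopbackTransport
)

// chooseTransport gives a target equal to the node's own advertised address
// the loopback options and any other target the remote ones. The exact
// comparison used here is an assumption for illustration.
func chooseTransport(target, advertiseAddr string) transport {
	if target == advertiseAddr {
		return loopbackTransport
	}
	return tcpTransport
}
```

Under this rule a --join entry that names the node itself would also get the loopback options, which is harmless in the sketch because that target is local anyway.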

Release note (bug fix): A bug was fixed whereby under high CPU load,
HTTP requests to certain API endpoints (e.g. the health endpoint)
could start failing and then never succeed again until the node was
restarted. This bug had been introduced in v23.1.

@knz knz added the backport-23.1.x PAST MAINTENANCE SUPPORT: 23.1 patch releases via ER request only label May 23, 2023
@knz knz requested a review from tbg May 23, 2023 09:03
@knz knz requested a review from a team as a code owner May 23, 2023 09:03
@blathers-crl

blathers-crl bot commented May 23, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Member

This change is Reviewable

@knz
Contributor Author

knz commented May 23, 2023

NB: this fix doesn't work yet; there's a remaining bug. I'm looking into it.

@knz knz marked this pull request as draft May 23, 2023 09:06
@knz knz force-pushed the 20230523-rpc-fix branch from 8eacf34 to 5da5f8b Compare May 23, 2023 10:10
@knz knz marked this pull request as ready for review May 23, 2023 10:10
@knz knz force-pushed the 20230523-rpc-fix branch from 5da5f8b to 3571f52 Compare May 23, 2023 10:13
@knz
Contributor Author

knz commented May 23, 2023

ok this is ready

@knz knz force-pushed the 20230523-rpc-fix branch from 3571f52 to 877111d Compare May 23, 2023 10:15
@knz
Contributor Author

knz commented May 23, 2023

bors r=tbg

@craig
Contributor

craig bot commented May 23, 2023

Build succeeded:

@craig craig bot merged commit 741c91b into cockroachdb:master May 23, 2023
@knz knz deleted the 20230523-rpc-fix branch May 23, 2023 13:27
knz added a commit to knz/cockroach that referenced this pull request Jun 27, 2023
We have fixed the issue that caused the skip in cockroachdb#103764.

Release note: None
craig bot pushed a commit that referenced this pull request Jun 27, 2023
105629: server: unskip TestStatusEngineStatsJson r=rafiss a=knz

Fixes #99261.

We have fixed the issue that caused the skip in #103764.

Release note: None

Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
blathers-crl bot pushed a commit that referenced this pull request Jun 27, 2023
We have fixed the issue that caused the skip in #103764.

Release note: None