release-2.0: cli: fix cockroach quit#26163
release-2.0: cli: fix cockroach quit#26163craig[bot] merged 1 commit intocockroachdb:release-2.0from
cockroach quit#26163Conversation
This patch fixes the following: - the logic in `doShutdown()` aims to ignore errors caused by attempts connect to a server which is closing its gRPC channels, but was missing one case of such errors: during the initial check whether the node was running. This patch causes gRPC "closed connection" errors to become also ignored in that case. - previously if there was a transient gRPC error during a hard shutdown whereby the shutdown could still succeed, then `cockroach quit` would fail no matter what. This patch makes `cockroach quit` retry a hard shutdown. - the warning messages are now emitted on stderr (via `log.Warningf`) instead of stdout. Release note (bug fix): fix a bug where `cockroach quit` would erroneously fail even though the node already successfully shut down. Release note (cli change): `cockroach quit` now emits warning message on its standard error stream, not standard output.
|
LGTM. I assume you were able to verify (otherwise I would suggest leaving it to bake on master for a few days before merging this) |
|
Let's let it bake on master before merging then. |
bdarnell
left a comment
There was a problem hiding this comment.
Is the bug present in release-2.0? I thought our theory was that it was related to the GRPC upgrade on master.
| // run a noop DrainRequest. If that fails, we give up. | ||
| if err := checkNodeRunning(ctx, c); err != nil { | ||
| if grpcutil.IsClosedConnection(err) { | ||
| return nil |
There was a problem hiding this comment.
I think if you try to run cockroach quit against a node that's not running, we still want to fail instead of swallowing the error.
There was a problem hiding this comment.
The thing still fails. IsClosedConnection does not return true on ECONNREFUSED.
This PR fixes an orthogonal bug: |
|
We ignore "RPC connection is closed" errors on the other paths because the Drain RPC causes the connection to be closed. It's not an expected error from checkNodeRunning. |
it is expected when |
|
Ah, I missed that this whole routine was called twice. LGTM |
|
This hasn't hurt us on master, so merging this. bors r+ |
Build succeeded |
Backport 1/1 commits from #26158.
This
cockroach quitbug is likely to affect production users, so we need to back-port it./cc @cockroachdb/release