Close and retry a RemoteSigner on err#2876
Close and retry a RemoteSigner on err#2876zmanian wants to merge 2 commits intotendermint:developfrom
Conversation
|
|
||
| ### IMPROVEMENTS: | ||
|
|
||
| Retry RemoteSigner connections on Error |
There was a problem hiding this comment.
| Retry RemoteSigner connections on Error | |
| - [privval] retry RemoteSigner connections on Error |
| "err", err, | ||
| ) | ||
| sc.conn.Close() | ||
| sc.RemoteSignerClient = NewRemoteSignerClient(sc.conn) |
There was a problem hiding this comment.
If we're closing the sc.conn, why do we create a new remote signer client with it? Shouldn't we call OnStop(), then OnStart() to wait for conn and recreate all necessary structures and goroutines?
Shouldn't we assert the err's type? We may want to shut everything down in case of ErrUnexpectedResponse, no?
There was a problem hiding this comment.
If we're closing the sc.conn, why do we create a new remote signer client with it? Shouldn't we call OnStop(), then OnStart() to wait for conn and recreate all necessary structures and goroutines?
I think that makes sense! Do we need to limit the recursion depth here though (as we are already in OnStart)?
Shouldn't we assert the err's type? We may want to shut everything down in case of ErrUnexpectedResponse, no?
Probably yes. ErrUnexpectedResponse means the other side responded but with another message than expected (not PingResponse).
There was a problem hiding this comment.
BTW calling OnStart won't be enough as this might time out (again) if the remote signer isn't up yet. We might need sth like retrying with exponential backoff.
Also, the whole ConsensusState shuts down if the remote signer drops (restarting the remote signer won't help with this): #2926
|
closing in favour of: #2923 |
…rmint#2846) (tendermint#2876) --- Many RPC methods require JSON marshalled responses. We saw this taking a notable amount of heap allocation in query serving full nodes. This PR removes some extra heap allocations that were being done. We avoided using the more efficient encoder.Encode before, because it added a newline. This PR changes the function signature for these private methods to be using *bytes.Buffer, and then uses the in-buffer methods (rather than a second copy). We then just truncate the final byte after each such call, which does not waste any allocations. I added a benchmark for the most complex test case. OLD: ``` BenchmarkJsonMarshalStruct-12 78992 15542 ns/op 4487 B/op 191 allocs/op ``` New: ``` BenchmarkJsonMarshalStruct-12 93346 11132 ns/op 3245 B/op 58 allocs/op ``` Roughly a 3-4x reduction in the number of allocations, and 20% speedup. #### PR checklist - [x] Tests written/updated - Existing tests cover this - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec <hr>This is an automatic backport of pull request tendermint#2846 done by [Mergify](https://mergify.com). --------- Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com> Co-authored-by: Andy Nogueira <me@andynogueira.dev>
This allows recovery if the RemoteSigner service is restarted etc without restarting the process.