Conversation
* add comments to all public types * fix comments to adhere to comment standards
We can modify the protocol here to support error handling. Want to propose something? |
|
Also, looks like Tendermint is responsible for dialing the signer process here. I thought we wanted the signer to dial Tendermint (#1060 (comment)). Should we make the SocketServer implement PrivKey and give Tendermint that or is there some better reason this should stay as is? |
Codecov Report
@@ Coverage Diff @@
## feature/priv_val #1204 +/- ##
===================================================
Coverage ? 59.47%
===================================================
Files ? 126
Lines ? 11517
Branches ? 0
===================================================
Hits ? 6850
Misses ? 4019
Partials ? 648 |
cmd/priv_val_server/main.go
Outdated
| func main() { | ||
| var ( | ||
| chainID = flag.String("chain-id", "mychain", "chain id") | ||
| numClients = flag.Int("clients", 1, "number of concurrently connected clients") |
There was a problem hiding this comment.
how am I supposed to know the number of clients beforehand?
There was a problem hiding this comment.
Or this is a maximum number of concurrent clients?
There was a problem hiding this comment.
Yes, will make the description more clear.
cmd/priv_val_server/main.go
Outdated
| ) | ||
| flag.Parse() | ||
|
|
||
| logger.Info("Reading args privValidatorSocketServer", "chainID", *chainID, "privPath", *privValPath) |
There was a problem hiding this comment.
Maybe change to fmt.Sprintf("Starting private validator socket server on %s", listenAddr), "chainID", *chainID, "privValPath", *privValPath, ?
cmd/priv_val_server/main.go
Outdated
| chainID = flag.String("chain-id", "mychain", "chain id") | ||
| numClients = flag.Int("clients", 1, "number of concurrently connected clients") | ||
| privValPath = flag.String("priv", "", "priv val file path") | ||
| socketAddr = flag.String("socket.addr", ":46659", "socket bind addr") |
There was a problem hiding this comment.
Let's keep a single standard (listenAddr = flag.String("laddr")
cmd.Flags().String("p2p.laddr", config.P2P.ListenAddress, "Node listen address. (0.0.0.0:0 means any interface, any port)")
cmd/tendermint/commands/run_node.go
Outdated
| cmd.Flags().String("moniker", config.Moniker, "Node Name") | ||
|
|
||
| // priv val flags | ||
| cmd.Flags().String("priv_validator_addr", config.PrivValidatorAddr, "Socket address for PrivValidator") |
There was a problem hiding this comment.
private validator, not PrivValidator
types/priv_validator/socket.go
Outdated
|
|
||
| var err error | ||
| var conn net.Conn | ||
| var ( |
There was a problem hiding this comment.
can we extract this into a separate connect function?
types/priv_validator/socket.go
Outdated
| } | ||
|
|
||
| // PubKey implements PrivValidator. | ||
| func (pvsc *PrivValidatorSocketClient) PubKey() crypto.PubKey { |
There was a problem hiding this comment.
Should not PubKey return error too by analogy with SignProposal?
| } | ||
| pvss.Logger.Error("Failed to accept connection: " + err.Error()) | ||
| pvss.Logger.Error( | ||
| "accpetConnections", |
There was a problem hiding this comment.
"Failed to accept new connection"
There was a problem hiding this comment.
While I see the point given the signature of Error(msg string, keyvals ...interface{}) it seems very hard to build up the context the error was coming from as it would result in:
E[02-13|17:42:57.620] Failed to accept new connection err=foo
Maybe I'm not seeing an important aspect of error logging in our codeabse, but that line has no signal in it. Happy to incorporate your comments, just wondering if it is the best way to go abouut it.
There was a problem hiding this comment.
You can turn on a tracing logger by providing --trace flag https://github.com/tendermint/tendermint/blob/master/cmd/tendermint/commands/root.go#L53
In this mode logger will output full trace if error supports this feature (pkg/errors does). https://github.com/tendermint/tmlibs/blob/master/log/tracing_logger.go#L58
There was a problem hiding this comment.
Thanks for pointing that out! This still doesn't really help after the fact, which makes errors emitted without tracing still not very useful.
There was a problem hiding this comment.
Ok. I see your point. So the agreement always was: meaningful-message key=value. For example: "Failed to accept connection err=EOF". We're still in a process of improving the messages. Plus we started to use pkg/errors not so long ago to accumulate context.
Usually, what we do is just grep by msg ("Failed to accept connection"). Of course, if you have many places in your app where you accept a connection, then msg must be changed to something like "Failed to accept connection from X".
I am happy to hear any suggestions on how to improve the current scheme.
There was a problem hiding this comment.
Very sound approach and I don't mean to start a greater discussion if we already iterating on error handling in other places.
I am happy to hear any suggestions on how to improve the current scheme.
Usually thinking about in what remote ends logs can end up in typical infrastructure I try to imagine what vital info is needed to make sense of it when consumed in some log infrastructure remote from the origin. Additionally how each log line can interpreted by itself without context of the lines before or after. As the logger is passed from the outside to the inside in the program we can enrich the keyvals with information as well which subsystems don't need to care about. A standard set of fields I try to surface, most of them which can easily be achieved with Dynamic Contextual Values from the go-kit log package:
- caller: https://godoc.org/github.com/go-kit/kit/log#Caller
- timestamp: https://godoc.org/github.com/go-kit/kit/log#pkg-variables
- revision: compile flag: https://github.com/tendermint/tendermint/blob/master/Makefile#L10
- job/system:
tendermintin our case - cmd/task:
nodefor the node command - service/subsystem: what we pass to
cmn.NewBaseService(...)e.g.:privValidatorSocketServer - err: set if error present
types/priv_validator/socket.go
Outdated
| if !pvss.IsRunning() { | ||
| return // Ignore error from listener closing. | ||
| } | ||
| if err := conn.SetDeadline(time.Now().Add(time.Second)); err != nil { |
There was a problem hiding this comment.
deadline should be configurable or be a const, no magic numbers please :)
| } | ||
| if err := conn.SetDeadline(time.Now().Add(time.Second)); err != nil { | ||
| pvss.Logger.Error( | ||
| "acceptConnetions", |
| req, err := readMsg(conn) | ||
| if err != nil { | ||
| if err != io.EOF { | ||
| pvss.Logger.Error("handleConnection", "err", err) |
* break out connect functionality out of OnStart * introduce max retries
As calls to the private validator can involve side-effects like network communication it is desirable for all methods returning an error to not break the control flow of the caller. * adjust PrivValidator interface
node/node.go
Outdated
|
|
||
| /* TODO | ||
| // Generate node PrivKey | ||
| privKey := crypto.GenPrivKeyEd25519() |
There was a problem hiding this comment.
I've been debating if this should be ephemeral.
In the ephemeral version, I imagine perhaps an ssh like confirm dialog in signatory before signatory starts accepting signing requests for a node.
if this passed in via a config file, then signatory would maintain a white list
ebuchman
left a comment
There was a problem hiding this comment.
I'll merge this into the other priv val branch, but we still need to fix the client vs server dichotomy (if the SocketServer is part of the Tendermint process, it shouldn't have direct access to a PrivValidator key)
| Moniker string `mapstructure:"moniker"` | ||
|
|
||
| // TCP or UNIX socket address of the PrivValidator server | ||
| PrivValidatorAddr string `mapstructure:"priv_validator_addr"` |
There was a problem hiding this comment.
Maybe we'll open an issue to break this out into its own PrivValidatorConfig
| require.Nil(err) | ||
|
|
||
| err = pvsc.SignHeartbeat(chainID, &types.Heartbeat{}) | ||
| require.Nil(err) |
There was a problem hiding this comment.
maybe we can verify the signatures too
| require.Nil(err) | ||
| } | ||
|
|
||
| func TestPrivValidatorSocketServerWithoutSecret(t *testing.T) { |
There was a problem hiding this comment.
maybe we can de-duplicate most of this
|
|
||
| retries-- | ||
|
|
||
| conn, err := cmn.Connect(pvsc.SocketAddress) |
There was a problem hiding this comment.
do we have to cleanup this conn if we error and retry?
There was a problem hiding this comment.
If it errors we don't need to worry about any cleanup, we do need to defer the closing of it if it succeeds.
There was a problem hiding this comment.
If this one errors, sure nothing to clean, but what about any of the points below where we continue. Eg. if the SecretConnection fails, I think we need to close the conn
|
|
||
| privVal := priv_val.LoadPrivValidatorJSON(*privValPath) | ||
|
|
||
| pvss := priv_val.NewPrivValidatorSocketServer( |
There was a problem hiding this comment.
So long as "client" means "one who initiates connection by dialing", I think we need the client here.
| logger log.Logger, | ||
| chainID, socketAddr string, | ||
| maxConnections int, | ||
| privVal PrivValidator, |
There was a problem hiding this comment.
if SocketServer is what runs in the Tendermint process (ie. it's listening on the socket for a connection from a priv_val_client, but it's also the one that initiates all requests), then it won't have access to a PrivValidator because it doesn't have a key for it - it only gets access to the key by sending messages over the socket
Rebased #1060 on #1203.
Note the integrations for this in node/node.go are commented out (until types/priv_validator becomes active).