Skip to content

Feature/priv val sockets#1204

Merged
ebuchman merged 18 commits intofeature/priv_valfrom
feature/priv_val_sockets
Feb 19, 2018
Merged

Feature/priv val sockets#1204
ebuchman merged 18 commits intofeature/priv_valfrom
feature/priv_val_sockets

Conversation

@ebuchman
Copy link
Contributor

@ebuchman ebuchman commented Feb 9, 2018

Rebased #1060 on #1203.

Note the integrations for this in node/node.go are commented out (until types/priv_validator becomes active).

@ebuchman ebuchman requested a review from melekes as a code owner February 9, 2018 22:05
@ebuchman ebuchman mentioned this pull request Feb 9, 2018
@ebuchman
Copy link
Contributor Author

ebuchman commented Feb 9, 2018

#1060:

When working with the protocol between client and server I was dearly missing some form of communicating errors, but I don't know how settle we are on the current protocol.

We can modify the protocol here to support error handling. Want to propose something?

@ebuchman
Copy link
Contributor Author

ebuchman commented Feb 9, 2018

Also, looks like Tendermint is responsible for dialing the signer process here. I thought we wanted the signer to dial Tendermint (#1060 (comment)). Should we make the SocketServer implement PrivKey and give Tendermint that or is there some better reason this should stay as is?

@codecov-io
Copy link

codecov-io commented Feb 9, 2018

Codecov Report

❗ No coverage uploaded for pull request base (feature/priv_val@bef91ea). Click here to learn what that means.
The diff coverage is 49.52%.

@@                 Coverage Diff                 @@
##             feature/priv_val    #1204   +/-   ##
===================================================
  Coverage                    ?   59.47%           
===================================================
  Files                       ?      126           
  Lines                       ?    11517           
  Branches                    ?        0           
===================================================
  Hits                        ?     6850           
  Misses                      ?     4019           
  Partials                    ?      648

func main() {
var (
chainID = flag.String("chain-id", "mychain", "chain id")
numClients = flag.Int("clients", 1, "number of concurrently connected clients")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how am I supposed to know the number of clients beforehand?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or this is a maximum number of concurrent clients?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will make the description more clear.

)
flag.Parse()

logger.Info("Reading args privValidatorSocketServer", "chainID", *chainID, "privPath", *privValPath)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe change to fmt.Sprintf("Starting private validator socket server on %s", listenAddr), "chainID", *chainID, "privValPath", *privValPath, ?

chainID = flag.String("chain-id", "mychain", "chain id")
numClients = flag.Int("clients", 1, "number of concurrently connected clients")
privValPath = flag.String("priv", "", "priv val file path")
socketAddr = flag.String("socket.addr", ":46659", "socket bind addr")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep a single standard (listenAddr = flag.String("laddr")

	cmd.Flags().String("p2p.laddr", config.P2P.ListenAddress, "Node listen address. (0.0.0.0:0 means any interface, any port)")

cmd.Flags().String("moniker", config.Moniker, "Node Name")

// priv val flags
cmd.Flags().String("priv_validator_addr", config.PrivValidatorAddr, "Socket address for PrivValidator")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private validator, not PrivValidator


var err error
var conn net.Conn
var (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we extract this into a separate connect function?

}

// PubKey implements PrivValidator.
func (pvsc *PrivValidatorSocketClient) PubKey() crypto.PubKey {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not PubKey return error too by analogy with SignProposal?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought so too and addressed it in 2a292ef, but would like to have a sanity check from @ebuchman on this. :>

}
pvss.Logger.Error("Failed to accept connection: " + err.Error())
pvss.Logger.Error(
"accpetConnections",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Failed to accept new connection"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I see the point given the signature of Error(msg string, keyvals ...interface{}) it seems very hard to build up the context the error was coming from as it would result in:

E[02-13|17:42:57.620] Failed to accept new connection                                      err=foo

Maybe I'm not seeing an important aspect of error logging in our codeabse, but that line has no signal in it. Happy to incorporate your comments, just wondering if it is the best way to go abouut it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can turn on a tracing logger by providing --trace flag https://github.com/tendermint/tendermint/blob/master/cmd/tendermint/commands/root.go#L53

In this mode logger will output full trace if error supports this feature (pkg/errors does). https://github.com/tendermint/tmlibs/blob/master/log/tracing_logger.go#L58

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing that out! This still doesn't really help after the fact, which makes errors emitted without tracing still not very useful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I see your point. So the agreement always was: meaningful-message key=value. For example: "Failed to accept connection err=EOF". We're still in a process of improving the messages. Plus we started to use pkg/errors not so long ago to accumulate context.

Usually, what we do is just grep by msg ("Failed to accept connection"). Of course, if you have many places in your app where you accept a connection, then msg must be changed to something like "Failed to accept connection from X".

I am happy to hear any suggestions on how to improve the current scheme.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very sound approach and I don't mean to start a greater discussion if we already iterating on error handling in other places.

I am happy to hear any suggestions on how to improve the current scheme.

Usually thinking about in what remote ends logs can end up in typical infrastructure I try to imagine what vital info is needed to make sense of it when consumed in some log infrastructure remote from the origin. Additionally how each log line can interpreted by itself without context of the lines before or after. As the logger is passed from the outside to the inside in the program we can enrich the keyvals with information as well which subsystems don't need to care about. A standard set of fields I try to surface, most of them which can easily be achieved with Dynamic Contextual Values from the go-kit log package:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice

if !pvss.IsRunning() {
return // Ignore error from listener closing.
}
if err := conn.SetDeadline(time.Now().Add(time.Second)); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deadline should be configurable or be a const, no magic numbers please :)

}
if err := conn.SetDeadline(time.Now().Add(time.Second)); err != nil {
pvss.Logger.Error(
"acceptConnetions",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

req, err := readMsg(conn)
if err != nil {
if err != io.EOF {
pvss.Logger.Error("handleConnection", "err", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

@xla xla self-assigned this Feb 12, 2018
xla added 5 commits February 13, 2018 19:08
* break out connect functionality out of OnStart
* introduce max retries
As calls to the private validator can involve side-effects like network
communication it is desirable for all methods returning an error to not
break the control flow of the caller.

* adjust PrivValidator interface
node/node.go Outdated

/* TODO
// Generate node PrivKey
privKey := crypto.GenPrivKeyEd25519()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been debating if this should be ephemeral.

In the ephemeral version, I imagine perhaps an ssh like confirm dialog in signatory before signatory starts accepting signing requests for a node.

if this passed in via a config file, then signatory would maintain a white list

Copy link
Contributor Author

@ebuchman ebuchman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll merge this into the other priv val branch, but we still need to fix the client vs server dichotomy (if the SocketServer is part of the Tendermint process, it shouldn't have direct access to a PrivValidator key)

Moniker string `mapstructure:"moniker"`

// TCP or UNIX socket address of the PrivValidator server
PrivValidatorAddr string `mapstructure:"priv_validator_addr"`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we'll open an issue to break this out into its own PrivValidatorConfig

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #1231

require.Nil(err)

err = pvsc.SignHeartbeat(chainID, &types.Heartbeat{})
require.Nil(err)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can verify the signatures too

require.Nil(err)
}

func TestPrivValidatorSocketServerWithoutSecret(t *testing.T) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can de-duplicate most of this


retries--

conn, err := cmn.Connect(pvsc.SocketAddress)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have to cleanup this conn if we error and retry?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it errors we don't need to worry about any cleanup, we do need to defer the closing of it if it succeeds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this one errors, sure nothing to clean, but what about any of the points below where we continue. Eg. if the SecretConnection fails, I think we need to close the conn

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely!


privVal := priv_val.LoadPrivValidatorJSON(*privValPath)

pvss := priv_val.NewPrivValidatorSocketServer(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So long as "client" means "one who initiates connection by dialing", I think we need the client here.

logger log.Logger,
chainID, socketAddr string,
maxConnections int,
privVal PrivValidator,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if SocketServer is what runs in the Tendermint process (ie. it's listening on the socket for a connection from a priv_val_client, but it's also the one that initiates all requests), then it won't have access to a PrivValidator because it doesn't have a key for it - it only gets access to the key by sending messages over the socket

@ebuchman ebuchman merged commit ffd2483 into feature/priv_val Feb 19, 2018
@ebuchman ebuchman deleted the feature/priv_val_sockets branch February 19, 2018 21:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants