Skip to content

server,gossip: Store generated join tokens in gossip#61512

Closed
itsbilal wants to merge 1 commit intocockroachdb:masterfrom
itsbilal:gossip-join-token
Closed

server,gossip: Store generated join tokens in gossip#61512
itsbilal wants to merge 1 commit intocockroachdb:masterfrom
itsbilal:gossip-join-token

Conversation

@itsbilal
Copy link
Copy Markdown
Contributor

@itsbilal itsbilal commented Mar 4, 2021

When a join token is generated, gossip it among other connected
nodes with a short TTL. This allows that join token to be used
when connecting to any node in the cluster, and not necessarily
to the same node where the join token was generated.

Also add an isValid function to the join token to confirm (on
the receiver end) that the join token matches what was stored
in gossip.

First commit is #61505. Part of #60632.

Release justification: Changes to new code paths, gated behind
experimental feature flag.
Release note: None.

@itsbilal itsbilal requested review from aaron-crl and knz March 4, 2021 22:14
@itsbilal itsbilal requested a review from a team as a code owner March 4, 2021 22:14
@itsbilal itsbilal self-assigned this Mar 4, 2021
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown

@aaron-crl aaron-crl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @knz)


pkg/server/join_token.go, line 102 at r1 (raw file):

	var b bytes.Buffer
	encoder := base64.NewEncoder(base64.URLEncoding, &b)

Did we check that base64 was a friendly encoding?


pkg/server/status.go, line 2563 at r1 (raw file):

	certLocator := security.MakeCertsLocator(s.cfg.SSLCertsDir)
	if ok, err := certLocator.HasNodeCert(); err != nil || !ok {

Should be able to just try to read the cert using something like this:

	cl := security.MakeCertsLocator(s.server.cfg.SSLCertsDir)
	caCert, err := loadCertificateFile(cl.CACertPath())
	if err != nil {
		return nil, errors.Wrapf(
			err,
			"failed to read inter-node cert from disk at %q ",
			cl.CACertPath(),
		)
	}

Copy link
Copy Markdown

@aaron-crl aaron-crl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @knz)


pkg/server/join_token.go, line 125 at r2 (raw file):

		return false, err
	}
	return bytes.Equal(token, token2), nil

This should use a ConstantTimeCompare to avoid timing attacks. Feel free to drop a TODO(aaron-crl) for me to pick up here if you don't have time for it today.

@itsbilal itsbilal force-pushed the gossip-join-token branch 4 times, most recently from 7d4665c to fde2058 Compare March 4, 2021 23:47
@itsbilal
Copy link
Copy Markdown
Contributor Author

itsbilal commented Mar 4, 2021


pkg/server/join_token.go, line 125 at r2 (raw file):

Previously, aaron-crl wrote…

This should use a ConstantTimeCompare to avoid timing attacks. Feel free to drop a TODO(aaron-crl) for me to pick up here if you don't have time for it today.

TODO added, thanks!

aaron-crl pushed a commit to aaron-crl/cockroach that referenced this pull request Mar 5, 2021
* removed stubs and dummy functions
* cleaned up TODOs

Release justification: low risk, feature flagged functional enhancement
Release note: None
@itsbilal itsbilal force-pushed the gossip-join-token branch from fde2058 to 68c673b Compare March 5, 2021 14:26
@itsbilal
Copy link
Copy Markdown
Contributor Author

itsbilal commented Mar 5, 2021

Rebased on master, this is ready for a look by someone more familiar with gossip

@itsbilal itsbilal force-pushed the gossip-join-token branch from 68c673b to 6bb6309 Compare March 5, 2021 14:49
@aaron-crl aaron-crl requested review from bdarnell and tbg March 5, 2021 14:57
Copy link
Copy Markdown

@aaron-crl aaron-crl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks alright to me but we'll need someone who knows gossip to sign off on it before it's merged.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @itsbilal, @knz, and @tbg)

aaron-crl pushed a commit to aaron-crl/cockroach that referenced this pull request Mar 5, 2021
* removed stubs and dummy functions
* cleaned up TODOs

Release justification: low risk, feature flagged functional enhancement
Release note: None
Copy link
Copy Markdown
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of Gossip is fine, though I can't yet see how the use looks end-to-end because it seems like this PR is just partial work. Gossiping a join token with a 10 minute TTL seems fine though from the Gossip perspective.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @itsbilal, and @knz)


pkg/server/join_token.go, line 143 at r3 (raw file):

// Checks the join token in the gossip store matches the marshalled form of
// the passed-in join token.
func (j *joinToken) isValid(g *gossip.Gossip) (bool, error) {

It seems a bit weird to pass *Gossip in here. Why not take a []byte to avoid the tight coupling?

No testing?

@itsbilal itsbilal force-pushed the gossip-join-token branch from 6bb6309 to 55e3031 Compare March 5, 2021 16:53
@itsbilal
Copy link
Copy Markdown
Contributor Author

itsbilal commented Mar 5, 2021


pkg/server/join_token.go, line 143 at r3 (raw file):

Previously, tbg (Tobias Grieger) wrote…

It seems a bit weird to pass *Gossip in here. Why not take a []byte to avoid the tight coupling?

No testing?

Thanks for the review! I added the test right after you wrote this.

Also the only reason to throw the gossip client in here is to get it done in this PR (or at least commit) and so @aaron-crl has it ready to work off of. I can see []byte making more sense. Maybe we shouldn't be merging this until Aaron's RPC method that uses this function is ready.

Copy link
Copy Markdown

@aaron-crl aaron-crl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @itsbilal, @knz, and @tbg)


pkg/server/join_token.go, line 143 at r3 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Thanks for the review! I added the test right after you wrote this.

Also the only reason to throw the gossip client in here is to get it done in this PR (or at least commit) and so @aaron-crl has it ready to work off of. I can see []byte making more sense. Maybe we shouldn't be merging this until Aaron's RPC method that uses this function is ready.

Now that I've seen this pattern I think I can manage the gossip in my PR where it's aligned with the functionality. I'll get someone to review that next week along with the rest of the rest of those changes. We can probably drop this function altogether since without gossip here, it just becomes a Marshal+Equal.

Copy link
Copy Markdown

@aaron-crl aaron-crl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good perspective, thanks @tbg !

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @itsbilal, @knz, and @tbg)

Copy link
Copy Markdown
Contributor Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, the additional pieces are yet to come. We've done a pretty fine-grained chunking up of work to increase parallelism of work.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aaron-crl, @bdarnell, @knz, and @tbg)


pkg/server/join_token.go, line 102 at r1 (raw file):

Previously, aaron-crl wrote…

Did we check that base64 was a friendly encoding?

Docs say this is URL-safe so I assume yes


pkg/server/join_token.go, line 143 at r3 (raw file):

Previously, aaron-crl wrote…

Now that I've seen this pattern I think I can manage the gossip in my PR where it's aligned with the functionality. I'll get someone to review that next week along with the rest of the rest of those changes. We can probably drop this function altogether since without gossip here, it just becomes a Marshal+Equal.

Alright, I'll drop this function and keep this PR to just the change to the function in status.go.


pkg/server/status.go, line 2563 at r1 (raw file):

Previously, aaron-crl wrote…

Should be able to just try to read the cert using something like this:

	cl := security.MakeCertsLocator(s.server.cfg.SSLCertsDir)
	caCert, err := loadCertificateFile(cl.CACertPath())
	if err != nil {
		return nil, errors.Wrapf(
			err,
			"failed to read inter-node cert from disk at %q ",
			cl.CACertPath(),
		)
	}

Done.

When a join token is generated, gossip it among other connected
nodes with a short TTL. This allows that join token to be used
when connecting to any node in the cluster, and not necessarily
to the same node where the join token was generated.

Part of cockroachdb#60632.

Release justification: Changes to new code paths, gated behind
experimental feature flag.
Release note: None.
@itsbilal itsbilal force-pushed the gossip-join-token branch from 55e3031 to a57cdfb Compare March 5, 2021 20:21
Copy link
Copy Markdown

@aaron-crl aaron-crl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aaron-crl, @bdarnell, @itsbilal, @knz, and @tbg)


pkg/server/join_token.go, line 102 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Docs say this is URL-safe so I assume yes

Testing locally "foobar!" => "Zm9vYmFyIQ=="

Double-clicking the encoded string gives me all but the padding characters which is not cut/paste friendly. Since we control both ends of this we can probably just trim the padding and then anticipate decoding without it in the unmarshaling function.

@itsbilal
Copy link
Copy Markdown
Contributor Author

itsbilal commented Mar 5, 2021

Based on a Slack discussion, we're going to can this PR as we need stricter at-most-once semantics than were expected at first. An eventually consistent best-effort once wouldn't do.

@itsbilal itsbilal closed this Mar 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants