Skip to content

rfc: token-based authentication for SQL session revival#74640

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
rafiss:rfc-session-revival-token
Jan 11, 2022
Merged

rfc: token-based authentication for SQL session revival#74640
craig[bot] merged 1 commit intocockroachdb:masterfrom
rafiss:rfc-session-revival-token

Conversation

@rafiss
Copy link
Copy Markdown
Collaborator

@rafiss rafiss commented Jan 10, 2022

Release note: None

@rafiss rafiss requested a review from a team as a code owner January 10, 2022 18:57
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Contributor

@jaylim-crl jaylim-crl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mostly :lgtm: since we've already discussed all the concerns on the initial Google Docs draft.

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajstorm, @catj-cockroach, @knz, and @rafiss)


docs/RFCS/20211203_session_revival_token.md, line 92 at r1 (raw file):

#### Update pgwire code to look out for the new token

We will add a CockroachDB-specific “StartupMessage” in the [pgwire/server.go](https://github.com/cockroachdb/cockroach/blob/b4ab627436afd8d9ed23330cd4026aa435b5b65d/pkg/sql/pgwire/server.go#L731) code. We’ll name it **<code>session_revival_token</code></strong>.

nit: formatting error here

Code quote:

We’ll name it **<code>session_revival_token</code></strong>

docs/RFCS/20211203_session_revival_token.md, line 102 at r1 (raw file):

#### In order to sign the token, we will introduce a new signing key (either [Ed25519](https://pkg.go.dev/crypto/ed25519) or RSA depending on what the Intrusion team decides) that is unique per tenant, and is shared by all SQL nodes of the same tenant. The advantage of Ed25519 is that it is fast and generates small signatures. The signing key will be generated by the Vault. Unlike the existing tenant-client cert, this new signing cert doesn't require any DNS or IP information baked into the certificate and the certificate can be self-signed. This makes it easier to rotate if a tenant pod becomes compromised. When creating a tenant SQL node, Intrusion will need to put the key into the `certs-dir` of the SQL node (similar to how the tenant certs are added now). The cert will be named `tenant-signing.&lt;ID>.key`. The `crdb_internal.create_session_token` function will fail if this key is not present, and the token creation will be able to detect the algorithm used by the cert and call the according signature methods. A `cockroach mt cert create-tenant-signing` command will be added for testing purposes.

nit: formatting error. Don't think the #### is intentional


docs/RFCS/20211203_session_revival_token.md, line 102 at r1 (raw file):

#### In order to sign the token, we will introduce a new signing key (either [Ed25519](https://pkg.go.dev/crypto/ed25519) or RSA depending on what the Intrusion team decides) that is unique per tenant, and is shared by all SQL nodes of the same tenant. The advantage of Ed25519 is that it is fast and generates small signatures. The signing key will be generated by the Vault. Unlike the existing tenant-client cert, this new signing cert doesn't require any DNS or IP information baked into the certificate and the certificate can be self-signed. This makes it easier to rotate if a tenant pod becomes compromised. When creating a tenant SQL node, Intrusion will need to put the key into the `certs-dir` of the SQL node (similar to how the tenant certs are added now). The cert will be named `tenant-signing.&lt;ID>.key`. The `crdb_internal.create_session_token` function will fail if this key is not present, and the token creation will be able to detect the algorithm used by the cert and call the according signature methods. A `cockroach mt cert create-tenant-signing` command will be added for testing purposes.

nit: tenant-signing..key

Code quote:

tenant-signing.&lt;ID>.key

docs/RFCS/20211203_session_revival_token.md, line 116 at r1 (raw file):

#### Configuration in the sqlServer

Since the mechanism should only be available in CRDB-Serverless deployments, we will make it configurable so that it is not enabled by default on dedicated or self-hosted clusters. A new field will be added to `sqlServerArgs` named **<code>sessionRevivalTokenEnabled</code></strong> which defaults to false, and it will control if the SQL node looks for the <strong><code>session_revival_token</code></strong> parameter. The field will be set to true only in the <code>server.StartTenant</code> function, so the functionality only is enabled in multitenant clusters.

nit: Not sure if the ** is intentional

Code quote:

**<code>sessionRevivalTokenEnabled</code></strong>

Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: modulo the formatting fixes

Reviewed 1 of 1 files at r1.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @ajstorm, @catj-cockroach, and @rafiss)


docs/RFCS/20211203_session_revival_token.md, line 5 at r1 (raw file):

- Start Date: 2021-12-03
- Authors: Rafi Shamim
- RFC PR: (PR # after acceptance of initial draft)

you can update the metadata now


docs/RFCS/20211203_session_revival_token.md, line 6 at r1 (raw file):

- Authors: Rafi Shamim
- RFC PR: (PR # after acceptance of initial draft)
- Cockroach Issue: (one or more # from the issue tracker)

we need to file an issue retroactively in order to have a tracking item in milestones


docs/RFCS/20211203_session_revival_token.md, line 94 at r1 (raw file):

We will add a CockroachDB-specific “StartupMessage” in the [pgwire/server.go](https://github.com/cockroachdb/cockroach/blob/b4ab627436afd8d9ed23330cd4026aa435b5b65d/pkg/sql/pgwire/server.go#L731) code. We’ll name it **<code>session_revival_token</code></strong>.

Before reaching the password authentication code, the server will see the token, and if it’s valid, will bypass the regular session initialization and [authentication code](https://github.com/cockroachdb/cockroach/blob/b4ab627436afd8d9ed23330cd4026aa435b5b65d/pkg/sql/pgwire/auth_methods.go#L40).

are you sure about this line reference?

Copy link
Copy Markdown
Collaborator Author

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @ajstorm, @catj-cockroach, and @knz)


docs/RFCS/20211203_session_revival_token.md, line 5 at r1 (raw file):

Previously, knz (kena) wrote…

you can update the metadata now

done


docs/RFCS/20211203_session_revival_token.md, line 6 at r1 (raw file):

Previously, knz (kena) wrote…

we need to file an issue retroactively in order to have a tracking item in milestones

for sure. filed #74643


docs/RFCS/20211203_session_revival_token.md, line 92 at r1 (raw file):

Previously, jaylim-crl (Jay Lim) wrote…

nit: formatting error here

done


docs/RFCS/20211203_session_revival_token.md, line 94 at r1 (raw file):

Previously, knz (kena) wrote…

are you sure about this line reference?

nice catch, fixed


docs/RFCS/20211203_session_revival_token.md, line 102 at r1 (raw file):

Previously, jaylim-crl (Jay Lim) wrote…

nit: formatting error. Don't think the #### is intentional

done


docs/RFCS/20211203_session_revival_token.md, line 102 at r1 (raw file):

Previously, jaylim-crl (Jay Lim) wrote…

nit: tenant-signing..key

done


docs/RFCS/20211203_session_revival_token.md, line 116 at r1 (raw file):

Previously, jaylim-crl (Jay Lim) wrote…

nit: Not sure if the ** is intentional

done

@rafiss rafiss force-pushed the rfc-session-revival-token branch from 6cb13b7 to 54efd4b Compare January 10, 2022 20:07
Copy link
Copy Markdown
Contributor

@jaylim-crl jaylim-crl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @ajstorm, @catj-cockroach, @knz, and @rafiss)


docs/RFCS/20211203_session_revival_token.md, line 102 at r2 (raw file):

In order to sign the token, we will introduce a new signing key (either [Ed25519](https://pkg.go.dev/crypto/ed25519) or RSA depending on what the Intrusion team decides) that is unique per tenant, and is shared by all SQL nodes of the same tenant. The advantage of Ed25519 is that it is fast and generates small signatures. The signing key will be generated by the Vault. Unlike the existing tenant-client cert, this new signing cert doesn't require any DNS or IP information baked into the certificate and the certificate can be self-signed. This makes it easier to rotate if a tenant pod becomes compromised. When creating a tenant SQL node, Intrusion will need to put the key into the `certs-dir` of the SQL node (similar to how the tenant certs are added now). The cert will be named `tenant-signing.<ID>.key`. The `crdb_internal.create_session_token` function will fail if this key is not present, and the token creation will be able to detect the algorithm used by the cert and call the according signature methods. A `cockroach mt cert create-tenant-signing` command will be added for testing purposes.

Just talked to @catj-cockroach, it's likely that we'll be using a self-signed cert here. Could we make this phrase generic so that we don't fix ourselves on an approach? If we're using a self-signed cert, the cert will be generated by intrusion instead of vault.

Code quote:

The signing key will be generated by the Vault

Copy link
Copy Markdown
Collaborator Author

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @ajstorm, @catj-cockroach, and @knz)


docs/RFCS/20211203_session_revival_token.md, line 102 at r2 (raw file):

Previously, jaylim-crl (Jay Lim) wrote…

Just talked to @catj-cockroach, it's likely that we'll be using a self-signed cert here. Could we make this phrase generic so that we don't fix ourselves on an approach? If we're using a self-signed cert, the cert will be generated by intrusion instead of vault.

sure, good to know! so does that mean you can easily use ed25519? (fyi, i started work on this: #74646)

@rafiss rafiss force-pushed the rfc-session-revival-token branch from 54efd4b to 069a007 Compare January 10, 2022 23:27
@rafiss
Copy link
Copy Markdown
Collaborator Author

rafiss commented Jan 10, 2022

thanks for your reviews!

bors=knz,jaylim-crl

Copy link
Copy Markdown
Contributor

@jaylim-crl jaylim-crl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @ajstorm, @catj-cockroach, and @knz)


docs/RFCS/20211203_session_revival_token.md, line 102 at r2 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

sure, good to know! so does that mean you can easily use ed25519? (fyi, i started work on this: #74646)

Yes, that's correct. I'd imagine intrusion to have code that looks like the following:

// CreateTenantSigningPair creates a tenant signing pair. The private key and
// public key are both created in certsDir.
func CreateTenantSigningPair(
certsDir string, lifetime time.Duration, overwrite bool, tenantID uint64,
) error {
if len(certsDir) == 0 {
return errors.New("the path to the certs directory is required")
}
if tenantID == 0 {
return errors.Errorf("tenantId %d is invalid (requires != 0)", tenantID)
}
tenantIdentifier := fmt.Sprintf("%d", tenantID)
// Create a certificate manager with "create dir if not exist".
cm, err := NewCertificateManagerFirstRun(certsDir, CommandTLSSettings{})
if err != nil {
return err
}
signingKeyPath := cm.TenantSigningKeyPath(tenantIdentifier)
signingCertPath := cm.TenantSigningCertPath(tenantIdentifier)
var pubKey crypto.PublicKey
var privKey crypto.PrivateKey
pubKey, privKey, err = ed25519.GenerateKey(rand.Reader)
if err != nil {
return errors.Wrap(err, "could not generate new tenant signing key")
}
if err := writeKeyToFile(signingKeyPath, privKey, overwrite); err != nil {
return errors.Wrapf(err, "could not write tenant signing key to file %s", signingKeyPath)
}
log.Infof(context.Background(), "generated tenant signing key %s", signingKeyPath)
// Generate certificate.
certContents, err := GenerateTenantSigningCert(pubKey, privKey, lifetime)
if err != nil {
return errors.Wrap(err, "could not generate tenant signing certificate")
}
certificates := []*pem.Block{{Type: "CERTIFICATE", Bytes: certContents}}
if err := WritePEMToFile(signingCertPath, certFileMode, overwrite, certificates...); err != nil {
return errors.Wrapf(err, "could not write tenant signing certificate file %s", signingCertPath)
}
log.Infof(context.Background(), "wrote certificate to %s", signingCertPath)
return nil
}

We could technically rely on vault to generate those certs for us, but we'd have to update vault, and based on conversations with @catj-cockroach, it appears that there's no huge benefit to using vault here. Cert rotation for signing certs would just be to regenerate the keypairs for the tenant that got compromised.

@rafiss
Copy link
Copy Markdown
Collaborator Author

rafiss commented Jan 11, 2022

nice to hear that!

bors r=knz,jaylim-crl

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jan 11, 2022

Build succeeded:

@craig craig bot merged commit fdd672f into cockroachdb:master Jan 11, 2022
Copy link
Copy Markdown
Contributor

@bdarnell bdarnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale)


docs/RFCS/20211203_session_revival_token.md, line 87 at r3 (raw file):

One may notice that the structure of the token is similar to a [JSON Web Token](https://datatracker.ietf.org/doc/html/rfc7519#section-3.1), with some differences. For example, we use protocol buffers so that new fields can easily be added and remain backwards compatible, and we do not base64-encode the result.

Signing protos can be tricky. There is no formal guarantee that the outcome is deterministic, and this gets especially complicated across different versions of the .proto file. It is more robust to change the first field to bytes payload = 1 so that the SessionToken message is always unambiguous about the exact bytes that were signed. And then the recipient of this message can decode the payload bytes into a SessionToken.Payload message object.


docs/RFCS/20211203_session_revival_token.md, line 102 at r3 (raw file):

In order to sign the token, we will introduce a new signing key (either [Ed25519](https://pkg.go.dev/crypto/ed25519) or RSA depending on what the Intrusion team decides) that is unique per tenant, and is shared by all SQL nodes of the same tenant. The advantage of Ed25519 is that it is fast and generates small signatures. The signing key will be generated by the Intrusion team. Unlike the existing tenant-client cert, this new signing cert doesn't require any DNS or IP information baked into the certificate and the certificate can be self-signed. This makes it easier to rotate if a tenant pod becomes compromised. When creating a tenant SQL node, Intrusion will need to put the key into the `certs-dir` of the SQL node (similar to how the tenant certs are added now). The cert will be named `tenant-signing.<ID>.key`. The `crdb_internal.create_session_token` function will fail if this key is not present, and the token creation will be able to detect the algorithm used by the cert and call the according signature methods. A `cockroach mt cert create-tenant-signing` command will be added for testing purposes.

What is the ID in tenant-signing.<ID>.key?


docs/RFCS/20211203_session_revival_token.md, line 123 at r3 (raw file):

One drawback of this design is that it requires the token to be passed in as a CockroachDB-specific StartupMessage. This means that 3rd party drivers won’t be able to use this flow unless we fork them. Currently, that is not a use case we need to support, but if it does become one later, we can add in other ways for the SQL server to receive the token.

Another drawback is that it needs to handle the case where the signing key is rotated or revoked. In the case of rotation, we can introduce a grace period so that multiple signing keys are present for a 10-minute grace period. Every signature verification attempt needs to first try the new key, and then try the old key if that fails. If the signing key is **revoked**, then all tokens signed with that key will become invalid.

How are the multiple keys represented on the filesystem?

Instead of trying multiple keys, it would probably be better if the SessionToken message included a key ID or fingerprint.

@rafiss rafiss deleted the rfc-session-revival-token branch January 13, 2022 15:13
Copy link
Copy Markdown
Collaborator Author

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale)


docs/RFCS/20211203_session_revival_token.md, line 87 at r3 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Signing protos can be tricky. There is no formal guarantee that the outcome is deterministic, and this gets especially complicated across different versions of the .proto file. It is more robust to change the first field to bytes payload = 1 so that the SessionToken message is always unambiguous about the exact bytes that were signed. And then the recipient of this message can decode the payload bytes into a SessionToken.Payload message object.

oh, very good to know.


docs/RFCS/20211203_session_revival_token.md, line 102 at r3 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

What is the ID in tenant-signing.<ID>.key?

the tenant ID. the naming convention is what the client-tenant.<ID>.crt file uses too. i'll clarify the ID


docs/RFCS/20211203_session_revival_token.md, line 123 at r3 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

How are the multiple keys represented on the filesystem?

Instead of trying multiple keys, it would probably be better if the SessionToken message included a key ID or fingerprint.

do you know a conventional way to get a key ID/fingerprint? like could i just generate a random SerialNumber for the x509.Certificate struct?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants