Skip to content

libroach: add data key manager#21150

Merged
mberhault merged 1 commit intocockroachdb:masterfrom
mberhault:marc/add_data_key_manager
Jan 4, 2018
Merged

libroach: add data key manager#21150
mberhault merged 1 commit intocockroachdb:masterfrom
mberhault:marc/add_data_key_manager

Conversation

@mberhault
Copy link
Copy Markdown
Contributor

@mberhault mberhault commented Jan 2, 2018

Release note: None

Part of encryption at rest: see #19783

The data key manager persists its keys in a local file and generates a
new key when the active store key has changed or if the current key is
too old.

Reworked the key storage to distinguish actual keys (SecretKey) from
just the key information (KeyInfo). The KeyInfo should be passed most of
the time, with CurrentKey and GetKey possibly going away in favor of
a SetKeyForCipher method in the future.

@mberhault mberhault requested review from a team January 2, 2018 13:25
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@mberhault mberhault force-pushed the marc/add_data_key_manager branch from 6e07dc7 to 7a3436b Compare January 2, 2018 13:27
@mberhault mberhault mentioned this pull request Jan 2, 2018
29 tasks
@mberhault mberhault requested a review from bdarnell January 2, 2018 13:29
@mberhault mberhault force-pushed the marc/add_data_key_manager branch from 7a3436b to 14ee156 Compare January 2, 2018 16:02
key->key = contents;
key->source = path;
key->set_key(contents);
info->set_key_id(KeyHash(contents));
Copy link
Copy Markdown
Contributor Author

@mberhault mberhault Jan 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is iffy. The key IDs are meant to be stored in the clear in the file registry. Using plain hashes makes it too easy to do lookup table attacks against key hashes.
We have some options here, some simpler than others:

  • use the first 256 bits of the hash for the ID, the number of needed bytes after that for the key. We would need to switch to SHA-512 (not much of an issue).
  • use salts per key. We only have two key IDs to check against anyway. Where the key ID is either the ciphertext of the salt encrypted with the key (key check value) or the key run through a KDF (probably not needed as we don't need to add enthropy)
  • require key files to be <key ID length> + <key length> and use the first <key ID length> bytes as a key ID. We can shrink to 64bit key IDs (256 is overkill). Would this be odd? Since the near future should have support for things other than files, it may be acceptable for now.

For the data keys, we don't need to match duplicate keys so we can simply generate a random ID as well. We need to decide if we care to check for duplicates. In the unlikely chance that it happens (we're screwed anyway as we probably have a weak prng) we'll lose the previous key with the same ID.

@mberhault mberhault force-pushed the marc/add_data_key_manager branch 2 times, most recently from 1c80304 to 1e8aeed Compare January 2, 2018 20:43
@bdarnell
Copy link
Copy Markdown
Contributor

bdarnell commented Jan 3, 2018

Reviewed 26 of 32 files at r1, 6 of 6 files at r2.
Review status: all files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.


c-deps/libroach/ccl/key_manager.cc, line 61 at r1 (raw file):

Previously, mberhault (marc) wrote…

this is iffy. The key IDs are meant to be stored in the clear in the file registry. Using plain hashes makes it too easy to do lookup table attacks against key hashes.
We have some options here, some simpler than others:

  • use the first 256 bits of the hash for the ID, the number of needed bytes after that for the key. We would need to switch to SHA-512 (not much of an issue).
  • use salts per key. We only have two key IDs to check against anyway. Where the key ID is either the ciphertext of the salt encrypted with the key (key check value) or the key run through a KDF (probably not needed as we don't need to add enthropy)
  • require key files to be <key ID length> + <key length> and use the first <key ID length> bytes as a key ID. We can shrink to 64bit key IDs (256 is overkill). Would this be odd? Since the near future should have support for things other than files, it may be acceptable for now.

For the data keys, we don't need to match duplicate keys so we can simply generate a random ID as well. We need to decide if we care to check for duplicates. In the unlikely chance that it happens (we're screwed anyway as we probably have a weak prng) we'll lose the previous key with the same ID.

Lookup tables aren't really a concern here - you can't enumerate all possible keys to build one. Still, exposing a hash of our keys seems like something best avoided.

  1. Hashing a key to generate an "id" and a new key seems like a bad idea.
  2. I don't understand what you're referring to here. Where is this "salt" coming from? Salts are normally stored in plaintext so I'm not sure how this helps.
  3. OK, this makes more sense now than when we were talking about it yesterday. This is a subset of "ask the user to provide a key id", which they could also do via the file name (but asking for more random bytes makes it less likely they'll do things like reuse filenames inappropriately)

For data keys, we should just generate random IDs instead of hashing the value. There's no reason that the id be derivable from the value. Use 128 bits (or a uuid, which throws away 6 bits but is still plenty of entropy) and don't worry about checking for duplicates.

For store keys, is the assumption that the filenames will change on every key rotation, or that new keys will be swapped in to the current filenames? If the former, I think the key id should probably be a combination of the filename and a truncated (32 bit?) hash of the key value (the hash would be enough to detect mistakes in which the key was modified without renaming, without providing a useful amount of information about the key even if there were some other problem with exposing the hash).

If the filenames are expected to be reused, we'll probably want to either use a hash of the key value or ask the user for something we can use as a key. I think the latter is probably better; I expect that key management tools are likely to use some form of explicit key identifier instead of a hash of the value.


pkg/ccl/storageccl/engineccl/enginepbccl/key_registry.proto, line 17 at r2 (raw file):

// file_registry.proto defines the following:
// enum EncryptionType {

Don't repeat this here; it'll just rot. (or move the whole thing here; see comment below)


pkg/ccl/storageccl/engineccl/enginepbccl/key_registry.proto, line 28 at r2 (raw file):

message DataKeysRegistry {
  // We only store the information about store keys, not the raw key itself.
  map<string, KeyInfo> store_keys = 1;

What is the key for this map and the following one? A copy of KeyInfo.key_id?


pkg/storage/engine/enginepb/file_registry.proto, line 32 at r2 (raw file):

}

enum EncryptionType {

Putting these constants in OSS isn't great. There's obviously not any real IP concerns here but I think it would be cleaner if they lived on the CCL side (even if that meant the type field below became an untyped integer instead of an enum).


Comments from Reviewable

@mberhault
Copy link
Copy Markdown
Contributor Author

Review status: all files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.


c-deps/libroach/ccl/key_manager.cc, line 61 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Lookup tables aren't really a concern here - you can't enumerate all possible keys to build one. Still, exposing a hash of our keys seems like something best avoided.

  1. Hashing a key to generate an "id" and a new key seems like a bad idea.
  2. I don't understand what you're referring to here. Where is this "salt" coming from? Salts are normally stored in plaintext so I'm not sure how this helps.
  3. OK, this makes more sense now than when we were talking about it yesterday. This is a subset of "ask the user to provide a key id", which they could also do via the file name (but asking for more random bytes makes it less likely they'll do things like reuse filenames inappropriately)

For data keys, we should just generate random IDs instead of hashing the value. There's no reason that the id be derivable from the value. Use 128 bits (or a uuid, which throws away 6 bits but is still plenty of entropy) and don't worry about checking for duplicates.

For store keys, is the assumption that the filenames will change on every key rotation, or that new keys will be swapped in to the current filenames? If the former, I think the key id should probably be a combination of the filename and a truncated (32 bit?) hash of the key value (the hash would be enough to detect mistakes in which the key was modified without renaming, without providing a useful amount of information about the key even if there were some other problem with exposing the hash).

If the filenames are expected to be reused, we'll probably want to either use a hash of the key value or ask the user for something we can use as a key. I think the latter is probably better; I expect that key management tools are likely to use some form of explicit key identifier instead of a hash of the value.

surprisingly, 1. is pretty common (often to derive encryption and hmac keys from a single key). 2. would be a randomly-generated salt stored in plaintext alongside the ID. It would either be combined ala KDF, or encrypted using the key. This would only be there to prevent hash lookups. I'm using "salt" for lack of a better name. 3. has the issue we were talking about: people may be surprised by having to provide 24 bytes for a 128 bit key.

Data keys I've already moved over to a random ID (no reason not to) and am ignoring duplicates.

We have absolutely no control over what people will use for filenames. Additionally, if we want to allow signal-based store key reload, we can't change the filename without having a layer of indirection (we threw away the custom file a while back).

I'm happy with a hash for now, although I'm also happy asking for extra bits.


pkg/ccl/storageccl/engineccl/enginepbccl/key_registry.proto, line 17 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Don't repeat this here; it'll just rot. (or move the whole thing here; see comment below)

True. Let's decide where we want it.


pkg/ccl/storageccl/engineccl/enginepbccl/key_registry.proto, line 28 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

What is the key for this map and the following one? A copy of KeyInfo.key_id?

Yup. It's a bit redundant, but not a huge deal. We never modify a key ID, and having it in the KeyInfo means we don't have to return two things (it mostly makes method signatures cleaner).


pkg/storage/engine/enginepb/file_registry.proto, line 32 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Putting these constants in OSS isn't great. There's obviously not any real IP concerns here but I think it would be cleaner if they lived on the CCL side (even if that meant the type field below became an untyped integer instead of an enum).

I could make them more oblivious, but at some point I'll need encryption info in there as well (IV/counter). I would prefer not to have to use serialized protos for that (or Any). For most of the OSS code, this is about the only knowledge of encryption we need. I we want to report per-algo/key-size utilization, it seems easier to have it in OSS code than try to pass blind strings everywhere.


Comments from Reviewable

@bdarnell
Copy link
Copy Markdown
Contributor

bdarnell commented Jan 4, 2018

Review status: all files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.


c-deps/libroach/ccl/key_manager.cc, line 61 at r1 (raw file):

Previously, mberhault (marc) wrote…

surprisingly, 1. is pretty common (often to derive encryption and hmac keys from a single key). 2. would be a randomly-generated salt stored in plaintext alongside the ID. It would either be combined ala KDF, or encrypted using the key. This would only be there to prevent hash lookups. I'm using "salt" for lack of a better name. 3. has the issue we were talking about: people may be surprised by having to provide 24 bytes for a 128 bit key.

Data keys I've already moved over to a random ID (no reason not to) and am ignoring duplicates.

We have absolutely no control over what people will use for filenames. Additionally, if we want to allow signal-based store key reload, we can't change the filename without having a layer of indirection (we threw away the custom file a while back).

I'm happy with a hash for now, although I'm also happy asking for extra bits.

OK, let's ask for extra bits for now. We can revisit once we start looking at other key management systems.


pkg/ccl/storageccl/engineccl/enginepbccl/key_registry.proto, line 28 at r2 (raw file):

Previously, mberhault (marc) wrote…

Yup. It's a bit redundant, but not a huge deal. We never modify a key ID, and having it in the KeyInfo means we don't have to return two things (it mostly makes method signatures cleaner).

I'm fine with the redundancy. I was basically asking for a comment to be added about the keys.


pkg/storage/engine/enginepb/file_registry.proto, line 32 at r2 (raw file):

Previously, mberhault (marc) wrote…

I could make them more oblivious, but at some point I'll need encryption info in there as well (IV/counter). I would prefer not to have to use serialized protos for that (or Any). For most of the OSS code, this is about the only knowledge of encryption we need. I we want to report per-algo/key-size utilization, it seems easier to have it in OSS code than try to pass blind strings everywhere.

Hmm. My preference would still be to try to keep everything knowledgeable about crypto details on the CCL side, but I don't feel strongly about proto definition stuff.


Comments from Reviewable

@mberhault mberhault force-pushed the marc/add_data_key_manager branch from 1e8aeed to 6c7825a Compare January 4, 2018 11:25
@mberhault
Copy link
Copy Markdown
Contributor Author

Review status: 12 of 27 files reviewed at latest revision, 4 unresolved discussions, some commit checks pending.


c-deps/libroach/ccl/key_manager.cc, line 61 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

OK, let's ask for extra bits for now. We can revisit once we start looking at other key management systems.

Done. I'm using 32 bytes for key IDs. If I use shorter (8 or 16), the user may provide 24 bytes but end up with AES-128. 32 (AES-256 key size) makes that impossible (unless they think there's such a thing as AES-512, but then we can't help them).


pkg/ccl/storageccl/engineccl/enginepbccl/key_registry.proto, line 17 at r2 (raw file):

Previously, mberhault (marc) wrote…

True. Let's decide where we want it.

Done.


pkg/ccl/storageccl/engineccl/enginepbccl/key_registry.proto, line 28 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I'm fine with the redundancy. I was basically asking for a comment to be added about the keys.

Done. Sorry about that. Added clarifying comments.


pkg/storage/engine/enginepb/file_registry.proto, line 32 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Hmm. My preference would still be to try to keep everything knowledgeable about crypto details on the CCL side, but I don't feel strongly about proto definition stuff.

Done. Moved to CCL. We can move it back later if needed.


Comments from Reviewable

@mberhault mberhault force-pushed the marc/add_data_key_manager branch 2 times, most recently from b06f6a8 to 1fd516a Compare January 4, 2018 12:24
@bdarnell
Copy link
Copy Markdown
Contributor

bdarnell commented Jan 4, 2018

Review status: 10 of 27 files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.


c-deps/libroach/ccl/key_manager.cc, line 61 at r1 (raw file):

Previously, mberhault (marc) wrote…

Done. I'm using 32 bytes for key IDs. If I use shorter (8 or 16), the user may provide 24 bytes but end up with AES-128. 32 (AES-256 key size) makes that impossible (unless they think there's such a thing as AES-512, but then we can't help them).

We shouldn't have key ids just concatenated with the keys if we're also going to infer key length from file size. This seems too confusing and error prone. (and 32 byte ids are kind of ridiculous) I think we should include the desired cipher and key size in the command line flag that gives the key file path.


Comments from Reviewable

@mberhault
Copy link
Copy Markdown
Contributor Author

Review status: 10 of 27 files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.


c-deps/libroach/ccl/key_manager.cc, line 61 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

We shouldn't have key ids just concatenated with the keys if we're also going to infer key length from file size. This seems too confusing and error prone. (and 32 byte ids are kind of ridiculous) I think we should include the desired cipher and key size in the command line flag that gives the key file path.

I agree they're a bit ridiculous, but we can do that length just for store keys, not much of a hardship. The confusion is the main concern. The RFC explored different ways of specifying the encryption settings. If you'll recall we decided that inferring the desired AES key size from the store key size was easiest. I'm happy going back to key sha to avoid confusion, but this is starting to feel like something we just need to decide and live with. Ultimately, I'm hoping that third-party KMS support will mean the plain-file key manager will be mostly irrelevant (I may regret those words one day).


Comments from Reviewable

@bdarnell
Copy link
Copy Markdown
Contributor

bdarnell commented Jan 4, 2018

Review status: 10 of 27 files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.


c-deps/libroach/ccl/key_manager.cc, line 61 at r1 (raw file):

Previously, mberhault (marc) wrote…

I agree they're a bit ridiculous, but we can do that length just for store keys, not much of a hardship. The confusion is the main concern. The RFC explored different ways of specifying the encryption settings. If you'll recall we decided that inferring the desired AES key size from the store key size was easiest. I'm happy going back to key sha to avoid confusion, but this is starting to feel like something we just need to decide and live with. Ultimately, I'm hoping that third-party KMS support will mean the plain-file key manager will be mostly irrelevant (I may regret those words one day).

OK, let's leave it for now. Allowing plenty of room for the key id will also allow for non-random IDs if someone wants to manage things that way for some reason.


Comments from Reviewable

@bdarnell
Copy link
Copy Markdown
Contributor

bdarnell commented Jan 4, 2018

:lgtm:


Reviewed 5 of 32 files at r1, 2 of 2 files at r3, 12 of 13 files at r4, 3 of 3 files at r5.
Review status: all files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.


c-deps/libroach/testutils.h, line 52 at r5 (raw file):

// If err_msg is empty, status must be ok. Otherwise, the status message must match
// the 'err_msg' (string compare) or 'err_msg' (regexp full match).

Why do you need both literal match and regex? What if I pass a string that fails to compile as a regex? I'd prefer to make this regex-only, and if you need exact matching make it a separate macro/function.


Comments from Reviewable

@mberhault mberhault force-pushed the marc/add_data_key_manager branch from e508746 to 0ec4422 Compare January 4, 2018 17:48
@mberhault
Copy link
Copy Markdown
Contributor Author

Review status: 24 of 27 files reviewed at latest revision, 5 unresolved discussions.


c-deps/libroach/testutils.h, line 52 at r5 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Why do you need both literal match and regex? What if I pass a string that fails to compile as a regex? I'd prefer to make this regex-only, and if you need exact matching make it a separate macro/function.

I initially put both to avoid having to escape parentheses (eg: \\( to match a literal (). But since this was only checking the boring part of an error message, I'm removing those instead, leaving just one as example. I've removed the exact string matching, we can add a separate helper method if escaping becomes too tedious.


c-deps/libroach/ccl/key_manager.cc, line 61 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

OK, let's leave it for now. Allowing plenty of room for the key id will also allow for non-random IDs if someone wants to manage things that way for some reason.

Done.


Comments from Reviewable

Release note: None

Part of encryption at rest: see cockroachdb#19783

The data key manager persists its keys in a local file and generates a
new key when the active store key has changed or if the current key is
too old.

Reworked the key storage to distinguish actual keys (SecretKey) from
just the key information (KeyInfo). The KeyInfo should be passed most of
the time, with `CurrentKey` and `GetKey` possibly going away in favor of
a `SetKeyForCipher` method in the future.
@mberhault mberhault force-pushed the marc/add_data_key_manager branch from 0ec4422 to f544e00 Compare January 4, 2018 20:46
@mberhault mberhault merged commit b9097c7 into cockroachdb:master Jan 4, 2018
@mberhault mberhault deleted the marc/add_data_key_manager branch January 4, 2018 21:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants