Add the key management tools API for encryption#426
Add the key management tools API for encryption#426adamreeve merged 25 commits intoG-Research:masterfrom
Conversation
| /// This CryptoFactory instance must remain alive and not disposed until after any files using these | ||
| /// decryption properties have been read, as internally the FileDecryptionProperties contains references to | ||
| /// data in the CryptoFactory that cannot be managed by ParquetSharp. | ||
| /// Failure to do so may result in native memory access violations and crashes that cannot be caught as exceptions. |
There was a problem hiding this comment.
This is a bit awkward. I couldn't find a way to handle this nicely without exposing users to possible segfaults, as the FileDecryptionProperties class in C++ has a private constructor and can't be subclassed, so we can't do anything like combine it with a reference to the CryptoFactory. I'm hoping this can be fixed in Arrow so we can fix this limitation for the next version though.
There was a problem hiding this comment.
I've made a PR in Arrow to address this at apache/arrow#40329
There was a problem hiding this comment.
That's been merged, so we should be able to remove this warning in the 16.0.0 release
62b367d to
7101437
Compare
|
I'm not sure what to do about the Windows build failure. It worked for me with MSVC 19.35.32216.1, and with 19.39.33521.0 after upgrading. The GitHub runner has version 19.38.33135.0 though. It looks like this is triggered by my changes pulling in a dependency on the Arrow FileSystem API, but this seems to be a bug in MSVC. I found a similar issue reported at https://developercommunity.visualstudio.com/t/Latest-Visual-Studio-version-1791-ca/10598722 We might need to wait until the runners get a newer version of MSVC, or disable this feature on Windows for now. |
|
Similar build failure issue reported at actions/runner-images#9398, but that was apparently caused by linking to a different STL version than what the compiler expected. Maybe there's something similar going on with our build. |
|
The build failure appears to be because Arrow is built with a different MSVC than ParquetSharp, which isn't good. I'm not sure why different versions are being selected: From the Arrow build logs: And from the ParquetSharp build: |
fff7f25 to
8f6c8dd
Compare
8f6c8dd to
b57fc62
Compare
b57fc62 to
634b69a
Compare
marcin-krystianc
left a comment
There was a problem hiding this comment.
Amazing work, well done!
| set => ExceptionInfo.Check(EncryptionConfiguration_SetDataKeyLengthBits(Handle.IntPtr, value)); | ||
| } | ||
|
|
||
| private static string EncodeColumnKeys(IReadOnlyDictionary<string, IReadOnlyList<string>> columnKeys) |
There was a problem hiding this comment.
nit: It can be implemented in a single line:
return string.Join(";", columnKeys.Select(kvp => $"{kvp.Key}:{string.Join(",", kvp.Value)}"));
Nonetheless, the current version is also fine.
There was a problem hiding this comment.
Good point, that's much more readable, I've changed this now
Fixes #200
This adds bindings to the Parquet key management tools API, also known as the high level encryption API. The existing low-level encryption API supports setting encryption/decryption keys directly, whereas this API uses a Key Management System (KMS) to implement envelope encryption, where data encryption keys are randomly generated and then stored encrypted in the file metadata.