PARQUET-1817: Crypto Properties Factory#769
Conversation
0760281 to
7185aba
Compare
|
Travis fails because the branch is broken (on parquet-format-structures - unhandled mix of encryption and bloom filters). A known problem, I'll send a fix for that today. |
|
the branch is fixed by #771 . Once it is merged, your pr should pass CI checks after rebase. |
|
#771 is merged, Travis restarted. |
|
"[WARNING] Files with unapproved licenses: |
|
Fixed it. Thanks Gidon and Gabor! |
|
@shangxinli @gszadovszky my pull request is ready, but now it depends on this one. I'll send it after 769 is merged. |
The build is green now. Once @gszadovszky review the code, he or myself can merge it and you are ready to go. |
gszadovszky
left a comment
There was a problem hiding this comment.
I assume we will see some implementations of this class in the unit tests of the crypto feature. Just double check...
parquet-hadoop/src/main/java/org/apache/parquet/crypto/CryptoPropertiesFactory.java
Outdated
Show resolved
Hide resolved
parquet-hadoop/src/main/java/org/apache/parquet/crypto/CryptoPropertiesFactory.java
Outdated
Show resolved
Hide resolved
parquet-hadoop/src/main/java/org/apache/parquet/crypto/CryptoPropertiesFactory.java
Outdated
Show resolved
Hide resolved
Yes. |
| * the unit test (TBD) for example | ||
| * | ||
| * @param fileHadoopConfig Configuration that is used to pass the needed information, e.g. KMS uri | ||
| * @param tempFilePath File path of the parquet file |
There was a problem hiding this comment.
@ggershinsky, can you make comments about what to be used by this field? Our implementation doesn't need it so far.
There was a problem hiding this comment.
I'd suggest also to add a comment to the interface itself, to remind about its purpose (this helps to understand the method parameters), something like "CryptoPropertiesFactory interface enables transparent activation of Parquet encryption. It's custom implementations produce encryption and decryption properties for each Parquet file, using the information available in Parquet file writers and readers: file path, file extended schema (in writers only) - and also Hadoop configuration properties that can pass custom parameters required by a crypto factory. A factory implementation can use or ignore any of these parameters."
As for the "tempFilePath" - "File path of the parquet file being written. Can be used for AAD prefix creation, key material management, etc. Implementations must not presume the path is permanent, as the file can be moved or renamed later"
|
|
||
| import java.io.IOException; | ||
|
|
||
| import static org.apache.parquet.Preconditions.checkNotNull; |
There was a problem hiding this comment.
this import is not needed
|
strange. All Travis checks are green, |
|
@ggershinsky , FYI, Iceberg has a PR that has the same status. |
gszadovszky
left a comment
There was a problem hiding this comment.
Last time I've forgot about one major thing. (Sorry for the late idea.)
parquet-mr library is usually used for either reading or writing at a time but not both. I think, it is not a good practice to expect both encryption and decryption properties while only one would be required.
What do you guys think?
parquet-hadoop/src/main/java/org/apache/parquet/crypto/CryptoPropertiesFactory.java
Outdated
Show resolved
Hide resolved
parquet-hadoop/src/main/java/org/apache/parquet/crypto/CryptoPropertiesFactory.java
Outdated
Show resolved
Hide resolved
parquet-hadoop/src/main/java/org/apache/parquet/crypto/CryptoPropertiesFactory.java
Outdated
Show resolved
Hide resolved
parquet-hadoop/src/main/java/org/apache/parquet/crypto/CryptoPropertiesFactory.java
Outdated
Show resolved
Hide resolved
parquet-hadoop/src/main/java/org/apache/parquet/crypto/CryptoPropertiesFactory.java
Outdated
Show resolved
Hide resolved
parquet-hadoop/src/main/java/org/apache/parquet/crypto/CryptoPropertiesFactory.java
Outdated
Show resolved
Hide resolved
parquet-hadoop/src/main/java/org/apache/parquet/crypto/CryptoPropertiesFactory.java
Outdated
Show resolved
Hide resolved
On one hand, its certainly possible to split this into two interfaces, to make it fit the parquet-mr practices. On the other hand, a person working with encryption in eg Spark shell might want to both write and read the files. He would define two Hadoop config params for that, "parquet.encryption.factory.class" and "parquet.decryption.factory.class", which is a bit cumbersome. Or maybe one parameter is enough - if the class implements both interfaces, and we'll load it as either encryption or decryption as needed (from writer or reader, respectively). If this works in Java (currently can't see why not), then could be the right thing to do. |
parquet-hadoop/src/main/java/org/apache/parquet/crypto/CryptoPropertiesFactory.java
Outdated
Show resolved
Hide resolved
parquet-hadoop/src/main/java/org/apache/parquet/crypto/CryptoPropertiesFactory.java
Outdated
Show resolved
Hide resolved
…ropertiesFactory.java Co-Authored-By: Gabor Szadovszky <gabor@apache.org>
…ropertiesFactory.java Co-Authored-By: Gabor Szadovszky <gabor@apache.org>
…ropertiesFactory.java Co-Authored-By: Gabor Szadovszky <gabor@apache.org>
I like this idea. It should work just fine. |
I agree with Gidon. Aside from that, some ETL transform jobs need to decrypt raw data Parquet files and encrypt transformed Parquet files. |
…ropertiesFactory.java Co-Authored-By: ggershinsky <ggershinsky@users.noreply.github.com>
…ropertiesFactory.java Co-Authored-By: ggershinsky <ggershinsky@users.noreply.github.com>
|
@gszadovszky, The latest commit needs to be changed and I think it would be better to add unit tests now than TBD. You can hold off for now and I will let you know when it is ready for review. |
…ce and add unit tests
a7f02a2 to
d72b2a1
Compare
|
@gszadovszky, I think the change is ready to review now. |
parquet-hadoop/src/main/java/org/apache/parquet/crypto/DecryptionPropertiesFactory.java
Outdated
Show resolved
Hide resolved
parquet-hadoop/src/main/java/org/apache/parquet/crypto/EncryptionPropertiesFactory.java
Outdated
Show resolved
Hide resolved
parquet-hadoop/src/main/java/org/apache/parquet/crypto/ParquetCryptoRuntimeException.java
Outdated
Show resolved
Hide resolved
parquet-hadoop/src/test/java/org/apache/parquet/crypto/SampleDecryptionPropertiesFactory.java
Outdated
Show resolved
Hide resolved
parquet-hadoop/src/test/java/org/apache/parquet/crypto/SampleEncryptionPropertiesFactory.java
Outdated
Show resolved
Hide resolved
…ionPropertiesFactory.java Co-Authored-By: Gabor Szadovszky <gabor@apache.org>
…CryptoRuntimeException.java Co-Authored-By: Gabor Szadovszky <gabor@apache.org>
…ionPropertiesFactory.java Co-Authored-By: Gabor Szadovszky <gabor@apache.org>
bb307db to
8bb18eb
Compare
* PARQUET-1228: Format Structures encryption (#613) * PARQUET-1286: Crypto package (#614) * PARQUET-1818: Fix bloom/encryption collision in format-structures (#771) * PARQUET-1817: Crypto Properties Factory (#769) * PARQUET-1229: Parquet MR encryption (#776) * PARQUET-1807: Encryption: Interop and Function test suite for Java version (#782) * PARQUET-1373: Encryption key tools (#615) Co-authored-by: shangxinli <31421745+shangxinli@users.noreply.github.com> Co-authored-by: Maya Anderson <mayaa@il.ibm.com>
Make sure you have checked all steps below.
Jira
Tests
Commits
Documentation