PARQUET-1286: Crypto package#614
Conversation
| */ | ||
|
|
||
| package org.apache.parquet.crypto; | ||
|
|
| import java.io.InputStream; | ||
| import java.security.GeneralSecurityException; | ||
| import java.util.Arrays; | ||
|
|
| * @throws IOException | ||
| */ | ||
| public AesDecryptor(Mode mode, byte[] keyBytes) throws IllegalArgumentException, IOException { | ||
| if (null == keyBytes) { |
There was a problem hiding this comment.
Should empty array also be considered as an illegal argument?
There was a problem hiding this comment.
array will always have some contents. there is no procedure to check for weak keys in AES.
| throw new IllegalArgumentException("Null key bytes"); | ||
| } | ||
| this.aesMode = mode; | ||
| aesKey = new SecretKeySpec(keyBytes, "AES"); |
There was a problem hiding this comment.
Why not use this.aesKey since you use this.aesMode in above line?
There was a problem hiding this comment.
removed this. from aesMode
| * @param cipherTextOffset nonce position | ||
| * @param cipherTextLength with nonce (and tag in case of GCM) | ||
| * @param AAD | ||
| * @return |
There was a problem hiding this comment.
removed the javadoc comments. this is not a public api.
| if (Mode.GCM == mode) { | ||
| tagLength = GCM_TAG_LENGTH; | ||
| try { | ||
| aesCipher = Cipher.getInstance("AES/GCM/NoPadding"); |
There was a problem hiding this comment.
Should we make "AES/GCM/NoPadding" as a constant string since we use it more than once?
There was a problem hiding this comment.
added to Mode enum
| throw new IOException("Failed to encrypt", e); | ||
| } | ||
| // Add ciphertext length | ||
| if (writeLength) System.arraycopy(BytesUtils.intToBytes(cipherTextLength), 0, cipherText, 0, lengthBufferLength); |
There was a problem hiding this comment.
If 'writeLenght' is false, should 'ciphertextLength' in line 164 of AesDecryptor.java still be gotten from the cipherText?
There was a problem hiding this comment.
no. the decrypt(InputStream) is called only for encrypted Thrift structures. For them, 'writeLength' is always true.
| int length = pageAAD.length; | ||
| System.arraycopy(pageOrdinalBytes, 0, pageAAD, length-2, 2); | ||
| } | ||
|
|
| * @param keyBytes Key length must be either 16, 24 or 32 bytes. | ||
| */ | ||
| public Builder withKey(byte[] keyBytes) { | ||
| if (null == keyBytes) { |
There was a problem hiding this comment.
Is there a case that keyBytes is null but we should continue?
There was a problem hiding this comment.
good catch, we shouldn't. changed.
| if ((null == footerKey) && checkPlaintextFooterIntegrity && (null == keyRetriever)) { | ||
| throw new IllegalArgumentException("Can't check footer integrity with null footer key and null key retriever"); | ||
| } | ||
|
|
There was a problem hiding this comment.
remove one line of space
| throw new IOException("Failed to create CTR cipher", e); | ||
| } | ||
| ctrIV = new byte[CTR_IV_LENGTH]; | ||
| Arrays.fill(ctrIV, (byte) 0); |
There was a problem hiding this comment.
The explicit casting should be removed here.
There was a problem hiding this comment.
Also, I think you don't have to initialize the array according to https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html. This should improve some performance, so please check other places as well.
There was a problem hiding this comment.
cast is required, it doesnt compile otherwise (takes 0 for int), so we need this in other lines. but indeed this line can be removed, we can count on JVM to fill it with zeroes.
| } | ||
|
|
||
| @Override | ||
| public byte[] decryptInputStream(InputStream from, byte[] AAD) throws IOException { |
There was a problem hiding this comment.
Documents for override functions may exist in the parent class or interface.
@ggershinsky why not read all input at once and then parse them? In that way, you should avoid one allocation.
| int ciphertextLength = | ||
| ((lengthBuffer[3] & 0xff) << 24) | | ||
| ((lengthBuffer[2] & 0xff) << 16) | | ||
| ((lengthBuffer[1] & 0xff) << 8) | |
|
|
||
| public static final int NONCE_LENGTH = 12; | ||
| public static final int GCM_TAG_LENGTH = 16; | ||
| public static final int SIZE_LENGTH = 4; |
There was a problem hiding this comment.
I would rename this to SIZE_LENGTH_IN_BYTES or LENGTH_IN_BYTES.
There was a problem hiding this comment.
by default, all sizes are in bytes. bit sizes are explicitly named with _BITS.
| throw new IOException("Failed to create CTR cipher", e); | ||
| } | ||
| ctrIV = new byte[CTR_IV_LENGTH]; | ||
| Arrays.fill(ctrIV, (byte) 0); |
There was a problem hiding this comment.
You can remove this.
| if (nonce.length != NONCE_LENGTH) throw new IOException("Wrong nonce length " + nonce.length); | ||
| int plainTextLength = plainText.length; | ||
| int cipherTextLength = NONCE_LENGTH + plainTextLength + tagLength; | ||
| int lengthBufferLength = (writeLength? SIZE_LENGTH: 0); |
There was a problem hiding this comment.
remove parentheses?
| aesCipher.init(Cipher.ENCRYPT_MODE, aesKey, spec); | ||
| if (null != AAD) aesCipher.updateAAD(AAD); | ||
| } | ||
| else { |
There was a problem hiding this comment.
nit: I think the code style is to move else { to the above line, please also check others.
|
@nandorKollar thanks for merging the 613. |
|
Okay, thanks Gidon! |
|
almost finished, will push the last commit in this round tomorrow. |
|
@nandorKollar the pr is updated, comments are addressed. |
nandorKollar
left a comment
There was a problem hiding this comment.
@ggershinsky I had a look at the implementation and left some comments. Please apologize if I send some followup review comments, the feature is pretty big, I couldn't give a detailed full review in one round.
parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesDecryptor.java
Outdated
Show resolved
Hide resolved
parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesDecryptor.java
Outdated
Show resolved
Hide resolved
parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesDecryptor.java
Outdated
Show resolved
Hide resolved
parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesEncryptor.java
Outdated
Show resolved
Hide resolved
parquet-hadoop/src/main/java/org/apache/parquet/crypto/ColumnDecryptionProperties.java
Outdated
Show resolved
Hide resolved
parquet-hadoop/src/main/java/org/apache/parquet/crypto/ColumnDecryptionProperties.java
Outdated
Show resolved
Hide resolved
parquet-hadoop/src/main/java/org/apache/parquet/crypto/ColumnDecryptionProperties.java
Show resolved
Hide resolved
parquet-hadoop/src/main/java/org/apache/parquet/crypto/FileDecryptionProperties.java
Show resolved
Hide resolved
parquet-hadoop/src/main/java/org/apache/parquet/crypto/FileDecryptionProperties.java
Show resolved
Hide resolved
parquet-hadoop/src/main/java/org/apache/parquet/crypto/ParquetCipher.java
Outdated
Show resolved
Hide resolved
|
@nandorKollar , I'm back online (after another trip..), will address your comments. |
|
@nandorKollar A commit with changes addressing this review round, is in. |
|
@ggershinsky thanks, I'll have a look at it, though I can't promise that I'll be quick, because I'm a bit busy with an internal project nowadays. Meanwhile, if someone else from the community could have a look at the changes, that would be great! |
|
|
||
|
|
||
| AesCipher(AesMode mode, byte[] keyBytes) throws IllegalArgumentException, IOException { | ||
| if (null == keyBytes) { |
There was a problem hiding this comment.
Is empty keyBytes allowed?
There was a problem hiding this comment.
you probably mean all-zero keyBytes. While this is allowed, it might be problematic in certain rare situations. To be on the safe side, I've added throwing an exception for such keys.
| public static void quickUpdatePageAAD(byte[] pageAAD, short newPageOrdinal) { | ||
| byte[] pageOrdinalBytes = shortToBytesLE(newPageOrdinal); | ||
| int length = pageAAD.length; | ||
| System.arraycopy(pageOrdinalBytes, 0, pageAAD, length-2, 2); |
| System.arraycopy(pageOrdinalBytes, 0, pageAAD, length-2, 2); | ||
| } | ||
|
|
||
| static byte[] concatByteArrays(byte[]... arrays) { |
There was a problem hiding this comment.
its called from other classes in this package
| byte[] output = new byte[totalLength]; | ||
| int offset = 0; | ||
| for (byte[] array : arrays) { | ||
| int arrayLength = array.length; |
There was a problem hiding this comment.
This local variable seems unnecessary.
| private static byte[] shortToBytesLE(short input) { | ||
| byte[] output = new byte[2]; | ||
| output[1] = (byte)(0xff & (input >> 8)); | ||
| output[0] = (byte)(0xff & (input)); |
There was a problem hiding this comment.
Remove "()", (input) ==> input
| try { | ||
| aesKey.destroy(); | ||
| } catch (DestroyFailedException e) { | ||
| throw new ShouldNeverHappenException(e); |
There was a problem hiding this comment.
In the case of Exception, do we consider wipeOut as true or false?
There was a problem hiding this comment.
good point, should be false. I'll move the flag change to the end.
| return metaDataDecryptor; | ||
| } | ||
|
|
||
| boolean isEncryptedWithFooterKey() { |
There was a problem hiding this comment.
add keyword 'public/private'
There was a problem hiding this comment.
the scope of this method is "package"
| return isEncryptedWithFooterKey; | ||
| } | ||
|
|
||
| ColumnPath getPath() { |
There was a problem hiding this comment.
add keyword 'public/private'
There was a problem hiding this comment.
the scope of this method is "package"
| return columnOrdinal; | ||
| } | ||
|
|
||
| byte[] getKeyMetadata() { |
There was a problem hiding this comment.
add keyword 'public/private'
There was a problem hiding this comment.
the scope of this method is "package"
| */ | ||
|
|
||
| package org.apache.parquet.crypto; | ||
|
|
|
|
||
| public static final int SIZE_LENGTH = 4; | ||
|
|
||
|
|
52b203f to
ca95b11
Compare
|
Round 3 of the review is processed. Feel free to merge this pr in the feature branch, or review again if needed. |
gszadovszky
left a comment
There was a problem hiding this comment.
Agreed with Nandor that I continue his review.
There are a couple of formatting issues (multiple blank lines, if blocks without linebreaks and braces, missing spaces etc.). Please, use a properly configured formatter for your code and follow our guidelines.
From How To Contribute:
Use 2 spaces for whitespace. Not tabs, not 4 spaces. The number of the spacing shall be 2.
Give your operators some room. Not a+b but a + b and not foo(int a,int b) but foo(int a, int b).
Generally speaking, stick to the Sun Java Code Conventions
Make sure tests pass!
@shangxinli, @chenjunjiedada, do you have any additional notes?
parquet-hadoop/src/main/java/org/apache/parquet/crypto/FileDecryptionProperties.java
Show resolved
Hide resolved
|
@gszadovszky, I have reviewed and @ggershinsky has addressed all my comments. I am OK to proceed. |
|
@gszadovszky thanks, I'm on a road this week, will handle the changes early next week. |
|
Failing because of a flakiness in the test data generation. See PARQUET-1794 for details. |
* PARQUET-1228: Format Structures encryption (#613) * PARQUET-1286: Crypto package (#614) * PARQUET-1818: Fix bloom/encryption collision in format-structures (#771) * PARQUET-1817: Crypto Properties Factory (#769) * PARQUET-1229: Parquet MR encryption (#776) * PARQUET-1807: Encryption: Interop and Function test suite for Java version (#782) * PARQUET-1373: Encryption key tools (#615) Co-authored-by: shangxinli <31421745+shangxinli@users.noreply.github.com> Co-authored-by: Maya Anderson <mayaa@il.ibm.com>
No description provided.