Skip to content

PARQUET-1286: Crypto package#614

Merged
gszadovszky merged 14 commits intoapache:encryptionfrom
ggershinsky:p1286-crypto-package
Feb 12, 2020
Merged

PARQUET-1286: Crypto package#614
gszadovszky merged 14 commits intoapache:encryptionfrom
ggershinsky:p1286-crypto-package

Conversation

@ggershinsky
Copy link
Copy Markdown
Contributor

No description provided.

*/

package org.apache.parquet.crypto;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove one space line

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.util.Arrays;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove one space line

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

* @throws IOException
*/
public AesDecryptor(Mode mode, byte[] keyBytes) throws IllegalArgumentException, IOException {
if (null == keyBytes) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should empty array also be considered as an illegal argument?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

array will always have some contents. there is no procedure to check for weak keys in AES.

throw new IllegalArgumentException("Null key bytes");
}
this.aesMode = mode;
aesKey = new SecretKeySpec(keyBytes, "AES");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use this.aesKey since you use this.aesMode in above line?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed this. from aesMode

* @param cipherTextOffset nonce position
* @param cipherTextLength with nonce (and tag in case of GCM)
* @param AAD
* @return
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decrypted text

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the javadoc comments. this is not a public api.

if (Mode.GCM == mode) {
tagLength = GCM_TAG_LENGTH;
try {
aesCipher = Cipher.getInstance("AES/GCM/NoPadding");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make "AES/GCM/NoPadding" as a constant string since we use it more than once?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can it be enum?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added to Mode enum

throw new IOException("Failed to encrypt", e);
}
// Add ciphertext length
if (writeLength) System.arraycopy(BytesUtils.intToBytes(cipherTextLength), 0, cipherText, 0, lengthBufferLength);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If 'writeLenght' is false, should 'ciphertextLength' in line 164 of AesDecryptor.java still be gotten from the cipherText?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no. the decrypt(InputStream) is called only for encrypted Thrift structures. For them, 'writeLength' is always true.

int length = pageAAD.length;
System.arraycopy(pageOrdinalBytes, 0, pageAAD, length-2, 2);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove one space line

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

* @param keyBytes Key length must be either 16, 24 or 32 bytes.
*/
public Builder withKey(byte[] keyBytes) {
if (null == keyBytes) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a case that keyBytes is null but we should continue?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, we shouldn't. changed.

if ((null == footerKey) && checkPlaintextFooterIntegrity && (null == keyRetriever)) {
throw new IllegalArgumentException("Can't check footer integrity with null footer key and null key retriever");
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove one line of space

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

throw new IOException("Failed to create CTR cipher", e);
}
ctrIV = new byte[CTR_IV_LENGTH];
Arrays.fill(ctrIV, (byte) 0);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The explicit casting should be removed here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think you don't have to initialize the array according to https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html. This should improve some performance, so please check other places as well.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cast is required, it doesnt compile otherwise (takes 0 for int), so we need this in other lines. but indeed this line can be removed, we can count on JVM to fill it with zeroes.

}

@Override
public byte[] decryptInputStream(InputStream from, byte[] AAD) throws IOException {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documents for override functions may exist in the parent class or interface.

@ggershinsky why not read all input at once and then parse them? In that way, you should avoid one allocation.

int ciphertextLength =
((lengthBuffer[3] & 0xff) << 24) |
((lengthBuffer[2] & 0xff) << 16) |
((lengthBuffer[1] & 0xff) << 8) |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: extra space

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

formatted


public static final int NONCE_LENGTH = 12;
public static final int GCM_TAG_LENGTH = 16;
public static final int SIZE_LENGTH = 4;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename this to SIZE_LENGTH_IN_BYTES or LENGTH_IN_BYTES.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by default, all sizes are in bytes. bit sizes are explicitly named with _BITS.

throw new IOException("Failed to create CTR cipher", e);
}
ctrIV = new byte[CTR_IV_LENGTH];
Arrays.fill(ctrIV, (byte) 0);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove this.

if (nonce.length != NONCE_LENGTH) throw new IOException("Wrong nonce length " + nonce.length);
int plainTextLength = plainText.length;
int cipherTextLength = NONCE_LENGTH + plainTextLength + tagLength;
int lengthBufferLength = (writeLength? SIZE_LENGTH: 0);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove parentheses?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

aesCipher.init(Cipher.ENCRYPT_MODE, aesKey, spec);
if (null != AAD) aesCipher.updateAAD(AAD);
}
else {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think the code style is to move else { to the above line, please also check others.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@ggershinsky
Copy link
Copy Markdown
Contributor Author

@nandorKollar thanks for merging the 613.
This one is outdated. Once back in office in early September, I'll update this PR, and handle the comments from the previous review round.

@nandorKollar
Copy link
Copy Markdown
Contributor

Okay, thanks Gidon!

@ggershinsky
Copy link
Copy Markdown
Contributor Author

almost finished, will push the last commit in this round tomorrow.

@ggershinsky
Copy link
Copy Markdown
Contributor Author

@nandorKollar the pr is updated, comments are addressed.

Copy link
Copy Markdown
Contributor

@nandorKollar nandorKollar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ggershinsky I had a look at the implementation and left some comments. Please apologize if I send some followup review comments, the feature is pretty big, I couldn't give a detailed full review in one round.

@ggershinsky
Copy link
Copy Markdown
Contributor Author

@nandorKollar , I'm back online (after another trip..), will address your comments.

@ggershinsky
Copy link
Copy Markdown
Contributor Author

@nandorKollar A commit with changes addressing this review round, is in.

@nandorKollar
Copy link
Copy Markdown
Contributor

@ggershinsky thanks, I'll have a look at it, though I can't promise that I'll be quick, because I'm a bit busy with an internal project nowadays. Meanwhile, if someone else from the community could have a look at the changes, that would be great!



AesCipher(AesMode mode, byte[] keyBytes) throws IllegalArgumentException, IOException {
if (null == keyBytes) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is empty keyBytes allowed?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you probably mean all-zero keyBytes. While this is allowed, it might be problematic in certain rare situations. To be on the safe side, I've added throwing an exception for such keys.

public static void quickUpdatePageAAD(byte[] pageAAD, short newPageOrdinal) {
byte[] pageOrdinalBytes = shortToBytesLE(newPageOrdinal);
int length = pageAAD.length;
System.arraycopy(pageOrdinalBytes, 0, pageAAD, length-2, 2);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add space 'length - 2'

System.arraycopy(pageOrdinalBytes, 0, pageAAD, length-2, 2);
}

static byte[] concatByteArrays(byte[]... arrays) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add keyword 'private'

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its called from other classes in this package

byte[] output = new byte[totalLength];
int offset = 0;
for (byte[] array : arrays) {
int arrayLength = array.length;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This local variable seems unnecessary.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

private static byte[] shortToBytesLE(short input) {
byte[] output = new byte[2];
output[1] = (byte)(0xff & (input >> 8));
output[0] = (byte)(0xff & (input));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove "()", (input) ==> input

try {
aesKey.destroy();
} catch (DestroyFailedException e) {
throw new ShouldNeverHappenException(e);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of Exception, do we consider wipeOut as true or false?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, should be false. I'll move the flag change to the end.

return metaDataDecryptor;
}

boolean isEncryptedWithFooterKey() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add keyword 'public/private'

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the scope of this method is "package"

return isEncryptedWithFooterKey;
}

ColumnPath getPath() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add keyword 'public/private'

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the scope of this method is "package"

return columnOrdinal;
}

byte[] getKeyMetadata() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add keyword 'public/private'

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the scope of this method is "package"

*/

package org.apache.parquet.crypto;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove line spaces


public static final int SIZE_LENGTH = 4;


Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove line space

@ggershinsky ggershinsky force-pushed the p1286-crypto-package branch from 52b203f to ca95b11 Compare January 6, 2020 11:48
@ggershinsky
Copy link
Copy Markdown
Contributor Author

Round 3 of the review is processed. Feel free to merge this pr in the feature branch, or review again if needed.

Copy link
Copy Markdown
Contributor

@gszadovszky gszadovszky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with Nandor that I continue his review.
There are a couple of formatting issues (multiple blank lines, if blocks without linebreaks and braces, missing spaces etc.). Please, use a properly configured formatter for your code and follow our guidelines.
From How To Contribute:

Use 2 spaces for whitespace. Not tabs, not 4 spaces. The number of the spacing shall be 2.
Give your operators some room. Not a+b but a + b and not foo(int a,int b) but foo(int a, int b).
Generally speaking, stick to the Sun Java Code Conventions
Make sure tests pass!

@shangxinli, @chenjunjiedada, do you have any additional notes?

@shangxinli
Copy link
Copy Markdown
Contributor

shangxinli commented Feb 3, 2020

@gszadovszky, I have reviewed and @ggershinsky has addressed all my comments. I am OK to proceed.

@ggershinsky
Copy link
Copy Markdown
Contributor Author

@gszadovszky thanks, I'm on a road this week, will handle the changes early next week.

@ggershinsky ggershinsky reopened this Feb 12, 2020
@gszadovszky
Copy link
Copy Markdown
Contributor

Failing because of a flakiness in the test data generation. See PARQUET-1794 for details.
As this PR is not for master and the failing test is not related, I'm merging this change.

@gszadovszky gszadovszky merged commit 1ce6bcc into apache:encryption Feb 12, 2020
gszadovszky pushed a commit that referenced this pull request Feb 27, 2020
shangxinli pushed a commit to shangxinli/parquet-mr that referenced this pull request Mar 1, 2020
gszadovszky pushed a commit that referenced this pull request Apr 22, 2020
shangxinli pushed a commit to shangxinli/parquet-mr that referenced this pull request Jul 15, 2020
gszadovszky pushed a commit that referenced this pull request Jul 28, 2020
gszadovszky pushed a commit that referenced this pull request Jul 29, 2020
* PARQUET-1228: Format Structures encryption (#613)
* PARQUET-1286: Crypto package (#614)
* PARQUET-1818: Fix bloom/encryption collision in format-structures (#771)
* PARQUET-1817: Crypto Properties Factory (#769)
* PARQUET-1229: Parquet MR encryption (#776)
* PARQUET-1807: Encryption: Interop and Function test suite for Java version (#782)
* PARQUET-1373: Encryption key tools (#615)

Co-authored-by: shangxinli <31421745+shangxinli@users.noreply.github.com>
Co-authored-by: Maya Anderson <mayaa@il.ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants