Skip to content

feature-1659: Initial draft#1680

Merged
lvca merged 2 commits into
ArcadeData:mainfrom
pawellhasa:feature-1659/support-for-encryption-in-serializer
Sep 5, 2024
Merged

feature-1659: Initial draft#1680
lvca merged 2 commits into
ArcadeData:mainfrom
pawellhasa:feature-1659/support-for-encryption-in-serializer

Conversation

@pawellhasa

Copy link
Copy Markdown
Contributor

What does this PR do?

Initial draft for encryption implementation at serialisation level

Motivation

Provide REST encryption

Related issues

#1659

Additional Notes

This is draft, so I'm looking for suggestions

Checklist

  • I have run the build using mvn clean package command
  • [-] My unit tests cover both failure and success scenarios <- I didn't cover all data types yet

@lvca lvca added the enhancement New feature or request label Aug 1, 2024
@lvca lvca self-requested a review August 1, 2024 16:48
decipher.init(Cipher.DECRYPT_MODE, secretKey, new GCMParameterSpec(tagSize, ivBytes));
return decipher.doFinal(encryptedData);
} catch (Exception e) {
throw new RuntimeException("Error while decrypting data", e);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT about creating a new exception class for encryption, like EncryptionException that extends ArcadeDBException (so it's a RuntimeException)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will do like that

public void serializeValue(final Database database, final Binary serialized, final byte type, Object value) {
if (value == null)
return;
Binary content = dataEncryption != null ? new Binary() : serialized;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to create a new Binary() object in the case of encryption content?

case BinaryTypes.TYPE_RID:
break;
default:
serialized.putBytes(dataEncryption.encrypt(content.toByteArray()));

@lvca lvca Aug 1, 2024

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, you're encrypting every single value. Why don't encrypting the whole Binary at the end, once all the values are serialized? This would be much faster and you don't need to differentiate special cases like RIDs. WDYT?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll have a look

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking inside the code to find best place to do it, as Binary is referred in plenty of places, and it seemed that interactions with BaseRecord#buffer is what I am looking for. That however also is fairly nested inside many classes meaning it is difficult to intercept right moment to perform write/read as it gets passed around and often buffer is accessed directly to perform partial read.

In the end, I chosen implementing encryption at BinarySerializer in serializeProperties(), deserializeProperties(), deserializeProperty and hasProperty(). Idea is to write just all properties as encrypted at once and vice-versa. The issue I ran into is that serializeProperties() writes to two types: header and content. Header stores information about propertiesCount and about each propertyKey with its bufferPosition for the value. This allows reading just the value from whole buffer, and not all properties to find one. However this isn't possible with the goal above. I have to either re-factor the code to read all properties from buffer to decode, regardless of looking for specific one and then filter by key, or leave existing behaviour.

I could also re-factor structure of BaseRecord#binary to distinct between meta content (like ID) and data (encrypted) but that could be even more re-factoring, and it wouldn't resolve cherry picking properties anyway.

I understand that current implementation with encryption of each value adds performance cost, I didn't measure how this would affect our product yet, but it is mandatory requirement for us anyway.

What do you think?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay on this review. It makes sense, it's better to avoid encrypting the metadata for fast accessing to the properties you're requesting. Also, this makes easier to encrypt single property by adding some configuration (in the Schema -> Type -> Property)

@lvca lvca added this to the 24.8.1 milestone Sep 5, 2024
@lvca

lvca commented Sep 5, 2024

Copy link
Copy Markdown
Member

Ok, I'm doing some tests before merging it. I'd like to provide a way to define encryption without using Java so it would work also in a standard client/server configuration.

@gramian WDYT about it?

@lvca lvca merged commit 15edca8 into ArcadeData:main Sep 5, 2024
@lvca

lvca commented Sep 5, 2024

Copy link
Copy Markdown
Member

Thanks, @pawellhasa for your contribution!

@gramian

gramian commented Sep 5, 2024

Copy link
Copy Markdown
Collaborator

I think the feature is useful. I have some questions:

  • Should there be some kind of performance testing to have at least rough idea about the compute cost? I mean something like a simple CRUD test of enough records with and without encryption.
  • I didn't' see any changes to the settings (maybe I overlooked them). How is encryption controlled just by using it with the embedded server?

@lvca

lvca commented Sep 5, 2024

Copy link
Copy Markdown
Member

To set the encryption now you have to call this API before using the database (right after open):

database.setDataEncryption(DefaultDataEncryption.useDefaults(DefaultDataEncryption.getSecretKeyFromPasswordUsingDefaults(password, salt)));

Good idea about a micro benchmark to see what's the cost of writing and reading with the default settings.

Also, if we provide a new boolean property encrypt (default=false) at both property and type levels, we can define which type/property to encrypt:

  • if encrypt is true on the type, then all the properties are encrypted
  • if encrypt is true on a property, then that property is encrypted

@pawellhasa WDYT?

@pawellhasa

Copy link
Copy Markdown
Contributor Author

Selecting which properties to encrypt can help avoiding encryption computing cost. I am away so won't be able to properly think about it till next week. I'll find out impact on our end and whether we would consider encryption for just part of data worth it.

@lvca

lvca commented Sep 9, 2024

Copy link
Copy Markdown
Member

NP. I prefer to complete this part and be done in terms of API (and SQL) before updating the docs = other users start using it.

@pawellhasa

Copy link
Copy Markdown
Contributor Author

Atm we don't want to exclude properties from encrypting in our product, if something changes, we'll discuss it

mergify Bot added a commit that referenced this pull request May 10, 2026
…n /studio [skip ci]

Bumps [terser](https://github.com/terser/terser) from 5.46.2 to 5.47.1.
Changelog

*Sourced from [terser's changelog](https://github.com/terser/terser/blob/master/CHANGELOG.md).*

> v5.47.1
> -------
>
> * Fix crash when using `mangle.keep_fnames` with destructuring
>
> v5.47.0
> -------
>
> * Add `builtins_ecma` and `builtins_pure` options
> * Add Intl options to domprops ([#1680](https://redirect.github.com/terser/terser/issues/1680))


Commits

* [`bf949e7`](terser/terser@bf949e7) 5.47.1
* [`23bb72e`](terser/terser@23bb72e) update changelog
* [`1fd2134`](terser/terser@1fd2134) fix crash when using `mangle.keep_fnames` with destructuring. Closes [#1681](https://redirect.github.com/terser/terser/issues/1681)
* [`7cbd24d`](terser/terser@7cbd24d) 5.47.0
* [`b1bc6bd`](terser/terser@b1bc6bd) update changelog
* [`be36c87`](terser/terser@be36c87) add "builtins" and "builtins\_pure" options ([#1651](https://redirect.github.com/terser/terser/issues/1651))
* [`2d52ff3`](terser/terser@2d52ff3) add Intl option keys (DurationFormat, DateTimeFormat, RelativeTimeFormat) to ...
* See full diff in [compare view](terser/terser@v5.46.2...v5.47.1)
  
[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility\_score?dependency-name=terser&package-manager=npm\_and\_yarn&previous-version=5.46.2&new-version=5.47.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
Dependabot commands and options
  
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot show  ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants