-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Add example for how to read encrypted parquet files #7283
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
14643dd to
78bca86
Compare
|
@alamb this now covers two reading examples. We can either add it now or wait for the writing path and add it as well. |
parquet/src/arrow/mod.rs
Outdated
| //! | ||
| //! # Example of reading uniformly encrypted parquet file into arrow record batch | ||
| //! | ||
| #![cfg_attr(feature = "encryption", doc = "```rust")] | ||
| #![cfg_attr(not(feature = "encryption"), doc = "```ignore")] | ||
| //! # use arrow_array::{Int32Array, ArrayRef}; | ||
| //! # use arrow_array::{types, RecordBatch}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb does this look like a good idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not that familiar with Rust documentation and doc tests, does cfg_attr mean that the example will only be included in the docs if the encryption feature is enabled, or is that only controlling whether the example is tested? Do we want to always show the example but document that it requires the encryption feature (and that the feature is experimental)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly this does the equivalent of:
#![cfg(feature = "encryption")]
//! ```rust
#![cfg(not(feature = "encryption"))]
//! ```ignore
I noticed that docs generation fails if encryption is not enabled, however I tested now and on CI appears to have encryption enabled when testing docs so I'll remove these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha, it seems that sometimes docs are tested without encryption so this would fail. I'm reintroducing the cfg_attr check for now.
However it's not clear to me if the docs that will ultimately be produced will include this example or not.
parquet/src/arrow/mod.rs
Outdated
| //! | ||
| //! # Example of reading uniformly encrypted parquet file into arrow record batch | ||
| //! | ||
| #![cfg_attr(feature = "encryption", doc = "```rust")] | ||
| #![cfg_attr(not(feature = "encryption"), doc = "```ignore")] | ||
| //! # use arrow_array::{Int32Array, ArrayRef}; | ||
| //! # use arrow_array::{types, RecordBatch}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not that familiar with Rust documentation and doc tests, does cfg_attr mean that the example will only be included in the docs if the encryption feature is enabled, or is that only controlling whether the example is tested? Do we want to always show the example but document that it requires the encryption feature (and that the feature is experimental)?
Co-authored-by: Adam Reeve <adreeve@gmail.com>
|
Thanks for the review @adamreeve ! I'veused your suggestions. I'm not sure what to do about the comments, perhaps @alamb has a better suggestion (#7283 (comment)). |
| //! ``` | ||
| //! | ||
| //! # Example of reading non-uniformly encrypted parquet file into arrow record batch | ||
| //! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a note here that this requires the experimental encryption feature to be enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
Co-authored-by: Adam Reeve <adreeve@gmail.com>
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much @rok and @adamreeve
I took the liberty of updating pushing a change to this PR as I wanted to get the doc changes in for the next release and so avoid the back and forth of another round of reviews.
parquet/src/arrow/mod.rs
Outdated
| //! ArrowReaderOptions::default().with_file_decryption_properties(decryption_properties); | ||
| //! let reader_metadata = ArrowReaderMetadata::load(&file, options.clone()).unwrap(); | ||
| //! let file_metadata = reader_metadata.metadata().file_metadata(); | ||
| //! println!("File has {} rows.", file_metadata.num_rows()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a more common pattern in rust crates is to use assert_eq! rather than println! -- it then contains the actual expected output in the example and is verified by the docs test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took the liberty of doing this in 97ecf40
|
MSRV check is unrelated, see: |
Which issue does this PR close?
Rationale for this change
This is to provide users a starting point and a reference for reading encrypted files.
What changes are included in this PR?
We add couple of examples to the parquet docs.
Are there any user-facing changes?
Only in docs.