-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Refactor ipc reading code into methods on ArrayReader
#7006
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
arrow-ipc/src/reader.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This illustrates the point of this PR: remove the need to pass require_alignment as an argument.
It does this by this code into a method on ArrayReader which means require_alignment is now passed as a field.
require_alignment is not a problem, but skip_validation is, for the reasons @tustvold states on #6938 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should rename ArrayReader to RecordBatchDecoder (there is a trait already called RecordBatchReader), and this isn't actually doing anything to do with io::Read
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea -- I will do so in a follow on PR
arrow-ipc/src/reader.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I aliased self --> reader to minimize the diff. I can remove this rename as a follow on PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would probably also be good to move it to the rest of the impl at the same time
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are still a few other methods that need to be moved into ArrayReader (e.g. anything that has a require_alignment flag
arrow-ipc/src/reader.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is refactored code from read_record_batch_impl
arrow-ipc/src/reader.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the code in this method was split across ArrayReader::try_new and ArrayReader::read_record_batch
56e9e07 to
81d0146
Compare
tustvold
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, I did have a thought that this might make it harder to "safely" encapsulate the unsafe unchecked flag, but I think we could do something like
mod private {
/// A boolean flag that cannot be mutated outside of unsafe code
pub struct UnsafeFlag(bool);
impl UnsafeFlag {
#[inline]
pub unsafe fn set(&mut self, val: bool) {
self.0 = val;
}
#[inline]
pub fn get(&self) -> bool {
return self.0
}
}
}
arrow-ipc/src/reader.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should rename ArrayReader to RecordBatchDecoder (there is a trait already called RecordBatchReader), and this isn't actually doing anything to do with io::Read
arrow-ipc/src/reader.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would probably also be good to move it to the rest of the impl at the same time
|
|
||
| let options = RecordBatchOptions::new().with_row_count(Some(self.batch.length() as usize)); | ||
|
|
||
| let schema = Arc::clone(&self.schema); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW this will now always clone the schema even if it is projected, this is almost certainly irrelevant but something I noticed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good point -- this Arc clone comes from the fact that self.create_array below takes &mut self so the code can't also have self.schema borrowed immutably as well
As it happens once per RecordBatch I agree with your assesment that this will be irrelevant in practice
This is a great idea; I tried it out in ArrayDataBuilder and I think it looks quite good 👌 |
|
Thanks again @tustvold |
Note this PR includes a bunch of whitespace changes that make it look like a much larger change than it is
Viewing the diff by ignoring whitespace makes the structure more clear I think
Which issue does this PR close?
Rationale for this change
@totoroyyb and I are working on adding another flag to the ipc reader code that allows disabling validation during read (see #6938). If the flag is true it will be
unsafe.The current structure of the code as a bunch of free functions makes it impossible to pass the parameter down without having to mark all the inner functions unsafe, which is not correct.
What changes are included in this PR?
This PR refactors the code that currently takes an
ArrayReaderas a parameterand makes it a method on the function. This has the nice property that
we don't have to pass
require_alignmentas a flag anymore and it sets us up very nicelyto be able to add
disable_validationin a follow on PRAre there any user-facing changes?
There are no intended changes in this PR -- all changes are to internal structures only.