-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Add ScalarValue::Struct variant #1091
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
datafusion/src/scalar.rs
Outdated
| IntervalYearMonth(Option<i32>), | ||
| /// Interval with DayTime unit | ||
| IntervalDayTime(Option<i64>), | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick. Delete the blank line to keep the same as above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in e25e500
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jonmmease -- this looks great and I am excited to extend DataFusion's type system support to include better handling of structs. @Igosuki and @houqp have been working to add the ability to select fields in https://github.com/apache/arrow-datafusion/pull/1006/files and it seems to me that this code sets the foundation for field access to struct types as well.
There appears to be a small rustfmt issue preventing CI from running on this PR: https://github.com/apache/arrow-datafusion/pull/1091/checks?check_run_id=3850128421
(I think it can be resolved by running cargo fmt and checking in the result)
Otherwise, if you could just add a test for a nested struct I think this PR is ready to go
datafusion/src/scalar.rs
Outdated
| let mut fields: Vec<Field> = Vec::new(); | ||
| let mut scalars: Vec<ScalarValue> = Vec::new(); | ||
|
|
||
| value.iter().for_each(|(name, scalar)| { | ||
| fields.push(Field::new(name, scalar.get_datatype(), false)); | ||
| scalars.push(scalar.clone()); | ||
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might be able to write this in a more functional style (and avoid mut) using unzip: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.unzip
Something like (untested)
let (fields, scalars) = value.iter()
.for_each(|(name, scalar)| {
(Field::new(name, scalar.get_datatype(), false),
scalar.clone())
})
.unzip();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be clear I don't think this change is needed for this PR, I just wanted to point it out -- it took me quite a while to get comfortable with the functional parts of rust -- at first I thought it was just style preference, but then the longer I work in rust I realize it is one of the key mechanisms rust uses to eliminate bounds checks during iteration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, I hadn't realized Rust had an unzip. Thanks for the suggestion. Done in 459c029
| )), | ||
| None | ||
| ); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are very nice and thorough tests.
The only thing I didn't see a test for was struct recursion -- like
A : {
B: "foo"
C: "bar"
}
I wonder if you could extend the A, B, C example below to include a nested struct as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. Done in 62ad373
|
Thanks for the feedback @alamb and @xudong963! I'll do a round of updates tomorrow |
Yeah, that took care of it. Thanks! |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great @jonmmease -- thank you! I just kicked off the CI checks and if those succeed I'll plan to merge this in
|
Looks great -- thanks again for the contribution @jonmmease ! |
Which issue does this PR close?
Closes #602
Rationale for this change
DataFusion already works well, in many cases, with Arrow
StructArraycolumns. This PR adds a correspondingScalarValuevariant that can be used to represent a single element of aStructArray.What changes are included in this PR?
This is currently a fairly minimal PR that adds a new
ScalarValue::Structvariant, largely following the existing pattern for theScalarValue::Listvariant. TheStructvariant should support the same feature set asScalarValue::List.Let me know if there are uses of
ScalarValuethroughout the rest of DataFusion that should be updated to include struct support before these changes are merged. Thanks!Are there any user-facing changes?
Yes, a new
ScalarValueenumeration variant