Skip to content

Conversation

@jonmmease
Copy link
Contributor

Which issue does this PR close?

Closes #602

Rationale for this change

DataFusion already works well, in many cases, with Arrow StructArray columns. This PR adds a corresponding ScalarValue variant that can be used to represent a single element of a StructArray.

What changes are included in this PR?

This is currently a fairly minimal PR that adds a new ScalarValue::Struct variant, largely following the existing pattern for the ScalarValue::List variant. The Struct variant should support the same feature set as ScalarValue::List.

Let me know if there are uses of ScalarValue throughout the rest of DataFusion that should be updated to include struct support before these changes are merged. Thanks!

Are there any user-facing changes?

Yes, a new ScalarValue enumeration variant

IntervalYearMonth(Option<i32>),
/// Interval with DayTime unit
IntervalDayTime(Option<i64>),

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick. Delete the blank line to keep the same as above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in e25e500

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jonmmease -- this looks great and I am excited to extend DataFusion's type system support to include better handling of structs. @Igosuki and @houqp have been working to add the ability to select fields in https://github.com/apache/arrow-datafusion/pull/1006/files and it seems to me that this code sets the foundation for field access to struct types as well.

There appears to be a small rustfmt issue preventing CI from running on this PR: https://github.com/apache/arrow-datafusion/pull/1091/checks?check_run_id=3850128421

(I think it can be resolved by running cargo fmt and checking in the result)

Otherwise, if you could just add a test for a nested struct I think this PR is ready to go

Comment on lines 1282 to 1288
let mut fields: Vec<Field> = Vec::new();
let mut scalars: Vec<ScalarValue> = Vec::new();

value.iter().for_each(|(name, scalar)| {
fields.push(Field::new(name, scalar.get_datatype(), false));
scalars.push(scalar.clone());
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might be able to write this in a more functional style (and avoid mut) using unzip: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.unzip

Something like (untested)

let (fields, scalars) = value.iter()
      .for_each(|(name, scalar)| {
            (Field::new(name, scalar.get_datatype(), false),
            scalar.clone())
        })
      .unzip();

Copy link
Contributor

@alamb alamb Oct 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be clear I don't think this change is needed for this PR, I just wanted to point it out -- it took me quite a while to get comfortable with the functional parts of rust -- at first I thought it was just style preference, but then the longer I work in rust I realize it is one of the key mechanisms rust uses to eliminate bounds checks during iteration

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, I hadn't realized Rust had an unzip. Thanks for the suggestion. Done in 459c029

)),
None
);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are very nice and thorough tests.

The only thing I didn't see a test for was struct recursion -- like

A : {
  B: "foo"
  C: "bar"
}

I wonder if you could extend the A, B, C example below to include a nested struct as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Done in 62ad373

@jonmmease
Copy link
Contributor Author

Thanks for the feedback @alamb and @xudong963! I'll do a round of updates tomorrow

@jonmmease
Copy link
Contributor Author

(I think it can be resolved by running cargo fmt and checking in the result)

Yeah, that took care of it. Thanks!

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great @jonmmease -- thank you! I just kicked off the CI checks and if those succeed I'll plan to merge this in

@jonmmease
Copy link
Contributor Author

Thanks @alamb. The build failed on some clippy warnings that should be fixed by 9e6472d.

@alamb alamb merged commit d2da3e6 into apache:master Oct 11, 2021
@alamb
Copy link
Contributor

alamb commented Oct 11, 2021

Looks great -- thanks again for the contribution @jonmmease !

@houqp houqp added the enhancement New feature or request label Nov 6, 2021
unkloud pushed a commit to unkloud/datafusion that referenced this pull request Mar 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add support for StructArray to ScalarValue

4 participants