-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-13882: [C++] Improve min_max/hash_min_max type support #11111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
In light of the proposal to clarify how DayTime intervals are handled some changes here should probably be reverted. |
234913b to
f49f7cf
Compare
|
(Removed support for DayTimeInterval since it's only partially orderable.) |
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice, thank you. Just a couple questions and nits.
cpp/src/arrow/array/validate.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit surprised. This should already be checked by the ValidateArray top-level function, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. Odd, I do remember getting a crash for this reason here. But there is indeed a check:
arrow/cpp/src/arrow/array/validate.cc
Lines 367 to 374 in 87b2fcd
| if (type.id() != Type::EXTENSION) { | |
| if (data.child_data.size() != static_cast<size_t>(type.num_fields())) { | |
| return Status::Invalid("Expected ", type.num_fields(), | |
| " child arrays in array " | |
| "of type ", | |
| type.ToString(), ", got ", data.child_data.size()); | |
| } | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems this will generate different template specializations for types with the same underlying c_type. Is it desirable?
(note that GetValues is called with the output type independently from the ArrowType parameter)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this and the one below to use PhysicalType. I also replaced GetValues with MakeScalar (since now the static type doesn't match the runtime type).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be else if? Presumably value cannot be bother smaller than the min and greater than the max, and it would spare a string comparison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update message here as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same remark as for the scalar aggregations: this will generate a separate kernel per logical type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I don't see any interval in the type definition above. Is this comment outdated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed the comments/docs.
docs/source/cpp/compute.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neither is day-time, right?
docs/source/cpp/compute.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question.
e9087e0 to
c05cb96
Compare
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, thank you @lidavidm
This adds support for non-nested types to both, except for MonthDayNanoInterval (which is not really sortable). hash_min_max additionally lacks binary/string-like types as they require a different approach (I will file a followup). Closes apache#11111 from lidavidm/arrow-13882 Lead-authored-by: David Li <li.davidm96@gmail.com> Co-authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
This adds support for non-nested types to both, except for MonthDayNanoInterval (which is not really sortable). hash_min_max additionally lacks binary/string-like types as they require a different approach (I will file a followup).