Skip to content

Conversation

@lidavidm
Copy link
Member

@lidavidm lidavidm commented Sep 8, 2021

This adds support for non-nested types to both, except for MonthDayNanoInterval (which is not really sortable). hash_min_max additionally lacks binary/string-like types as they require a different approach (I will file a followup).

@lidavidm lidavidm changed the title ARROW-13882: [C++] Improve min_max/hash_min_max type support. ARROW-13882: [C++] Improve min_max/hash_min_max type support Sep 8, 2021
@github-actions
Copy link

github-actions bot commented Sep 8, 2021

@lidavidm
Copy link
Member Author

In light of the proposal to clarify how DayTime intervals are handled some changes here should probably be reverted.

@lidavidm
Copy link
Member Author

(Removed support for DayTimeInterval since it's only partially orderable.)

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice, thank you. Just a couple questions and nits.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit surprised. This should already be checked by the ValidateArray top-level function, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Odd, I do remember getting a crash for this reason here. But there is indeed a check:

if (type.id() != Type::EXTENSION) {
if (data.child_data.size() != static_cast<size_t>(type.num_fields())) {
return Status::Invalid("Expected ", type.num_fields(),
" child arrays in array "
"of type ",
type.ToString(), ", got ", data.child_data.size());
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this will generate different template specializations for types with the same underlying c_type. Is it desirable?
(note that GetValues is called with the output type independently from the ArrowType parameter)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this and the one below to use PhysicalType. I also replaced GetValues with MakeScalar (since now the static type doesn't match the runtime type).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be else if? Presumably value cannot be bother smaller than the min and greater than the max, and it would spare a string comparison.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update message here as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same remark as for the scalar aggregations: this will generate a separate kernel per logical type?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I don't see any interval in the type definition above. Is this comment outdated?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed the comments/docs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neither is day-time, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question.

@lidavidm lidavidm force-pushed the arrow-13882 branch 2 times, most recently from e9087e0 to c05cb96 Compare September 15, 2021 15:22
Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, thank you @lidavidm

@pitrou pitrou closed this in 2c405da Sep 16, 2021
ViniciusSouzaRoque pushed a commit to s1mbi0se/arrow that referenced this pull request Oct 20, 2021
This adds support for non-nested types to both, except for MonthDayNanoInterval (which is not really sortable). hash_min_max additionally lacks binary/string-like types as they require a different approach (I will file a followup).

Closes apache#11111 from lidavidm/arrow-13882

Lead-authored-by: David Li <li.davidm96@gmail.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants