Skip to content

Conversation

@nirandaperera
Copy link
Contributor

@nirandaperera nirandaperera commented Jun 16, 2021

This PR adds fixed and variable binary type support

@nirandaperera nirandaperera marked this pull request as draft June 16, 2021 05:15
@github-actions
Copy link

Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good on a first pass. I suggested some minor simplifications.

@nirandaperera
Copy link
Contributor Author

@pitrou @lidavidm I made some changes to the code. As Antoine pointed out previously, the null values in the output string array would have non-empty slots in the data buffer. I think it requires a small fix. I'll add that here as well. But I'd like to get your feedback on the code flow.

@nirandaperera nirandaperera marked this pull request as ready for review July 1, 2021 20:21
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO these cases could be consolidated into one or two checks since there are only 12 cases here (or 5 if you want to prune: null cond, true cond with null/non-null left array, false cond with null/non-null right array).

Copy link
Contributor Author

@nirandaperera nirandaperera Jul 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, there are 64 cases TBH. Each param can be null/ non-null and scalar/ array (4 variations). Since we have 3 params we have 4x4x4 = 64 permutations. 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CheckWithDifferentShapes actually checks for all those cases + with offsets. But I agree, in CheckWithDifferentShapes it could check similar cases redundantly in several iterations. But we decided to leave it as it is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit but I wonder if the compiler could generate similar (enough) code from a

struct CopyBinaryBulk {
    const offset_type* offsets;
    const uint8_t* values;
    offset_type* out_offsets;
    uint8_t* out_values;

    operator()(...) { ... }
};

so you don't have to copy-paste these everywhere

@lidavidm
Copy link
Member

lidavidm commented Jul 2, 2021

N.B. it looks like this segfaults in the tests on Windows MSVC 2019 (https://github.com/apache/arrow/pull/10538/checks?check_run_id=2966481230#step:8:339) and there are various errors on Clang, MSVC, and MinGW (mostly related to implicit casts or needing to explicitly parameterize std::max).

@nirandaperera nirandaperera requested a review from lidavidm July 2, 2021 21:27
@nirandaperera nirandaperera requested a review from lidavidm July 13, 2021 04:05
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: just data() here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe checked_cast<const FixedSizeBinaryType&>(*left.type()) or checked_pointer_cast<FixedSizeBinaryType>(left.type()) (will be dynamic cast in debug, static cast in release)

@lidavidm
Copy link
Member

Looks like this also needs to be rebased now. But once that's done and the nits above are addressed I think this can be merged.

# Conflicts:
#	cpp/src/arrow/compute/kernels/scalar_if_else.cc
#	cpp/src/arrow/compute/kernels/scalar_if_else_benchmark.cc
#	cpp/src/arrow/compute/kernels/scalar_if_else_test.cc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants