-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-1699: [C++] forward, backward fill kernel functions #11853
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-1699: [C++] forward, backward fill kernel functions #11853
Conversation
|
|
ea62a7a to
638aa61
Compare
29597b4 to
6638d56
Compare
0758449 to
ba4edd8
Compare
|
@lidavidm all comments were solved |
|
Small naming suggestion: maybe we could use "fill_null_forward" instead of "fill_forward_null" to align it more with the existing "fill_null" |
a15ebe7 to
77530fd
Compare
|
For the failures with chunked arrays, the kernel needs these options set: diff --git a/cpp/src/arrow/compute/kernels/vector_replace.cc b/cpp/src/arrow/compute/kernels/vector_replace.cc
index d85965cfd..8ee4bbfe1 100644
--- a/cpp/src/arrow/compute/kernels/vector_replace.cc
+++ b/cpp/src/arrow/compute/kernels/vector_replace.cc
@@ -799,6 +799,8 @@ void RegisterVectorFunction(FunctionRegistry* registry,
kernel.mem_allocation = MemAllocation::type::PREALLOCATE;
kernel.signature = Functor<FixedType>::GetSignature(get_id.id);
kernel.exec = std::move(exec);
+ kernel.can_execute_chunkwise = false;
+ kernel.output_chunked = false;
DCHECK_OK(func->AddKernel(std::move(kernel)));
};
auto add_primitive_kernel = [&](detail::GetTypeId get_id) {
diff --git a/cpp/src/arrow/compute/kernels/vector_replace_test.cc b/cpp/src/arrow/compute/kernels/vector_replace_test.cc
index 742facf19..48a0b40f5 100644
--- a/cpp/src/arrow/compute/kernels/vector_replace_test.cc
+++ b/cpp/src/arrow/compute/kernels/vector_replace_test.cc
@@ -110,7 +110,7 @@ class TestReplaceKernel : public ::testing::Test {
const std::shared_ptr<ChunkedArray> array,
const std::shared_ptr<ChunkedArray>& expected) {
ASSERT_OK_AND_ASSIGN(auto actual, func(Datum(*array), nullptr));
- AssertChunkedEqual(*expected, *actual.chunked_array());
+ AssertChunkedEquivalent(*expected, *actual.chunked_array());
}The test still fails, but I'll leave that for further debugging. Basically, with a chunked array input, by default, the compute infrastructure will split up the inputs and feed the kernel one chunk at a time, instead of the entire chunked array. In that case, it expects the kernel to output Array, not ChunkedArray, as it will assemble the final ChunkedArray itself. Also, in that case, it expects the kernel to keep track of its own state (in KernelState in KernelContext). Setting these options tells the compute infrastructure not to do this splitting and not to expect an Array output. |
|
Also note that I would expect this: @@ -1255,7 +1255,7 @@ TYPED_TEST(TestFillNullNumeric, FillForwardChunkedArray) {
this->AssertFillNullChunkedArray(
FillForwardNull,
this->chunked_array({"[1,2,3,null,null,4]", "[5, null, null]", "[null, 7, null]"}),
- this->chunked_array({"[1,2,3,3,3,4]", "[5, 5, 5]", "[null, 7, 7]"}));
+ this->chunked_array({"[1,2,3,3,3,4]", "[5, 5, 5]", "[5, 7, 7]"}));
}
} // namespace compute
} // namespace arrowi.e. the ChunkedArray should behave as one large array, not an array of independent arrays. |
77530fd to
3c0f708
Compare
lidavidm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note this needs to be rebased now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the check here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ping here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Date Types, there's not exist a Random implementation, and it return an nullptr array_random
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a reminder here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a reminder here.
For Date Types which are part of the NumericBasedTypes, there's not exist a Random implementation, and it return an nullptr array_random
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AlvinJ15: what is the reason for this check? If there is one, please let me know - I'm just wondering why it's here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as the next comment, no support for rand.Numeric( ) for DateType, even if I use rand.ArrayOf no support for ConstantArrayGenerator::Numeric(DateTypes).
3c0f708 to
21a2885
Compare
|
@lidavidm I resolved all comments. The AppVeyor gave me errors on pipeline, how Can I solve that error? |
|
It's not related to this PR. We can ignore it. |
cb4a1a3 to
f77d30a
Compare
lidavidm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. We just need to make sure the forward/backward implementations are consistent now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a reminder here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the check here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Date Types which are part of the NumericBasedTypes, there's not exist a Random implementation, and it return an nullptr array_random
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mind using ArrayOf instead then? It should handle all types:
arrow/cpp/src/arrow/testing/random.h
Line 380 in c6143a2
| std::shared_ptr<Array> ArrayOf(std::shared_ptr<DataType> type, int64_t size, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It work for generate a random_array, but currently there is not support for DateTypes for generate a ConstantArray which is needed for this test.
auto constant_array = ConstantArrayGenerator::Numeric<TypeParam>(len_null, x_ptr[len_random - 1]) Concatenate({array_random, constant_array, array_random}))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then please document why the check is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or actually, frankly: just add those cases to ConstantArrayGenerator::Numeric, e.g.
case Date32Type:
return Int32(size, ...)->View(date32());arrow/cpp/src/arrow/testing/generator.h
Lines 129 to 157 in 91e3ac5
| template <typename ArrowType, typename CType = typename ArrowType::c_type> | |
| static std::shared_ptr<Array> Numeric(int64_t size, CType value = 0) { | |
| switch (ArrowType::type_id) { | |
| case Type::BOOL: | |
| return Boolean(size, static_cast<bool>(value)); | |
| case Type::UINT8: | |
| return UInt8(size, static_cast<uint8_t>(value)); | |
| case Type::INT8: | |
| return Int8(size, static_cast<int8_t>(value)); | |
| case Type::UINT16: | |
| return UInt16(size, static_cast<uint16_t>(value)); | |
| case Type::INT16: | |
| return Int16(size, static_cast<int16_t>(value)); | |
| case Type::UINT32: | |
| return UInt32(size, static_cast<uint32_t>(value)); | |
| case Type::INT32: | |
| return Int32(size, static_cast<int32_t>(value)); | |
| case Type::UINT64: | |
| return UInt64(size, static_cast<uint64_t>(value)); | |
| case Type::INT64: | |
| return Int64(size, static_cast<int64_t>(value)); | |
| case Type::FLOAT: | |
| return Float32(size, static_cast<float>(value)); | |
| case Type::DOUBLE: | |
| return Float64(size, static_cast<double>(value)); | |
| default: | |
| return nullptr; | |
| } | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lidavidm the types were added and the condition was removed
f77d30a to
58aee3d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AlvinJ15: what is the reason for this check? If there is one, please let me know - I'm just wondering why it's here.
58aee3d to
37bd845
Compare
323ee82 to
8942626
Compare
8942626 to
1e8bf58
Compare
lidavidm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be formatted again.
1e8bf58 to
1b0aa99
Compare
1b0aa99 to
f921e53
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can these checks be eliminated?
That said, I'm not sure we need random tests for slicing. I think the hardcoded ones should be sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I deleted this random tests for slicing.
f921e53 to
d9fb98b
Compare
d9fb98b to
3998a7f
Compare
|
Ah shoot, I forgot to ask you to update the docs. I'll file a minor PR. |
|
Benchmark runs are scheduled for baseline = 67a29fd and contender = ec38aeb. ec38aeb is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Add the docs for the functions from ARROW-1699/PR #11853 since they were omitted in the PR. Closes #12082 from lidavidm/arrow-1699 Authored-by: David Li <li.davidm96@gmail.com> Signed-off-by: David Li <li.davidm96@gmail.com>
ARROW-1699: [C++] forward, backward fill kernel functions