Skip to content

Conversation

@AlvinJ15
Copy link
Contributor

@AlvinJ15 AlvinJ15 commented Dec 3, 2021

ARROW-1699: [C++] forward, backward fill kernel functions

@github-actions
Copy link

github-actions bot commented Dec 3, 2021

@github-actions
Copy link

github-actions bot commented Dec 3, 2021

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@AlvinJ15 AlvinJ15 marked this pull request as draft December 3, 2021 19:26
@AlvinJ15 AlvinJ15 force-pushed the achunga/1699-bfill-ffill-kernel-functions branch 3 times, most recently from ea62a7a to 638aa61 Compare December 6, 2021 09:40
@AlvinJ15 AlvinJ15 marked this pull request as ready for review December 6, 2021 09:41
@AlvinJ15 AlvinJ15 force-pushed the achunga/1699-bfill-ffill-kernel-functions branch 7 times, most recently from 29597b4 to 6638d56 Compare December 6, 2021 17:19
@lidavidm lidavidm self-requested a review December 6, 2021 17:24
@AlvinJ15 AlvinJ15 force-pushed the achunga/1699-bfill-ffill-kernel-functions branch 2 times, most recently from 0758449 to ba4edd8 Compare December 9, 2021 05:53
@AlvinJ15
Copy link
Contributor Author

AlvinJ15 commented Dec 9, 2021

@lidavidm all comments were solved

@jorisvandenbossche
Copy link
Member

Small naming suggestion: maybe we could use "fill_null_forward" instead of "fill_forward_null" to align it more with the existing "fill_null"

@AlvinJ15 AlvinJ15 force-pushed the achunga/1699-bfill-ffill-kernel-functions branch 2 times, most recently from a15ebe7 to 77530fd Compare December 13, 2021 09:02
@lidavidm
Copy link
Member

For the failures with chunked arrays, the kernel needs these options set:

diff --git a/cpp/src/arrow/compute/kernels/vector_replace.cc b/cpp/src/arrow/compute/kernels/vector_replace.cc
index d85965cfd..8ee4bbfe1 100644
--- a/cpp/src/arrow/compute/kernels/vector_replace.cc
+++ b/cpp/src/arrow/compute/kernels/vector_replace.cc
@@ -799,6 +799,8 @@ void RegisterVectorFunction(FunctionRegistry* registry,
     kernel.mem_allocation = MemAllocation::type::PREALLOCATE;
     kernel.signature = Functor<FixedType>::GetSignature(get_id.id);
     kernel.exec = std::move(exec);
+    kernel.can_execute_chunkwise = false;
+    kernel.output_chunked = false;
     DCHECK_OK(func->AddKernel(std::move(kernel)));
   };
   auto add_primitive_kernel = [&](detail::GetTypeId get_id) {
diff --git a/cpp/src/arrow/compute/kernels/vector_replace_test.cc b/cpp/src/arrow/compute/kernels/vector_replace_test.cc
index 742facf19..48a0b40f5 100644
--- a/cpp/src/arrow/compute/kernels/vector_replace_test.cc
+++ b/cpp/src/arrow/compute/kernels/vector_replace_test.cc
@@ -110,7 +110,7 @@ class TestReplaceKernel : public ::testing::Test {
                                   const std::shared_ptr<ChunkedArray> array,
                                   const std::shared_ptr<ChunkedArray>& expected) {
     ASSERT_OK_AND_ASSIGN(auto actual, func(Datum(*array), nullptr));
-    AssertChunkedEqual(*expected, *actual.chunked_array());
+    AssertChunkedEquivalent(*expected, *actual.chunked_array());
   }

The test still fails, but I'll leave that for further debugging.

Basically, with a chunked array input, by default, the compute infrastructure will split up the inputs and feed the kernel one chunk at a time, instead of the entire chunked array. In that case, it expects the kernel to output Array, not ChunkedArray, as it will assemble the final ChunkedArray itself. Also, in that case, it expects the kernel to keep track of its own state (in KernelState in KernelContext). Setting these options tells the compute infrastructure not to do this splitting and not to expect an Array output.

@lidavidm
Copy link
Member

Also note that I would expect this:

@@ -1255,7 +1255,7 @@ TYPED_TEST(TestFillNullNumeric, FillForwardChunkedArray) {
   this->AssertFillNullChunkedArray(
       FillForwardNull,
       this->chunked_array({"[1,2,3,null,null,4]", "[5, null, null]", "[null, 7, null]"}),
-      this->chunked_array({"[1,2,3,3,3,4]", "[5, 5, 5]", "[null, 7, 7]"}));
+      this->chunked_array({"[1,2,3,3,3,4]", "[5, 5, 5]", "[5, 7, 7]"}));
 }
 }  // namespace compute
 }  // namespace arrow

i.e. the ChunkedArray should behave as one large array, not an array of independent arrays.

@AlvinJ15 AlvinJ15 force-pushed the achunga/1699-bfill-ffill-kernel-functions branch from 77530fd to 3c0f708 Compare December 16, 2021 07:55
@AlvinJ15 AlvinJ15 requested a review from lidavidm December 16, 2021 08:00
Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note this needs to be rebased now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the check here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto below.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Date Types, there's not exist a Random implementation, and it return an nullptr array_random

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a reminder here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a reminder here.
For Date Types which are part of the NumericBasedTypes, there's not exist a Random implementation, and it return an nullptr array_random

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AlvinJ15: what is the reason for this check? If there is one, please let me know - I'm just wondering why it's here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as the next comment, no support for rand.Numeric( ) for DateType, even if I use rand.ArrayOf no support for ConstantArrayGenerator::Numeric(DateTypes).

@AlvinJ15 AlvinJ15 force-pushed the achunga/1699-bfill-ffill-kernel-functions branch from 3c0f708 to 21a2885 Compare December 17, 2021 07:48
@AlvinJ15
Copy link
Contributor Author

@lidavidm I resolved all comments. The AppVeyor gave me errors on pipeline, how Can I solve that error?

@lidavidm
Copy link
Member

It's not related to this PR. We can ignore it.

@AlvinJ15 AlvinJ15 force-pushed the achunga/1699-bfill-ffill-kernel-functions branch 2 times, most recently from cb4a1a3 to f77d30a Compare December 29, 2021 06:42
Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. We just need to make sure the forward/backward implementations are consistent now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a reminder here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the check here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Date Types which are part of the NumericBasedTypes, there's not exist a Random implementation, and it return an nullptr array_random

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind using ArrayOf instead then? It should handle all types:

std::shared_ptr<Array> ArrayOf(std::shared_ptr<DataType> type, int64_t size,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It work for generate a random_array, but currently there is not support for DateTypes for generate a ConstantArray which is needed for this test.
auto constant_array = ConstantArrayGenerator::Numeric<TypeParam>(len_null, x_ptr[len_random - 1]) Concatenate({array_random, constant_array, array_random}))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then please document why the check is needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or actually, frankly: just add those cases to ConstantArrayGenerator::Numeric, e.g.

case Date32Type:
  return Int32(size, ...)->View(date32());

template <typename ArrowType, typename CType = typename ArrowType::c_type>
static std::shared_ptr<Array> Numeric(int64_t size, CType value = 0) {
switch (ArrowType::type_id) {
case Type::BOOL:
return Boolean(size, static_cast<bool>(value));
case Type::UINT8:
return UInt8(size, static_cast<uint8_t>(value));
case Type::INT8:
return Int8(size, static_cast<int8_t>(value));
case Type::UINT16:
return UInt16(size, static_cast<uint16_t>(value));
case Type::INT16:
return Int16(size, static_cast<int16_t>(value));
case Type::UINT32:
return UInt32(size, static_cast<uint32_t>(value));
case Type::INT32:
return Int32(size, static_cast<int32_t>(value));
case Type::UINT64:
return UInt64(size, static_cast<uint64_t>(value));
case Type::INT64:
return Int64(size, static_cast<int64_t>(value));
case Type::FLOAT:
return Float32(size, static_cast<float>(value));
case Type::DOUBLE:
return Float64(size, static_cast<double>(value));
default:
return nullptr;
}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lidavidm the types were added and the condition was removed

@AlvinJ15 AlvinJ15 force-pushed the achunga/1699-bfill-ffill-kernel-functions branch from f77d30a to 58aee3d Compare December 30, 2021 23:05
@AlvinJ15 AlvinJ15 requested a review from lidavidm December 30, 2021 23:15
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AlvinJ15: what is the reason for this check? If there is one, please let me know - I'm just wondering why it's here.

@AlvinJ15 AlvinJ15 force-pushed the achunga/1699-bfill-ffill-kernel-functions branch from 58aee3d to 37bd845 Compare January 4, 2022 04:36
@AlvinJ15 AlvinJ15 force-pushed the achunga/1699-bfill-ffill-kernel-functions branch 2 times, most recently from 323ee82 to 8942626 Compare January 4, 2022 05:15
@AlvinJ15 AlvinJ15 requested a review from lidavidm January 4, 2022 05:17
@AlvinJ15 AlvinJ15 force-pushed the achunga/1699-bfill-ffill-kernel-functions branch from 8942626 to 1e8bf58 Compare January 4, 2022 05:36
Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be formatted again.

@AlvinJ15 AlvinJ15 force-pushed the achunga/1699-bfill-ffill-kernel-functions branch from 1e8bf58 to 1b0aa99 Compare January 4, 2022 19:22
@AlvinJ15 AlvinJ15 requested a review from lidavidm January 4, 2022 19:24
@AlvinJ15 AlvinJ15 force-pushed the achunga/1699-bfill-ffill-kernel-functions branch from 1b0aa99 to f921e53 Compare January 4, 2022 20:01
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these checks be eliminated?

That said, I'm not sure we need random tests for slicing. I think the hardcoded ones should be sufficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I deleted this random tests for slicing.

@AlvinJ15 AlvinJ15 force-pushed the achunga/1699-bfill-ffill-kernel-functions branch from f921e53 to d9fb98b Compare January 5, 2022 03:45
@AlvinJ15 AlvinJ15 force-pushed the achunga/1699-bfill-ffill-kernel-functions branch from d9fb98b to 3998a7f Compare January 5, 2022 04:05
@AlvinJ15 AlvinJ15 requested a review from lidavidm January 5, 2022 04:44
@lidavidm lidavidm closed this in ec38aeb Jan 5, 2022
@lidavidm
Copy link
Member

lidavidm commented Jan 5, 2022

Ah shoot, I forgot to ask you to update the docs. I'll file a minor PR.

@ursabot
Copy link

ursabot commented Jan 5, 2022

Benchmark runs are scheduled for baseline = 67a29fd and contender = ec38aeb. ec38aeb is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed] ursa-i9-9960x
[Finished ⬇️0.26% ⬆️0.09%] ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

lidavidm added a commit that referenced this pull request Jan 5, 2022
Add the docs for the functions from ARROW-1699/PR #11853 since they were omitted in the PR.

Closes #12082 from lidavidm/arrow-1699

Authored-by: David Li <li.davidm96@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants