Skip to content

Conversation

@bkietz
Copy link
Member

@bkietz bkietz commented Jun 11, 2020

No description provided.

@github-actions
Copy link

@wesm wesm force-pushed the 971-Implement-Array-isvalid-n branch from 5703901 to 8fa6330 Compare June 13, 2020 02:57
@wesm
Copy link
Member

wesm commented Jun 13, 2020

I rebased after fixing the linting issue ARROW-9120

Comment on lines 127 to 148
Copy link
Member

@wesm wesm Jun 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly this seems quite complex / opaque to me -- I am not going to be eager to write unit tests in this fashion. I would personally rather keep things as simple and explicit as possible so it's obvious what is being tested.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this looks over-engineered for what it seems to be doing (I may be misunderstanding) -- basically generating test data and checking expected values.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll separate this into a separate PR, then. I think testing against random data has value, similar to using a fuzzer. Currently our random tests have a lot of boilerplate for generating the inputs and a lot of ad-hoc code for computing the expected values. IMHO it's worthwhile to have an interface for doing sanity checks across a wide swath of parameter space without needing to specify each of those manually (even if this is not enabled by default).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are free to open a separate PR but I'm against this particular implementation of this approach to testing so unless something materially changes about the way the developer specifies the tests I'm likely to vote against the PR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wesm if you have any ideas which would make this interface simpler, could you comment in the JIRA?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that the approach to testing is viable at all for this part of the project. The examples that were in this PR contained effectively reimplementations of the kernel logic in the property test specification. Let's look at Add for example

const auto& out_type = args[0]->type;

if (!args[0]->is_valid || !args[1]->is_valid) {
  return MakeNullScalar(out_type);
}

if (is_integer(out_type->id())) {
  ARROW_ASSIGN_OR_RAISE(auto lhs, Cast<UInt64Scalar>(args[0]));
  ARROW_ASSIGN_OR_RAISE(auto rhs, Cast<UInt64Scalar>(args[1]));
  return UInt64Scalar(lhs->value + rhs->value).CastTo(out_type);
}

if (is_floating(out_type->id())) {
  ARROW_ASSIGN_OR_RAISE(auto lhs, Cast<DoubleScalar>(args[0]));
  ARROW_ASSIGN_OR_RAISE(auto rhs, Cast<DoubleScalar>(args[1]));
  return DoubleScalar(lhs->value + rhs->value).CastTo(out_type);
}

return Status::NotImplemented("NYI");

I don't think it's a good idea to have a parallel collection of reimplementations of kernels.

@wesm
Copy link
Member

wesm commented Jun 13, 2020

I'm not thrilled about the testing mixin. Can we split all that out into a separate PR (if at all, will see what @pitrou thinks about it) so that these kernels are not held hostage over it?

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Wes that the added test harness doesn't seem worthwhile.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the name "property" is supposed to refer to.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I wrong, or is this generating a new test subclass for simply different parameter values? This does not seem sound, especially as it will blow up compile times.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't generating new classes. TYPED_TEST_SUITE does this (one template instantiation per type) but INSTANTIATE_TEST_SUITE_P only iterates over the provided values (which in this case are all ScalarFunctionPropertyTestParam)

Comment on lines 127 to 148
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this looks over-engineered for what it seems to be doing (I may be misunderstanding) -- basically generating test data and checking expected values.

@bkietz bkietz force-pushed the 971-Implement-Array-isvalid-n branch from 8fa6330 to 7835000 Compare June 15, 2020 14:43
Copy link
Member

@wesm wesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, a handful of comments


static void Call(KernelContext* ctx, const ArrayData& arr, ArrayData* out) {
if (arr.buffers[0] != nullptr && out->offset == arr.offset &&
out->length == arr.length) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that the offset and length checks are needed anymore, you can either remove them or make them DCHECK

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do

Copy link
Member Author

@bkietz bkietz Jun 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it happens this fails for sliced input. I'll restore the checks. For your interest, one approach i tried for is_valid was a MetaFunction which invoked a no-op ScalarFunction with INTERSECTION null handling then yanked the null bitmap from that. Unfortunately INTERSECTION doesn't currently support the zero copy case and I wasn't sure the approach would be acceptable but it did avoid repetition of null bitmap handling logic. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, indeed zero copy is not possible here in general. What do you think about doing the zero copy when arr.offset % 8 == 0? Then you can slice the bitmap.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL

Copy link
Member

@wesm wesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. I understand that the changes to CheckScalarUnary break some of the ascii_* kernels. I'll disable those tests in this patch and then I'll fix them in a patch that is based on this patch.

out->buffers[1] = arr.offset == 0
? arr.buffers[0]
: SliceBuffer(arr.buffers[0], arr.offset / 8, arr.length / 8);
out->offset = arr.offset % 8;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know off the top of my head what will be the implications of modifying the offset of the output value, but I think it should be okay, and we can always fix it later if it becomes an issue

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I'm not clear on what the kernel is/isn't allowed to modify in the out data. I've added can_write_into_slices=false , so IIUC the kernel should always exclusively own the out data.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not worth thinking too hard right now because kernel pipelining (temporary elision) is not implemented yet so until that happens it would be just speculation.

@wesm
Copy link
Member

wesm commented Jun 16, 2020

@bkietz if you want to disable the ascii_* tests that are failing please go ahead

@wesm
Copy link
Member

wesm commented Jun 16, 2020

Merging

@wesm wesm closed this in 60cdc75 Jun 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants