Skip to content

Conversation

@nirandaperera
Copy link
Contributor

This PR adds ascii and utf8 reverse kernels.

@github-actions
Copy link

@nirandaperera nirandaperera marked this pull request as ready for review May 13, 2021 21:52
@edponce
Copy link
Contributor

edponce commented May 14, 2021

Learned from @cyb70289 that new kernels need to extend both C++ and Python documentation:

  1. https://github.com/apache/arrow/blob/master/docs/source/cpp/compute.rst
  2. https://github.com/apache/arrow/blob/master/docs/source/python/api/compute.rst
    Note that kernel names are in alphabetical order.

Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I left some comments about wording - I don't think we need to explain the Unicode details here, just mention that we operate on codepoints/code units, and not grapheme clusters.

return utf8_char_found == 0;
}

static Status InvalidStatus() { return Status::Invalid("Non-ascii sequence in input"); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: can we be consistent with ASCII?

Copy link
Contributor

@edponce edponce May 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think you should add InvalidStatus methods with specific messages that do not necessarily generalize for all string errors. Why not directly invoke Status::Invalid("Specific message to specific code block")?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am reusing the StringTransform class for both ascii and utf8 kernels. In that, it always throws a Status::Invalid("Invalid UTF8 sequence in input") which is not quite right for an ascii kernel. That's why it was generalized and offloaded to Derived::InvalidStatus() 🙂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify here, the structure of the kernel means the 'implementation' doesn't directly return an error, only true/false. Would you prefer to change it so that it returns a status instead?

@edponce
Copy link
Contributor

edponce commented May 17, 2021

Remember to update Python documentation as well, https://github.com/apache/arrow/blob/master/docs/source/python/api/compute.rst

Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more nit from me (tests are failing due to a small capitalization issue)

nirandaperera and others added 2 commits May 18, 2021 10:57
Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for this.

@lidavidm
Copy link
Member

The formatting needs to be fixed here (see the logs for the Dev action), the other CI failures look unrelated.

You could try @github-actions autotune to do that for you.

@nirandaperera
Copy link
Contributor Author

@lidavidm I think this looks okay now.

@lidavidm lidavidm closed this in 52e6d69 May 21, 2021
// would produce arrays with same lengths. Hence checking offset buffer equality
auto malformed_input = ArrayFromJSON(this->type(), "[\"ɑ\xFFɑa\", \"ɽ\xe1\xbdɽa\"]");
const Result<Datum>& res = CallFunction("utf8_reverse", {malformed_input});
ASSERT_TRUE(res->array()->buffers[1]->Equals(*malformed_input->data()->buffers[1]));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a nit, but FTR we probably have a AssertBufferEquals (or perhaps AssertBuffersEqual :-)).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants