Skip to content

Conversation

@lidavidm
Copy link
Member

For consistency with match_substring, this is the equivalent of Python's re.search(), not re.match().

@github-actions
Copy link

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no conceptual reason for matchers to be templated on offset type. The iteration loop on string values could be moved out of the matcher and into the calling MatchSubstring class. Then all a matcher does is take a string_view or similar and return the index where the pattern was found (or presumably -1 if not found).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, I refactored that part. The Knuth-Morris-Pratt implementation also had to be refactored as it was templated on offset_type, though it seems that was unnecessary.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one last nit

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You missed the renumbering of notes here (and below).

Copy link
Contributor

@cyb70289 cyb70289 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, +1

Oops, forget to check the CI.

@pitrou
Copy link
Member

pitrou commented Apr 1, 2021

It looks like there's a failure on Windows in debug mode (see CI):

[ RUN      ] TestStringKernels/0.MatchSubstring
C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.28.29910\include\vector(1508) : Assertion failed: vector subscript out of range

@lidavidm
Copy link
Member Author

lidavidm commented Apr 1, 2021

It's reproducible on Linux by temporarily using .at() for array access. I incorrectly turned vector<T> foo(N) into reserve(N) when it should have been resize(N).

@lidavidm
Copy link
Member Author

lidavidm commented Apr 6, 2021

This should be ready again (minus RTools 3.5/JNI which are known to be failing/flaky).

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM

@pitrou pitrou closed this in 67bf0ab Apr 6, 2021
pachadotdev pushed a commit to pachadotdev/arrow that referenced this pull request Apr 6, 2021
For consistency with match_substring, this is the equivalent of Python's re.search(), not re.match().

Closes apache#9838 from lidavidm/arrow-12134

Authored-by: David Li <li.davidm96@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants