BUG: .str methods failing on PyArrow using regex with \Z by jorisvandenbossche · Pull Request #63705 · pandas-dev/pandas

jorisvandenbossche · 2026-01-16T10:54:45Z

This is only for match and needs to be generalized, if we think this approach is workable

The RE2 engine (used by PyArrow) only supports \z and not \Z. Python 3.14 also added \z, keeping \Z as an alias for compatibility. So until recently Python users could only write \Z, whereas the pyarrow engine requires \z. Manually switching to \z only works on Python 3.14+, so I think it would be good to handle this under the hood for the user.
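For reference, a small standard-library sketch of the Python side of this. It only exercises Python's own `re` module (not PyArrow or RE2), and the helper name `python_supports_lower_z` is made up for illustration:

```python
import re


def python_supports_lower_z() -> bool:
    # Probe whether this interpreter's `re` module accepts the lowercase
    # end-of-string anchor. Older versions reject it as a bad escape.
    try:
        re.compile(r"a\z")
    except re.error:
        return False
    return True


# The uppercase anchor matches only at the very end of the string,
# whereas `$` also matches just before a trailing newline.
end_anchor = re.compile(r"b\Z")
print(end_anchor.search("ab") is not None)
print(end_anchor.search("ab\n") is not None)
print(re.search(r"b$", "ab\n") is not None)
print(python_supports_lower_z())
```

The last line depends on the interpreter version, which is the reason a user cannot simply write \z themselves on older Python.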

rhshadrach · 2026-01-16T21:01:31Z

        if isinstance(pat, re.Pattern):
            pat, case, flags = self._preprocess_re_pattern(pat, case)

+        if pat.endswith("\\Z") and not pat.endswith("\\\\Z"):


This correctly handles what I think would be the vast majority of real cases and as far as I can tell has no false positives. However it does have false negatives. The following should give true:

r"\\\Z" (in general, an odd number of \)

r"text(\Z)"

r"\Z text"

Only the first seems like it could maybe occur in practice, the other two are likely erroneous. And while I could see this being used with lookarounds, we'll already be falling back to object dtype with those so we don't need to be worried about them here.
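To make the false negatives concrete, here is a sketch that applies the condition from the diff above to the listed patterns. The function name `naive_check` is hypothetical and only wraps that one condition:

```python
def naive_check(pat: str) -> bool:
    # Same condition as in the diff: the pattern ends with a backslash
    # followed by Z, but not with two backslashes followed by Z.
    return pat.endswith("\\Z") and not pat.endswith("\\\\Z")


patterns = [
    r"abc\Z",      # plain end-of-string anchor
    r"abc\\Z",     # escaped backslash followed by a literal Z
    r"\\\Z",       # escaped backslash followed by the anchor
    r"text(\Z)",   # anchor inside a group
    r"\Z text",    # anchor not at the end of the pattern
]

for pat in patterns:
    print(repr(pat), naive_check(pat))
```

Only the first pattern is detected. The second is correctly rejected, while the last three contain the anchor but are not detected.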

Thanks for the analysis!

r"\\\Z" (in general, an odd number of \)

Should we count the number of trailing \? (and so only get here if the number of \ is not even). Something like:

if pat.endswith("\\Z") and not ((len(pat[:-1]) - len(pat[:-1].rstrip("\\"))) % 2 == 0):

We can remove one of the pat[:-1] (which I think will give a copy of the entire string). Also removed the not but I'm fine if that remains as the original.

# Second condition counts the number of `\` that pat ends with prior to Z
if pat.endswith("\\Z") and (len(pat) - len(pat[:-1].rstrip("\\")) + 1) % 2 == 1:

Added that and added some test cases to one of the tests
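The backslash-counting idea discussed above can be sketched as a standalone helper. This is an illustration under assumptions, not the code as merged; the names `ends_with_z_anchor` and `replace_trailing_anchor` are invented here:

```python
def ends_with_z_anchor(pat: str) -> bool:
    # The pattern must end with a backslash followed by Z.
    if not pat.endswith("\\Z"):
        return False
    # Count the run of backslashes directly before the final Z.
    body = pat[:-1]
    n_backslashes = len(body) - len(body.rstrip("\\"))
    # An odd count means the last backslash is not itself escaped,
    # so it forms the anchor together with the Z.
    return n_backslashes % 2 == 1


def replace_trailing_anchor(pat: str) -> str:
    # Swap the trailing uppercase anchor for the lowercase spelling.
    if ends_with_z_anchor(pat):
        return pat[:-1] + "z"
    return pat


for pat in [r"abc\Z", r"abc\\Z", r"abc\\\Z", "abcZ"]:
    print(repr(pat), "->", repr(replace_trailing_anchor(pat)))
```

An even number of backslashes before the Z means they pair up into literal backslashes and the Z is a plain character, so the pattern is left untouched.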

jorisvandenbossche · 2026-01-19T00:25:08Z

        elif not pat.startswith("^"):
            pat = f"^({pat[0:-1]})$"
-        return self._str_match(pat, case, flags, na)
+        return ArrowStringArrayMixin._str_match(self, pat, case, flags, na)


@rhshadrach I changed this and added a self._has_regex_lookaround(pat) check to ArrowStringArray._str_fullmatch, so that the method here always uses the mixin version of _str_match rather than the ArrowStringArray one. The latter would call the validation methods a second time, which can fail on older Python after replacing \Z with \z, and in general we also avoid the overhead of validating twice.
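The dispatch change can be illustrated with toy classes. Everything here (`ToyMixin`, `ToyArray`, `_validate`, the counter) is hypothetical and stands in for the real pandas classes only loosely; the point is that calling the mixin method through the class bypasses the subclass override:

```python
class ToyMixin:
    def _str_match(self, pat: str) -> str:
        return f"matched {pat}"

    def _str_fullmatch(self, pat: str) -> str:
        # Call the mixin implementation explicitly. Using
        # self._str_match(...) would dispatch to the subclass override
        # and run its validation a second time.
        return ToyMixin._str_match(self, f"^({pat})$")


class ToyArray(ToyMixin):
    def __init__(self) -> None:
        self.validations = 0

    def _validate(self, pat: str) -> None:
        # Stand-in for the pattern checks done before matching.
        self.validations += 1

    def _str_match(self, pat: str) -> str:
        self._validate(pat)
        return super()._str_match(pat)

    def _str_fullmatch(self, pat: str) -> str:
        self._validate(pat)
        return super()._str_fullmatch(pat)


arr = ToyArray()
print(arr._str_fullmatch("a"))
print(arr.validations)
```

With `self._str_match(...)` in the mixin instead, the counter would end at 2 for a single `_str_fullmatch` call.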

rhshadrach

lgtm; good to merge this as-is without handling higher number of \ as discussed above.

BUG: .str.contains et al failing with PyArrow and using \Z

68eee02

jorisvandenbossche added this to the 3.0 milestone Jan 16, 2026

jorisvandenbossche requested a review from rhshadrach January 16, 2026 10:54

jorisvandenbossche added Strings String extension data type and string data Arrow pyarrow functionality labels Jan 16, 2026

rhshadrach reviewed Jan 16, 2026

jorisvandenbossche changed the title ~~BUG: .str.contains et al failing with PyArrow and using \Z~~ BUG: .str methods failing on PyArrow using regex with \Z Jan 18, 2026

jorisvandenbossche added 3 commits January 19, 2026 00:59

Merge remote-tracking branch 'upstream/main' into string-dtype-regex-z

733b683

avoid going through ArrowStringArray validation twice

abd0454

expand to contains/match/fullmatch

11b60ed

jorisvandenbossche commented Jan 19, 2026

jorisvandenbossche added 5 commits January 19, 2026 09:50

Merge remote-tracking branch 'upstream/main' into string-dtype-regex-z

7bdb35a

add test for extract

e3f3ecc

fix + add test for replace

1182294

add test for findall + split

685342a

fix + add test for count

c2f951a

jorisvandenbossche marked this pull request as ready for review January 19, 2026 14:13

jorisvandenbossche requested a review from rhshadrach January 20, 2026 15:07

rhshadrach approved these changes Jan 20, 2026

jorisvandenbossche added 3 commits January 21, 2026 09:42

Merge remote-tracking branch 'upstream/main' into string-dtype-regex-z

56f3d9b

account for odd cases

1db26fd

add whatsnew

df6993e

jorisvandenbossche merged commit c9b51fa into pandas-dev:main Jan 21, 2026
39 of 42 checks passed

jorisvandenbossche deleted the string-dtype-regex-z branch January 21, 2026 10:21

vkverma9534 pushed a commit to vkverma9534/pandas that referenced this pull request Jan 30, 2026

BUG: .str methods failing on PyArrow using regex with \Z (pandas-dev#…

9390284

…63705)

jorisvandenbossche merged 12 commits into pandas-dev:main from jorisvandenbossche:string-dtype-regex-z