Describe the bug
In some cases `regexp_match` skips the first (and only) match.
For example, if the pattern is `foo` and the string to match is `foo`, it should return the single match `foo`. Currently it returns an empty array for that row. It correctly detects that there is a match, but it does not return the matched text.
To Reproduce
Example test in `arrow-string/src/regexp.rs`:
```rust
#[test]
fn sandbox() {
    // Input string and a pattern with no explicit capture groups.
    let array = StringArray::from(vec![Some("foo")]);
    let pattern = GenericStringArray::<i32>::from(vec![r"foo"]);
    let actual = regexp_match(&array, &pattern, None).unwrap();
    let result = actual.as_any().downcast_ref::<ListArray>().unwrap();

    // Expected: one valid list row containing the whole match "foo".
    let elem_builder: GenericStringBuilder<i32> = GenericStringBuilder::new();
    let mut expected_builder = ListBuilder::new(elem_builder);
    expected_builder.values().append_value("foo");
    expected_builder.append(true);
    let expected = expected_builder.finish();
    assert_eq!(&expected, result);
}
```
It panics with:
```
thread 'regexp::tests::sandbox' panicked at 'assertion failed: `(left == right)`
  left: `ListArray
[
  StringArray
  [
    "foo",
  ],
]`,
 right: `ListArray
[
  StringArray
  [
  ],
]`', arrow-string/src/regexp.rs:277:9
```
The right-hand side (the actual result) is an empty `StringArray []`, whereas the expected value contains the match: `StringArray ["foo"]`.
Expected behavior
Test should succeed.
Additional context
This seems to be because the implementation always skips the first capture group, which is group 0 (the entire match). From `arrow-rs/arrow-string/src/regexp.rs`, lines 210 to 218 in 79518cf:
```rust
match re.captures(value) {
    Some(caps) => {
        for m in caps.iter().skip(1).flatten() {
            list_builder.values().append_value(m.as_str());
        }
        list_builder.append(true);
    }
    None => list_builder.append(false),
}
```
In the test example above, `caps` has the value:
```
[arrow-string/src/regexp.rs:212] &caps = Captures(
    {
        0: Some(
            "foo",
        ),
    },
)
```
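To make the failure mode concrete without depending on the `regex` crate, the sketch below replays the loop from the excerpt against a hypothetical model of `Captures`: a slice of `Option<&str>` where slot 0 is the entire match and slots 1.. are explicit groups. The function name `skip_first_group` and this slice model are illustrative assumptions, not the crate's API.

```rust
/// Hypothetical model of `regex::Captures` (not the crate's type):
/// slot 0 is the entire match, slots 1.. are explicit capture groups,
/// `None` meaning the group did not participate in the match.
fn skip_first_group(caps: &[Option<&str>]) -> Vec<String> {
    // Mirrors `caps.iter().skip(1).flatten()` from regexp.rs:
    // slot 0 (the whole match) is always dropped.
    caps.iter()
        .skip(1)
        .flatten()
        .map(|m| m.to_string())
        .collect()
}

fn main() {
    // Pattern `foo` against "foo": only slot 0 exists, so nothing is kept,
    // yet the row would still be appended as valid (a match was found).
    println!("{:?}", skip_first_group(&[Some("foo")])); // prints []

    // Pattern `(f)(o+)` against "foo": the explicit groups survive.
    println!("{:?}", skip_first_group(&[Some("foo"), Some("f"), Some("oo")])); // prints ["f", "oo"]
}
```

This reproduces the shape of the failing test: a valid list row with zero values.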
Relevant `regex` doc: https://docs.rs/regex/latest/regex/struct.Regex.html#method.captures
Specifically:
> Capture group 0 always corresponds to the entire match.
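One possible rule, assumed here rather than taken from the eventual fix, is to skip group 0 only when the pattern has explicit capture groups. This matches how PostgreSQL's `regexp_match` documents its result: the whole match when the pattern has no parenthesized subexpressions, otherwise the subexpression matches. The sketch reuses a hypothetical slice model of `Captures` (slot 0 is the entire match); `match_values` is an illustrative name, not an arrow-rs function.

```rust
/// Hypothetical model of `regex::Captures` (not the crate's type):
/// slot 0 is the entire match, slots 1.. are explicit capture groups.
fn match_values(caps: &[Option<&str>]) -> Vec<String> {
    // With no explicit groups only slot 0 exists, so keep the whole match;
    // otherwise skip slot 0 and return just the explicit groups.
    let skip = if caps.len() > 1 { 1 } else { 0 };
    caps.iter()
        .skip(skip)
        .flatten() // drops groups that did not participate in the match
        .map(|m| m.to_string())
        .collect()
}

fn main() {
    // `foo` against "foo": the whole match is returned.
    assert_eq!(match_values(&[Some("foo")]), vec!["foo"]);
    // `(f)(o+)` against "foo": only the explicit groups are returned.
    assert_eq!(match_values(&[Some("foo"), Some("f"), Some("oo")]), vec!["f", "oo"]);
    // `(a)|(f)oo` against "foo": group 1 did not participate and is dropped.
    assert_eq!(match_values(&[Some("foo"), None, Some("f")]), vec!["f"]);
    println!("ok");
}
```

In the real code the equivalent check could presumably use `Captures::len()`, which always counts group 0 and is therefore at least 1, or `Regex::captures_len()` on the compiled pattern.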
Original issue: apache/datafusion#5479