What version of regex are you using?
regex == 1.12.3
regex_lite == 0.1.9
Describe the bug at a high level.
The Unicode character U+0595 (a Hebrew accent mark) is handled differently by \b in the two engines: regex reports word boundaries around it, while regex_lite reports none.
What are the steps to reproduce the behavior?
use regex::Regex;
use regex_lite::Regex as LRegex;
dbg!(Regex::new("\\b").unwrap().find_iter("\u{595}").collect::<Vec<_>>());
dbg!(LRegex::new("\\b").unwrap().find_iter("\u{595}").collect::<Vec<_>>());
What is the actual behavior?
[src/main.rs:15:5] Regex::new("\\b").unwrap().find_iter("\u{595}").collect::<Vec<_>>() = [
    Match {
        start: 0,
        end: 0,
        string: "",
    },
    Match {
        start: 2,
        end: 2,
        string: "",
    },
]
[src/main.rs:16:5] LRegex::new("\\b").unwrap().find_iter("\u{595}").collect::<Vec<_>>() = []
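One possible reading of the output above, sketched with the standard library only. It assumes (not confirmed here from either crate's source) that regex counts U+0595 as a word character under Unicode rules, while regex_lite applies an ASCII-only definition of a word character. The helpers `is_ascii_word_char` and `ascii_word_boundaries` are hypothetical names for illustration and are not part of either crate.

```rust
/// ASCII-only notion of a word character: [0-9A-Za-z_].
/// Assumption: this approximates how an ASCII-only `\b` classifies characters.
fn is_ascii_word_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Byte offsets at which exactly one neighbor is an ASCII word character.
fn ascii_word_boundaries(s: &str) -> Vec<usize> {
    let mut out = Vec::new();
    let mut prev_is_word = false; // the start of the string counts as "not a word"
    for (i, c) in s.char_indices() {
        let cur_is_word = is_ascii_word_char(c);
        if cur_is_word != prev_is_word {
            out.push(i);
        }
        prev_is_word = cur_is_word;
    }
    // The end of the string also counts as "not a word".
    if prev_is_word {
        out.push(s.len());
    }
    out
}

fn main() {
    let hay = "\u{595}";
    // U+0595 takes two bytes in UTF-8, which is why the second match
    // reported by regex sits at byte offset 2.
    println!("{}", hay.len()); // prints 2
    // Under ASCII-only rules the haystack has no word characters at all.
    println!("{:?}", ascii_word_boundaries(hay)); // prints []
    // For comparison, a plain ASCII word has boundaries at both ends.
    println!("{:?}", ascii_word_boundaries("ab")); // prints [0, 2]
}
```

If that assumption holds, the empty result from regex_lite and the two matches at offsets 0 and 2 from regex would both follow from each engine's own definition of a word character.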
What is the expected behavior?
I don't know which result is correct, but I would expect the same behavior from both engines. I found this through fuzzing.