Incorrect case-insensitive matching of character ranges

Character range matching is conceptually `(range_start..=range_end).any(|c| c == input_char)`, but as an optimization it is implemented as `range_start <= input_char && input_char <= range_end`. The two are equivalent, so this is fine.
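A minimal sketch of the two range checks described above, compared over a small window of code points. The function names are hypothetical and are not the crate's internals. Iterating over a `char` range requires Rust 1.45 or later.

```rust
/// Conceptual definition: does any character in the inclusive range equal `input`?
fn in_range_naive(start: char, end: char, input: char) -> bool {
    (start..=end).any(|c| c == input)
}

/// Optimized definition: two comparisons instead of a scan over the range.
fn in_range_fast(start: char, end: char, input: char) -> bool {
    start <= input && input <= end
}

fn main() {
    // Check that both definitions agree for every code point in U+0000..=U+00FF.
    for input in '\u{0}'..='\u{FF}' {
        assert_eq!(in_range_naive('a', 'z', input), in_range_fast('a', 'z', input));
    }
    println!("equivalent on U+0000..=U+00FF");
}
```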

Case-insensitive matching of a single character is implemented as `uppercase(c) == uppercase(input_char)`. This is fine (modulo #55).
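A sketch of the single-character comparison. The `upper` helper is an assumption made for illustration: it uses `char::to_uppercase` and falls back to the original character when the uppercase form is more than one character (for example `'ß'` uppercases to `"SS"`). The crate's real mapping may differ.

```rust
/// Hypothetical simple uppercase mapping: use the result of `to_uppercase`
/// when it is a single char, otherwise keep the original char.
fn upper(c: char) -> char {
    let mut it = c.to_uppercase();
    match (it.next(), it.next()) {
        (Some(u), None) => u,
        _ => c,
    }
}

/// Case-insensitive comparison of two characters via their uppercase forms.
fn eq_ignore_case(a: char, b: char) -> bool {
    upper(a) == upper(b)
}

fn main() {
    assert!(eq_ignore_case('λ', 'Λ'));
    assert!(eq_ignore_case('a', 'A'));
    assert!(!eq_ignore_case('a', 'b'));
}
```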

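So case-insensitive range matching is conceptually `(range_start..=range_end).any(|c| uppercase(c) == uppercase(input_char))`. It is currently implemented as `uppercase(range_start) <= uppercase(input_char) && uppercase(input_char) <= uppercase(range_end)`, which is **not** equivalent: uppercasing does not preserve code point order, so the uppercased endpoints need not bound the uppercased forms of the characters between them.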
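A concrete counterexample, sketched with hypothetical helper names and the same simplified `upper` mapping as an assumption. The range `ά..=ώ` (U+03AC to U+03CE) contains `α` (U+03B1). Uppercasing maps the endpoints to U+0386 and U+038F, but maps `α` to U+0391, which lies above the uppercased upper bound, so the endpoint comparison rejects a character that is literally inside the range.

```rust
/// Hypothetical simple uppercase mapping (single-char results only).
fn upper(c: char) -> char {
    let mut it = c.to_uppercase();
    match (it.next(), it.next()) {
        (Some(u), None) => u,
        _ => c,
    }
}

/// Conceptual case-insensitive range match: scan the whole range.
fn in_range_ci_conceptual(start: char, end: char, input: char) -> bool {
    (start..=end).any(|c| upper(c) == upper(input))
}

/// Current approach: uppercase the endpoints and compare against them.
fn in_range_ci_current(start: char, end: char, input: char) -> bool {
    upper(start) <= upper(input) && upper(input) <= upper(end)
}

fn main() {
    // 'ά' (U+03AC) ..= 'ώ' (U+03CE) contains 'α' (U+03B1).
    let (start, end, input) = ('\u{3AC}', '\u{3CE}', 'α');
    // upper('ά') = U+0386, upper('ώ') = U+038F, upper('α') = U+0391.
    // U+0391 > U+038F, so the endpoint comparison fails.
    assert!(in_range_ci_conceptual(start, end, input));
    assert!(!in_range_ci_current(start, end, input));
}
```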

[One of the tests currently passing](https://github.com/rust-lang/regex/blob/399758aeae3dcab382b0af5fa9964c1e32066dda/regex_macros/tests/tests.rs#L329) checks that `(?i)\p{Lu}+` matches `ΛΘΓΔα` entirely. That is, Greek letters (both uppercase and lowercase) all match the category of uppercase letters when matched case-insensitively. But the same test with `\p{Ll}` (the category of lowercase letters) in place of `\p{Lu}` currently fails because of this issue. (`\p{Lu}` and `\p{Ll}` expand to large unions of character ranges.)
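To illustrate the `\p{Lu}` / `\p{Ll}` asymmetry, here is a sketch using tiny hand-picked stand-ins for the two categories. `LU_SUBSET` and `LL_SUBSET` are hypothetical excerpts covering ASCII and part of the Greek block, not the generated tables, and `upper` is the same simplified mapping assumed above. Under the endpoint comparison, every character of `ΛΘΓΔα` matches the uppercase subset but not the lowercase one; the conceptual check accepts the string for both.

```rust
/// Hypothetical simple uppercase mapping (single-char results only).
fn upper(c: char) -> char {
    let mut it = c.to_uppercase();
    match (it.next(), it.next()) {
        (Some(u), None) => u,
        _ => c,
    }
}

/// A character class as a list of inclusive ranges.
type Class = &'static [(char, char)];

// Hypothetical stand-ins, NOT the full Unicode categories.
const LU_SUBSET: Class = &[('A', 'Z'), ('\u{391}', '\u{3A1}'), ('\u{3A3}', '\u{3AB}')];
const LL_SUBSET: Class = &[('a', 'z'), ('\u{3AC}', '\u{3CE}')];

/// Current approach: compare against the uppercased endpoints of each range.
fn class_match_current(class: Class, input: char) -> bool {
    class
        .iter()
        .any(|&(s, e)| upper(s) <= upper(input) && upper(input) <= upper(e))
}

/// Conceptual definition: some character in some range has the same uppercase form.
fn class_match_conceptual(class: Class, input: char) -> bool {
    class
        .iter()
        .any(|&(s, e)| (s..=e).any(|c| upper(c) == upper(input)))
}

/// Emulates `(?i)\p{..}+` matching the whole string: every char must match.
fn all_match(class: Class, text: &str, f: fn(Class, char) -> bool) -> bool {
    text.chars().all(|ch| f(class, ch))
}

fn main() {
    let text = "ΛΘΓΔα";
    // Current behaviour: the uppercase subset matches everything, the lowercase one does not.
    assert!(all_match(LU_SUBSET, text, class_match_current));
    assert!(!all_match(LL_SUBSET, text, class_match_current));
    // Conceptual behaviour: both subsets match the whole string.
    assert!(all_match(LU_SUBSET, text, class_match_conceptual));
    assert!(all_match(LL_SUBSET, text, class_match_conceptual));
}
```

The conceptual check scans every character of every range, which would be far too slow for classes as large as `\p{Ll}`. One possible fix, not necessarily the one the crate adopts, is to precompute the set of uppercased characters of each class once and then test `upper(input_char)` for membership. This is equivalent to the conceptual definition, because `upper(c) == upper(x)` for some `c` in the range exactly when `upper(x)` belongs to the image of the range under `upper`.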

Incorrect case-insensitive matching of character ranges #76