Description
Character range matching is conceptually `(range_start..range_end).any(|c| c == input_char)` (with both ends of the range inclusive), but as an optimization it is implemented as `range_start <= input_char && input_char <= range_end`. This is fine.
Case-insensitive matching is implemented as `uppercase(c) == uppercase(input_char)`. This is fine (modulo #55).
So case-insensitive range matching is conceptually `(range_start..range_end).any(|c| uppercase(c) == uppercase(input_char))`. It is currently implemented as `uppercase(range_start) <= uppercase(input_char) && uppercase(input_char) <= uppercase(range_end)`, which is not equivalent: uppercasing the two endpoints does not uppercase every character in between, and it can even produce an empty range.
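To make the mismatch concrete, here is a minimal standalone sketch of both definitions. It is not the crate's actual code: `upper`, `conceptual`, and `implemented` are hypothetical names, and `upper` assumes a simple one-character uppercase mapping built on `char::to_uppercase`. The range `Z-a` is just an illustrative ASCII case where the two definitions disagree.

```rust
/// Hypothetical helper: single-character uppercase mapping.
/// Falls back to the input when the uppercase form is not exactly one char.
fn upper(c: char) -> char {
    let mut it = c.to_uppercase();
    match (it.next(), it.next()) {
        (Some(u), None) => u,
        _ => c,
    }
}

/// The conceptual definition: some character in the (inclusive) range
/// equals the input once both are uppercased.
fn conceptual(start: char, end: char, input: char) -> bool {
    (start..=end).any(|c| upper(c) == upper(input))
}

/// The shortcut described above: compare the uppercased input against
/// the uppercased endpoints.
fn implemented(start: char, end: char, input: char) -> bool {
    upper(start) <= upper(input) && upper(input) <= upper(end)
}

fn main() {
    // The range Z-a contains 'Z', so 'z' should match case-insensitively.
    // The shortcut checks 'Z' <= 'Z' && 'Z' <= 'A', which is false.
    for input in ['z', 'A', 'M'] {
        println!(
            "Z-a vs {:?}: conceptual={} implemented={}",
            input,
            conceptual('Z', 'a', input),
            implemented('Z', 'a', input)
        );
    }
}
```

For a range such as `a-z`, where both endpoints have the same case and nothing else lies between them, the two functions agree, which is presumably why the shortcut passes many tests.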
One of the tests currently passing is that `(?i)\p{Lu}+` matches `ΛΘΓΔα` entirely. That is, Greek letters (both upper case and lower case) all match the category of upper case letters when matched case-insensitively. But the same test with `\p{Ll}` (the category of lower case letters) instead of `\p{Lu}` currently fails because of this issue. (`\p{Lu}` and `\p{Ll}` expand to large unions of character ranges.)