I haven’t timed it, but the case_fold_and_combine_ranges function introduced in #78 is probably slow for large sets of character ranges like \p{Lu}, since it seems to consider every char individually. (That is, it makes parsing/compiling a regex slow.) It could probably be improved to skip individual chars whenever we can determine that none in a given range (or subrange) is affected by case folding.
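
To make the idea concrete, here is a rough sketch of one way it might work. It is not the code from #78. It assumes the folding data is available as a table sorted by source char, so the entries falling inside a range can be located with two binary searches instead of a scan over every char. `FOLD_TABLE`, `fold_entries_in`, `case_fold_range` and `combine` are hypothetical names, and the table is a toy stand-in for real Unicode data.

```rust
/// Hypothetical toy folding table, sorted by the first element.
/// Each entry says "this char also matches that char when ignoring case".
/// A real table would be generated from Unicode data.
const FOLD_TABLE: &[(char, char)] = &[
    ('A', 'a'),
    ('B', 'b'),
    ('C', 'c'),
    ('a', 'A'),
    ('b', 'B'),
    ('c', 'C'),
];

/// Returns the table entries whose source char lies in `lo..=hi`.
/// Assumes `lo <= hi`. Costs two binary searches, regardless of how many
/// chars the range covers.
fn fold_entries_in(lo: char, hi: char) -> &'static [(char, char)] {
    let start = FOLD_TABLE.partition_point(|&(c, _)| c < lo);
    let end = FOLD_TABLE.partition_point(|&(c, _)| c <= hi);
    &FOLD_TABLE[start..end]
}

/// Returns the original range plus one single-char range per folding
/// counterpart. Ranges with no affected chars do no per-char work at all.
fn case_fold_range(lo: char, hi: char) -> Vec<(char, char)> {
    let mut out = vec![(lo, hi)];
    for &(_, folded) in fold_entries_in(lo, hi) {
        out.push((folded, folded));
    }
    out
}

/// Sorts the ranges and merges the ones that overlap or are adjacent.
fn combine(mut ranges: Vec<(char, char)>) -> Vec<(char, char)> {
    ranges.sort();
    let mut merged: Vec<(char, char)> = Vec::new();
    for (lo, hi) in ranges {
        if let Some(last) = merged.last_mut() {
            // Compare as u32 so "adjacent" does not need a successor char.
            if (lo as u32) <= (last.1 as u32) + 1 {
                if hi > last.1 {
                    last.1 = hi;
                }
                continue;
            }
        }
        merged.push((lo, hi));
    }
    merged
}
```

With this toy table, `combine(case_fold_range('A', 'C'))` gives the two ranges `('A', 'C')` and `('a', 'c')`, while a range like `('0', '9')` comes back unchanged after two binary searches that find nothing. The work per range is proportional to the number of affected chars rather than to the size of the range.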