Optimize case_fold_and_combine_ranges?

I haven’t timed it, but the `case_fold_and_combine_ranges` function introduced in #78 is _probably_ slow for large sets of character ranges like `\p{Lu}`, which would make parsing/compiling such a regex slow. It can probably be improved to skip per-`char` work whenever we can determine that no `char` in a given range (or subrange) is affected by case folding.
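
To make the idea concrete, here is a rough sketch, not a patch against the actual function. It assumes we have a sorted, non-overlapping table of `char` ranges that take part in case folding; `FOLDABLE` and `foldable_subranges` are hypothetical names, and the table below is a tiny ASCII-only stand-in for one that would really be generated from Unicode data. Given an input range, a binary search finds the overlapping table entries, and only those subranges would need the per-`char` treatment:

```rust
/// Hypothetical, deliberately incomplete table: sorted, non-overlapping
/// ranges of chars that take part in case folding. A real table would be
/// generated from Unicode data rather than written by hand.
const FOLDABLE: &[(char, char)] = &[('A', 'Z'), ('a', 'z')];

/// Returns the parts of `start..=end` that overlap `table`.
/// Everything outside the returned subranges is unaffected by case folding
/// (according to `table`) and can be kept as-is without visiting each char.
fn foldable_subranges(start: char, end: char, table: &[(char, char)]) -> Vec<(char, char)> {
    // Binary search for the first table entry that ends at or after `start`.
    let first = table.partition_point(|&(_, hi)| hi < start);
    let mut out = Vec::new();
    for &(lo, hi) in &table[first..] {
        if lo > end {
            // The table is sorted, so no later entry can overlap either.
            break;
        }
        // Clamp the table entry to the input range.
        out.push((lo.max(start), hi.min(end)));
    }
    out
}
```

With this table, `foldable_subranges('0', '9', FOLDABLE)` comes back empty, so a digit range could be passed through untouched, while `('X', 'c')` yields `('X', 'Z')` and `('a', 'c')`, the only parts that need folding. The cost per input range is then a binary search plus the number of overlapping table entries, rather than the number of `char`s in the range.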