The dot (".") meta-character should exclude surrogates #544
The branch is WIP: existing unit tests succeed, but:
I adjusted the […]. But I wonder if excluding unpaired surrogates from […]
lsf37 left a comment
This looks good to me for now and can be merged. There is still the question of whether we should do more, which I'm not sure about. Let's keep discussing, but put anything further in a separate PR.
Leaving those unpaired surrogates out of negated classes does make sense. My main concern is […]. On the other hand, we would then get that […]. The basic problem is that some inputs contain sequences that do not resolve to characters, while one of the scanner's basic assumptions, dating from the dark ages of Java 1.1, is that every input is a sequence of characters. Would it make sense to make […]
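To make "sequences that do not resolve to characters" concrete, here is a minimal Java sketch that finds the first unpaired surrogate in a UTF-16 input. The class `SurrogateCheck` and the method `firstUnpairedSurrogate` are hypothetical names used only for illustration; they are not part of JFlex or of this PR. Only the standard `Character.isHighSurrogate` and `Character.isLowSurrogate` calls are used.

```java
// Illustrative only: SurrogateCheck is a hypothetical helper, not JFlex code.
final class SurrogateCheck {

    /**
     * Returns the index of the first UTF-16 code unit that is a surrogate
     * without a matching partner, or -1 if every surrogate is properly paired.
     */
    static int firstUnpairedSurrogate(CharSequence s) {
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // A high surrogate must be followed directly by a low surrogate.
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i += 2; // well-formed pair: one supplementary code point
                    continue;
                }
                return i; // high surrogate with no partner
            }
            if (Character.isLowSurrogate(c)) {
                return i; // low surrogate not preceded by a high surrogate
            }
            i++; // ordinary BMP character
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstUnpairedSurrogate("a\uD83D\uDE00b")); // prints -1
        System.out.println(firstUnpairedSurrogate("a\uD83Db"));       // prints 1
        System.out.println(firstUnpairedSurrogate("a\uDE00b"));       // prints 1
    }
}
```

The second and third inputs are valid Java strings but not well-formed UTF-16, which is the kind of input a scanner has to decide how to treat.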
93c5c82 to 46281f9
(rebased on master)
Followup from #538:
Definitions from section 3.9 of the Unicode v11.0 standard: