-
Notifications
You must be signed in to change notification settings - Fork 844
Ensure that scripts without the notion of upper/lower case can create du identifiers #9199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
You forgot the at-sign for my name, but I found it anyway ;) May I suggest that we disallow characters that are not allowed identifier characters as first char? That would be more in line with how it currently works, and still allow unicameral (and I always thought that term was used for parliamentary systems) scripts to create DU cases and partial match patterns. That'd really only apply to quoted identifiers anyway, as otherwise there's no way to include such characters. |
src/absil/illib.fs
Outdated
| let isUpperCaseCharacter c = | ||
| // if IsUpper and IsLower return the same value, then we can't tell if it's upper or lower case, so it is a case insensensitive language | ||
| // thusly the char must be good. | ||
| if Char.IsUpper c = Char.IsLower c then (c <> '_') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could it be something like
if Char.IsUpper c = Char.IsLower c then Char.IsLetter cinstead? Allowing union cases and active patterns to start with arbitrary symbols like + seems to be a too big breaking change to me and does allow much more than is needed to fix the issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@auduchinok, I quite like that, however, Identifiers must begin with letters already and so it's only quoted identifiers that would really be impacted and we still need to cope with '_'.
But yes I will make that adjustment we have the restriction today, and keeping it as is, has a lot of value.
| let (|positive|negative|) n = if n < 0 then positive else negative | ||
| let (|`` A``|) (x:int) = x | ||
| let (|B1|``+B2``|) (x:int) = if x = 0 then OneA else ``+B2`` | ||
| let (|`` C``|_|) (x:int) = if x = 0 then Some(x) else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to preserve these cases? So, if produced errors change one day it'll be reflected here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@auduchinok once we do the is letter thingy they will be errors again :-).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it'll be OK. :)
What I wanted to say was if there is a test for some particular behavior that is being changed it could be good to keep the test just in case it might be changed again (even accidentally, e.g. if fixing an issue produced by the initial change). It's probably just a good thing to record how the particular behavior is being changed anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @auduchinok mate, I understood mate, was messin' with ya :-)
|
@abelbraaksma , it was 2:30 in the morning, I'm sorry I forgot the @. I wish I had thought of the IsLetter gizmo @auduchinok found a really nice solution. |
|
@abelbraaksma , if you look up Bicameral on-line they seem to prefer to show the split parliament definition. To get a definition about languages I searched bicameral script. Unicameral is the logical term, but it doesn't really roll of the tongue either. |
In normal times, you ought to be in a bar drinking a beer at such times!
I vaguely remember there's an Btw, iirc, these BCL methods are not Astral plane aware. Meaning that the fix won't hold for valid code points above the BMP. Not sure that's a problem though. |
|
@abelbraaksma I will take a look for isIdChar, we have to validate id's somewhere :-) |
|
@abelbraaksma , @auduchinok updated, please take a look. Thanks |
cartermp
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good change, small nit about another comment
| let isUpper (s: string) = | ||
| s.Length >= 1 && Char.IsUpper s.[0] && not (Char.IsLower s.[0]) | ||
|
|
||
| // Scripts that distinguish between upper and lower case (bicameral) DU Discriminators and Active Pattern identifiers are required to start with an upper case character. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Scripts - this comment implies that this change is only relevant for F# scripts. It's also relevant to regular .fs files, yes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cartermp , scripts in this case means written form of a language.
… du identifiers (dotnet#9199) * Ensure that scripts without the notion of upper/lower case can create du identifiers * Feedback and improvements * feedback * correct comment
Fixes #9186
Fixes #4697
Bicamaral scripts such as that Latin or Cyrillic alphabets have the notion of upper and lower case characters. The F# language used this to place a requirement on DU discriminators and PatternMatch identifiers be upper case only.
This restriction is difficult for developers using a unicameral script, such as hindi, placing the requirement that they use an uppercase Latin character for example to satisfy the compiler.
This PR addresses this issue by relaxing the requirement to require an upper case character begin a DU Discriminator, or PatternMatch identifier. The new rule becomes, where the initial character has an upper case variant, the upper case variant must be used. If the initial character does not have an upper case variant, then it will satisfy the DU/PatternMatch id requirement.
This relaxation has some impact even on developers using a unicameral script. For example:
this used to produce a compiler error:
+B2 is a quoted identifier that had a non uppercase character. In the new scheme it is valid because + has no identifiable upper case and so satisfies the requirement.
//cc: abelbraaksma, @cartermp, @hmnjeon