Skip to content

Conversation

@KevinRansom
Copy link
Contributor

Fixes #9186
Fixes #4697

Bicamaral scripts such as that Latin or Cyrillic alphabets have the notion of upper and lower case characters. The F# language used this to place a requirement on DU discriminators and PatternMatch identifiers be upper case only.

This restriction is difficult for developers using a unicameral script, such as hindi, placing the requirement that they use an uppercase Latin character for example to satisfy the compiler.

This PR addresses this issue by relaxing the requirement to require an upper case character begin a DU Discriminator, or PatternMatch identifier. The new rule becomes, where the initial character has an upper case variant, the upper case variant must be used. If the initial character does not have an upper case variant, then it will satisfy the DU/PatternMatch id requirement.

This relaxation has some impact even on developers using a unicameral script. For example:

this used to produce a compiler error:

let (|B1|``+B2``|) (x:int) = if x = 0 then OneA else ``+B2``

+B2 is a quoted identifier that had a non uppercase character. In the new scheme it is valid because + has no identifiable upper case and so satisfies the requirement.

//cc: abelbraaksma, @cartermp, @hmnjeon

@KevinRansom KevinRansom requested review from TIHan and dsyme May 15, 2020 08:52
@abelbraaksma
Copy link
Contributor

abelbraaksma commented May 15, 2020

You forgot the at-sign for my name, but I found it anyway ;)

May I suggest that we disallow characters that are not allowed identifier characters as first char? That would be more in line with how it currently works, and still allow unicameral (and I always thought that term was used for parliamentary systems) scripts to create DU cases and partial match patterns.

That'd really only apply to quoted identifiers anyway, as otherwise there's no way to include such characters.

let isUpperCaseCharacter c =
// if IsUpper and IsLower return the same value, then we can't tell if it's upper or lower case, so it is a case insensensitive language
// thusly the char must be good.
if Char.IsUpper c = Char.IsLower c then (c <> '_')
Copy link
Member

@auduchinok auduchinok May 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it be something like

if Char.IsUpper c = Char.IsLower c then Char.IsLetter c

instead? Allowing union cases and active patterns to start with arbitrary symbols like + seems to be a too big breaking change to me and does allow much more than is needed to fix the issues.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@auduchinok, I quite like that, however, Identifiers must begin with letters already and so it's only quoted identifiers that would really be impacted and we still need to cope with '_'.
But yes I will make that adjustment we have the restriction today, and keeping it as is, has a lot of value.

let (|positive|negative|) n = if n < 0 then positive else negative
let (|`` A``|) (x:int) = x
let (|B1|``+B2``|) (x:int) = if x = 0 then OneA else ``+B2``
let (|`` C``|_|) (x:int) = if x = 0 then Some(x) else None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to preserve these cases? So, if produced errors change one day it'll be reflected here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@auduchinok once we do the is letter thingy they will be errors again :-).

Copy link
Member

@auduchinok auduchinok May 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it'll be OK. :)

What I wanted to say was if there is a test for some particular behavior that is being changed it could be good to keep the test just in case it might be changed again (even accidentally, e.g. if fixing an issue produced by the initial change). It's probably just a good thing to record how the particular behavior is being changed anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @auduchinok mate, I understood mate, was messin' with ya :-)

@KevinRansom
Copy link
Contributor Author

KevinRansom commented May 15, 2020

@abelbraaksma , it was 2:30 in the morning, I'm sorry I forgot the @. I wish I had thought of the IsLetter gizmo @auduchinok found a really nice solution.

@KevinRansom
Copy link
Contributor Author

@abelbraaksma , if you look up Bicameral on-line they seem to prefer to show the split parliament definition. To get a definition about languages I searched bicameral script. Unicameral is the logical term, but it doesn't really roll of the tongue either.

@abelbraaksma
Copy link
Contributor

abelbraaksma commented May 15, 2020

it was 2:30 in the morning,

In normal times, you ought to be in a bar drinking a beer at such times!

I wish I had thought of the IsLetter gizmo @auduchinok found a really nice solution.

I vaguely remember there's an isIdChar function somewhere in the F# code base. Not sure if it does the same as IsLetter.

Btw, iirc, these BCL methods are not Astral plane aware. Meaning that the fix won't hold for valid code points above the BMP. Not sure that's a problem though.

@KevinRansom
Copy link
Contributor Author

@abelbraaksma I will take a look for isIdChar, we have to validate id's somewhere :-)

@KevinRansom
Copy link
Contributor Author

@abelbraaksma , @auduchinok updated, please take a look.

Thanks

@KevinRansom
Copy link
Contributor Author

/cc @TIHan , @cartermp

Copy link
Contributor

@cartermp cartermp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good change, small nit about another comment

let isUpper (s: string) =
s.Length >= 1 && Char.IsUpper s.[0] && not (Char.IsLower s.[0])

// Scripts that distinguish between upper and lower case (bicameral) DU Discriminators and Active Pattern identifiers are required to start with an upper case character.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Scripts - this comment implies that this change is only relevant for F# scripts. It's also relevant to regular .fs files, yes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cartermp , scripts in this case means written form of a language.

@KevinRansom KevinRansom merged commit 05c9f8b into dotnet:master May 22, 2020
@KevinRansom KevinRansom deleted the DUCasecase branch June 3, 2020 20:10
nosami pushed a commit to xamarin/visualfsharp that referenced this pull request Feb 23, 2021
… du identifiers (dotnet#9199)

* Ensure that scripts without the notion of upper/lower case can create du identifiers

* Feedback and improvements

* feedback

* correct comment
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

6 participants