Skip to content

Allow to name closed types in constructor type annotations#12507

Merged
garrigue merged 22 commits intoocaml:trunkfrom
COCTI:name_closed_types
Jan 19, 2024
Merged

Allow to name closed types in constructor type annotations#12507
garrigue merged 22 commits intoocaml:trunkfrom
COCTI:name_closed_types

Conversation

@garrigue
Copy link
Copy Markdown
Contributor

In #11891 it was pointed that, for constructor type annotations, the current approach of only allowing to name existentials that do not appear in the return type was too restrictive, as existentials could also appear through unification of the return type.

This PR relaxes this restriction, allowing to name any closed type appearing in the type of the arguments.
This seems to answer the request in #11891, however this means that we are now using local type definitions for two different things: existentials introduced by constructors, and aliases written in the annotation.
Note that while it could be possible to restrict those aliases to existentials, I'm not sure this would be a stable property: according to the order of unification, an existential could have been already refined. So I would prefer to stick to this relaxed definition. Note that there is still a scope restriction, so that one can only alias already known closed types.

@gasche gasche self-requested a review August 28, 2023 07:06
if ids <> [] then ignore begin
let get_decl_manifest id env =
let decl =
try Env.find_type (Path.Pident id) env with Not_found -> assert false in
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason that the code is re-querying the declaration and manifest of the ids types from the environment? Would it not be more straightforward to store those in the ids list?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This avoids thinking hard about the code, but indeed it is ok to keep the values along, so I changed the code.

@lpw25
Copy link
Copy Markdown
Contributor

lpw25 commented Sep 4, 2023

I think that the behaviour in this PR is surprising and confusing. I would expect any binding of the form (type a) to introduce an abstract type, but here it is introducing an alias. I'm fine with the abstract type being equal to types other than existentials outside of its scope, but within its scope it should be abstract.

@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 20, 2023

We discussed this in today's type-system meeting (with @garrigue, @goldfirere, @Octachron and @t6s). I would like to find something that let us name more types than today, because un-nameable types are unergonomic, but @lpw25 does not want to give names to types that can already be named (or at least some of them). So we tried to agree on a specification that makes me happy and, we hope, would also please Leo.

The proposal is as follows: allow to give a name to local abstract types that were introduced in the current pattern-matching clause.

Example:

type _ ty =
  | Int : int ty
  | Pair : 'b ty * 'c ty -> ('b * 'c) ty


(* Gabriel wants this accepted. The criterion accepts it. *)
let example1 : type a . a ty -> unit = function
| Int -> ()
| Pair (type b c) (x, y : b ty * c ty) -> ()

(* The criterion obviously rejects this, which requires the equation (b = int) *)
let example2 : (int * int) ty -> int = function
| Pair (type b c) (x, y : b ty * c ty) -> (5 : b)

(* The criterion also rejects this,
   even though it could be accepted if b, c were forced to remain abstract *)
let example2 : (int * int) ty -> unit = function
| Pair (type b c) (x, y : b ty * c ty) -> ()

(* The criterion also rejects this:
    a1, a2 are defined "earlier" than the pattern matching clause. *)
let example2 : type a1 a2 . (a1 * a2) ty -> unit = function
| Pair (type b c) (x, y : b ty * c ty) -> ()

@garrigue thinks that this restriction would be easy to implement, and we hope that it would make @lpw25 happy. I would be happy with it -- I think we could also justify accepting more, but I mostly care about the types that cannot be given a name otherwise.

@garrigue
Copy link
Copy Markdown
Contributor Author

I have implemented the restriction that we discussed yesterday.
Now only types that were freshly introduced can be bound, and only once.
Does it assuage @lpw25 's reservations ?

@garrigue
Copy link
Copy Markdown
Contributor Author

Note that what I implemented is actually stricter than what @gasche suggested.
That is, one can only bind the existentials that were generated by the unification of the constructor they annotate.
So there is only one place the binding can be done.
Buiding a counter-example is a bit tricky though.

@garrigue
Copy link
Copy Markdown
Contributor Author

I added an example showing the difference:

type 'a wrap = Wrap of 'a
type _ ty = Int : int ty | Pair : ('b ty * 'c ty) wrap -> ('b * 'c) ty
(* ok *)
let rec default : type a. a ty -> a = function
  | Int -> 0
  | Pair (type b c) (Wrap (b, c) : (b ty * c ty) wrap) ->
      (default b : b), (default c : c)
(* ko *)
let rec default : type a. a ty -> a = function
  | Int -> 0
  | Pair (Wrap (type b c) (b, c : b ty * c ty)) ->
      (default b : b), (default c : c)

The first default is fine because we bind b and c just when they are introduced, but the second one fails as it attempts to bind b and c to pre-existing existentials (introduced by Pair).

@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 21, 2023

Yes, it makes a lot of sense to restrict to binding existentials introduced by the constructor. This is not mentioned in my restriction above because I wasn't sure it would be easy to implement / how to formulate the restriction.

@lpw25
Copy link
Copy Markdown
Contributor

lpw25 commented Sep 21, 2023

Does it assuage @lpw25 's reservations ?

Yes, I'm happy with the new behaviour.

@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 21, 2023

Great, so all that is left to do is a code review :-)

Minor note: the testsuite is failing with the finaliser_handover test.

2023-09-21T06:32:10.0730084Z  ... testing 'finaliser_handover.ml' with 1 (native) Process 60125 got signal 6(Aborted), core dumped

This test is known to be flaky (#12345) and the result can be ignored in the context of this PR.

@garrigue
Copy link
Copy Markdown
Contributor Author

Note that there is still one situation that is not handled after this PR, namely when the newly introduced existential does not appear in the arguments.

type _ ty = Int : int ty | Pair : ('a * 'a) ty
let example : type a. a ty -> a -> unit = function
  | Int -> succ
  | Pair -> (fun (x,y) -> (y,x))

There is no syntax to bind the variable in Pair currently.
But this is orthogonal to this PR.

@goldfirere
Copy link
Copy Markdown
Contributor

For the case an existential variable does not appear in the arguments, GHC used to be able to do something that might translate to

type _ ty = Int : int ty | Pair : ('a * 'a) ty
let example : type a. a ty -> a -> unit = function
  | Int -> succ
  | Pair (type a) as p -> ignore (p : (a * a) ty); (fun (x, y) : a * a -> (y,x))

In Haskell, it was worse, requiring a second match against p.

Would the above syntax work here?

@garrigue
Copy link
Copy Markdown
Contributor Author

With the approach in the current PR it wouldn't work, because it requires all named variables to be bound inside the pattern itself. Moreover it requires a syntactic annotation on the argument, but there is no argument here...
I was thinking of either:

| Pair (type a. (a * a) ty) -> ...

but this is really new syntax, or:

| (Pair (type a) : (a * a) ty) -> ...

but this requires a new interpretation of type annotations in that case.
By the way, one can already use beta-expansion, but this is what we are trying to avoid.

let example : type a. a ty -> a -> a = function
  | Int -> succ
  | Pair as p -> (fun (type a) (Pair : (a * a) ty) (x, y) -> (y,x)) p

Copy link
Copy Markdown
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read the code as a non-expert. The change is relatively short/small but I don't understand what is going on enough to vouch for its correctness.

but a pattern was expected which matches values of type "a ty * a"
Type "b" is not compatible with type "a"
Error: This type annotation attempts to bind "b"
to the already bound existential "a".
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example is still rejected, so I think the name should stay ko3 and not ok3.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

^
Error: This type does not bind all existentials in the constructor: "type b. a"
Error: This type annotation attempts to bind "b"
to the non-locally-abstract type "'a".
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error is confusing -- 'a does not occur in the user code. Could it be improved? Does the 'a come from the fact that b is not actually used in the pattern argument annotation? Maybe we can get a better message in this case.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to track that b is not used other than by @alainfrisch 's logic for warnings, which I don't understand fully. Adding an extra level could work too, but this introduces new logic. Let me think about it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could actually check whether b is used in the pattern argument annotations without too much difficulty. For example, when you populate the initial environment with the name_list to get the ids, you could remember the identity of the flexible dummy variables for each id. Then on line 821 in the current PR, at the place where you raise the Bind_non_locally_abstract error, you can check whether the current identity of the type is still that flexible dummy variable, or if it was unified to something else. If it was not unified (it is still the dummy), you could raise a different error: the problem is not that the name is not being bound to a non-locally-abstract type, it is not being bound at all.

(I guess that this situation could occur in cases where the name does occur syntactically in the annotation, but those positions vanish during expansion. I don't think that this is a problem.)

Error: This pattern matches values of type "b * (b -> 'a)"
but a pattern was expected which matches values of type "b * (b -> 'b)"
This instance of "'a" is ambiguous:
it would escape the scope of its equation
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have a better error message in the escape situation -- which is bound to be a fairly common error among users, giving a local name to a type coming from outside.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the scoping constraint on the type variable, as it is not necessary (it will be enforced later anyway). This way, we get the same error as below. The wording might be better though.

[%%expect{|
type _ ty = Int : int ty | Pair : 'b ty * 'c ty -> ('b * 'c) ty
val example : 'a ty -> 'a = <fun>
val example : 'a ty -> 'a = <fun>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These examples are nicer, simpler than the Thunk case above (which suffers from this corner case that the instance of the constructor is exactly the type parameter, rejecting more usage situations). Could it go above in the test file?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

in
tps.tps_pattern_force <- force :: tps.tps_pattern_force;
(* Only unify the return type after generating the ids *)
unify_res ();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that the "scope escape" type error is coming from here, and that it would be nice to catch it here to send the more specialized Bind_non_locally_abstract error instead.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the scope error that was occurring in the test suite.

let reason = match reason with
| Bind_already_bound -> "already bound existential"
| Bind_not_in_scope -> "previously defined existential"
| Bind_non_locally_abstract -> "non-locally-abstract type"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this error message could be more user-friendly. This corresponds to the common case where the user tries to bind an "old" type. "non-locally-abstract" is a correct description but it will be confusing for users.

For example I wouldn't mind an error message saying something like:

The local name ... can only be given to an existential variable bound to this GADT constructor. The type annotation tries to bind it to the name ... defined outside this constructor.

Copy link
Copy Markdown
Contributor Author

@garrigue garrigue Oct 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done as suggested

in
Location.errorf ~loc
"@[<hov0>This type annotation attempts to bind@ \"%s\"@ %s%s@ \"%a\".@]"
(Ident.name id) "to the " reason Printtyp.type_expr ty
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Octachron changed the way error messages are supposed to refer to inline code (including inline types), see the occurrences of Style.inline_code (Ident.name id) and (Style.as_inline_code Printtype.type_expr) ty in the rest of the file. You should match that new style instead of using manual quoting here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

let ids = List.map (fun x -> x.txt) ids in
let rem =
(* First process the existentials introduced by this constructor.
Just need to make their definitions abstract. *)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naive question: why do we need to do two passes, one on the constructor existential instances and another on the bound existential names? Couldn't we do a single pass on the bound existential names, with logic that would also work well on the purely existential instances?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this would be possible, this would be less robust. Namely, for internal existentials we delay the creation of the abstract type, so that we will never show a strange internal name for the user. For existentials that were introduced by unification, we cannot do that, so we just create alias names, hoping that the strange internal name will not show up later.

let env =
Env.add_type ~check:false id
{decl with type_manifest = Some (correct_levels tv')} !!penv
in
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic between the two loops (on existential instances, then on remaining user-defined names) is fairly different: in one case you create a new abstract (for the user-provided type name) and you unify, and in the other case you define the user-provided name as a synonym to the typer-generated existential. Can you explain why there is this difference? Will it make a difference for error messages? (In particular, for not-purely-existential names, are they shown in error messages in the right-hand-side of the clause with the name given by the user?)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I explained above, in the latter case we do not replace the original name by the one given by the user, but just define an alias, so we cannot guarantee that the internal name will never show up. In short-path mode we could tune the heuristics to show the user-defined name.

^^^^^^^^^^^
Error: This type annotation attempts to bind "b"
to the previously defined existential "$0".
|}]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be curious to see an example with a type error in the right-hand-side of the pattern clause, to check that the types mentioned there are the one given explicitly by the user. For example what do we get with the following?

let rec example : type a . a ty -> a = function
| Int -> 0
| Pair (type b c) (x, y : b ty * c ty) -> (example x, example (*error*)x)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added the example, and indeed this does not work well:

3 | | Pair (type b c) (x, y : b ty * c ty) -> (example x, example (*error*)x)
                                                          ^^^^^^^^^^^^^^^^^^
Error: This expression has type "b" = "$0" but an expression was expected of type
         "$1"

Copy link
Copy Markdown
Contributor Author

@garrigue garrigue Oct 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also checked using -short-paths, and indeed this gives the message you wanted.

3 | | Pair (type b c) (x, y : b ty * c ty) -> (example x, example (*error*)x);;
                                                            ^^^^^^^^^^^^^^^^^^
Error: This expression has type "b" but an expression was expected of type "c"

@@ -778,21 +791,66 @@ let solve_constructor_annotation
| _ -> assert false
in
if ids <> [] then ignore begin
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why this ignore is here and if it is still necessary.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed it was spurious; removed it.

^
Error: This type does not bind all existentials in the constructor: "type b. a"
Error: This type annotation attempts to bind "b"
to the non-locally-abstract type "'a".
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could actually check whether b is used in the pattern argument annotations without too much difficulty. For example, when you populate the initial environment with the name_list to get the ids, you could remember the identity of the flexible dummy variables for each id. Then on line 821 in the current PR, at the place where you raise the Bind_non_locally_abstract error, you can check whether the current identity of the type is still that flexible dummy variable, or if it was unified to something else. If it was not unified (it is still the dummy), you could raise a different error: the problem is not that the name is not being bound to a non-locally-abstract type, it is not being bound at all.

(I guess that this situation could occur in cases where the name does occur syntactically in the annotation, but those positions vanish during expansion. I don't think that this is a problem.)

@garrigue
Copy link
Copy Markdown
Contributor Author

garrigue commented Oct 5, 2023

(For some reason GitHub does not allow discussing an outdated chunk)

you can check whether the current identity of the type is still that flexible dummy variable, or if it was unified to something else.

Unfortunately, unification guarantees that types will become equal, but it does not say anything about which one will be redirected to the other. So it could very well be the case that we have unified with a variable that was redirected to point to the newly introduced abstract type, which cannot be detected if that variable had the same level (or higher). This can happen for instance with code like

match [] with
| (Pair (type a b) (x, y : a ty * b ty)) :: _ -> ()
| None -> ()

This said, one might consider that this code is too meaningless for caring about this false positive.

Copy link
Copy Markdown
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can tell that this is broadly reasonable, but I don't understand the details (and there are places where I don't understand why it cannot be simpler). I don't think that I will be able to understand the code much better in a reasonable amount of time, and I think that the feature is important and well-tested, so I propose to move forward by approving.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants