Allow to name closed types in constructor type annotations#12507
Allow to name closed types in constructor type annotations#12507garrigue merged 22 commits intoocaml:trunkfrom
Conversation
typing/typecore.ml
Outdated
| if ids <> [] then ignore begin | ||
| let get_decl_manifest id env = | ||
| let decl = | ||
| try Env.find_type (Path.Pident id) env with Not_found -> assert false in |
There was a problem hiding this comment.
Is there a reason that the code is re-querying the declaration and manifest of the ids types from the environment? Would it not be more straightforward to store those in the ids list?
There was a problem hiding this comment.
This avoids thinking hard about the code, but indeed it is ok to keep the values along, so I changed the code.
|
I think that the behaviour in this PR is surprising and confusing. I would expect any binding of the form |
|
We discussed this in today's type-system meeting (with @garrigue, @goldfirere, @Octachron and @t6s). I would like to find something that let us name more types than today, because un-nameable types are unergonomic, but @lpw25 does not want to give names to types that can already be named (or at least some of them). So we tried to agree on a specification that makes me happy and, we hope, would also please Leo. The proposal is as follows: allow to give a name to local abstract types that were introduced in the current pattern-matching clause. Example: type _ ty =
| Int : int ty
| Pair : 'b ty * 'c ty -> ('b * 'c) ty
(* Gabriel wants this accepted. The criterion accepts it. *)
let example1 : type a . a ty -> unit = function
| Int -> ()
| Pair (type b c) (x, y : b ty * c ty) -> ()
(* The criterion obviously rejects this, which requires the equation (b = int) *)
let example2 : (int * int) ty -> int = function
| Pair (type b c) (x, y : b ty * c ty) -> (5 : b)
(* The criterion also rejects this,
even though it could be accepted if b, c were forced to remain abstract *)
let example2 : (int * int) ty -> unit = function
| Pair (type b c) (x, y : b ty * c ty) -> ()
(* The criterion also rejects this:
a1, a2 are defined "earlier" than the pattern matching clause. *)
let example2 : type a1 a2 . (a1 * a2) ty -> unit = function
| Pair (type b c) (x, y : b ty * c ty) -> ()@garrigue thinks that this restriction would be easy to implement, and we hope that it would make @lpw25 happy. I would be happy with it -- I think we could also justify accepting more, but I mostly care about the types that cannot be given a name otherwise. |
9f9235c to
06811d3
Compare
|
I have implemented the restriction that we discussed yesterday. |
|
Note that what I implemented is actually stricter than what @gasche suggested. |
|
I added an example showing the difference: type 'a wrap = Wrap of 'a
type _ ty = Int : int ty | Pair : ('b ty * 'c ty) wrap -> ('b * 'c) ty
(* ok *)
let rec default : type a. a ty -> a = function
| Int -> 0
| Pair (type b c) (Wrap (b, c) : (b ty * c ty) wrap) ->
(default b : b), (default c : c)
(* ko *)
let rec default : type a. a ty -> a = function
| Int -> 0
| Pair (Wrap (type b c) (b, c : b ty * c ty)) ->
(default b : b), (default c : c)The first |
|
Yes, it makes a lot of sense to restrict to binding existentials introduced by the constructor. This is not mentioned in my restriction above because I wasn't sure it would be easy to implement / how to formulate the restriction. |
Yes, I'm happy with the new behaviour. |
|
Great, so all that is left to do is a code review :-) Minor note: the testsuite is failing with the This test is known to be flaky (#12345) and the result can be ignored in the context of this PR. |
|
Note that there is still one situation that is not handled after this PR, namely when the newly introduced existential does not appear in the arguments. type _ ty = Int : int ty | Pair : ('a * 'a) ty
let example : type a. a ty -> a -> unit = function
| Int -> succ
| Pair -> (fun (x,y) -> (y,x))There is no syntax to bind the variable in |
|
For the case an existential variable does not appear in the arguments, GHC used to be able to do something that might translate to In Haskell, it was worse, requiring a second match against Would the above syntax work here? |
|
With the approach in the current PR it wouldn't work, because it requires all named variables to be bound inside the pattern itself. Moreover it requires a syntactic annotation on the argument, but there is no argument here... | Pair (type a. (a * a) ty) -> ...but this is really new syntax, or: | (Pair (type a) : (a * a) ty) -> ...but this requires a new interpretation of type annotations in that case. let example : type a. a ty -> a -> a = function
| Int -> succ
| Pair as p -> (fun (type a) (Pair : (a * a) ty) (x, y) -> (y,x)) p |
gasche
left a comment
There was a problem hiding this comment.
I read the code as a non-expert. The change is relatively short/small but I don't understand what is going on enough to vouch for its correctness.
| but a pattern was expected which matches values of type "a ty * a" | ||
| Type "b" is not compatible with type "a" | ||
| Error: This type annotation attempts to bind "b" | ||
| to the already bound existential "a". |
There was a problem hiding this comment.
This example is still rejected, so I think the name should stay ko3 and not ok3.
| ^ | ||
| Error: This type does not bind all existentials in the constructor: "type b. a" | ||
| Error: This type annotation attempts to bind "b" | ||
| to the non-locally-abstract type "'a". |
There was a problem hiding this comment.
This error is confusing -- 'a does not occur in the user code. Could it be improved? Does the 'a come from the fact that b is not actually used in the pattern argument annotation? Maybe we can get a better message in this case.
There was a problem hiding this comment.
I'm not sure how to track that b is not used other than by @alainfrisch 's logic for warnings, which I don't understand fully. Adding an extra level could work too, but this introduces new logic. Let me think about it.
There was a problem hiding this comment.
I think you could actually check whether b is used in the pattern argument annotations without too much difficulty. For example, when you populate the initial environment with the name_list to get the ids, you could remember the identity of the flexible dummy variables for each id. Then on line 821 in the current PR, at the place where you raise the Bind_non_locally_abstract error, you can check whether the current identity of the type is still that flexible dummy variable, or if it was unified to something else. If it was not unified (it is still the dummy), you could raise a different error: the problem is not that the name is not being bound to a non-locally-abstract type, it is not being bound at all.
(I guess that this situation could occur in cases where the name does occur syntactically in the annotation, but those positions vanish during expansion. I don't think that this is a problem.)
| Error: This pattern matches values of type "b * (b -> 'a)" | ||
| but a pattern was expected which matches values of type "b * (b -> 'b)" | ||
| This instance of "'a" is ambiguous: | ||
| it would escape the scope of its equation |
There was a problem hiding this comment.
It would be nice to have a better error message in the escape situation -- which is bound to be a fairly common error among users, giving a local name to a type coming from outside.
There was a problem hiding this comment.
I removed the scoping constraint on the type variable, as it is not necessary (it will be enforced later anyway). This way, we get the same error as below. The wording might be better though.
| [%%expect{| | ||
| type _ ty = Int : int ty | Pair : 'b ty * 'c ty -> ('b * 'c) ty | ||
| val example : 'a ty -> 'a = <fun> | ||
| val example : 'a ty -> 'a = <fun> |
There was a problem hiding this comment.
These examples are nicer, simpler than the Thunk case above (which suffers from this corner case that the instance of the constructor is exactly the type parameter, rejecting more usage situations). Could it go above in the test file?
| in | ||
| tps.tps_pattern_force <- force :: tps.tps_pattern_force; | ||
| (* Only unify the return type after generating the ids *) | ||
| unify_res (); |
There was a problem hiding this comment.
I suspect that the "scope escape" type error is coming from here, and that it would be nice to catch it here to send the more specialized Bind_non_locally_abstract error instead.
There was a problem hiding this comment.
I removed the scope error that was occurring in the test suite.
typing/typecore.ml
Outdated
| let reason = match reason with | ||
| | Bind_already_bound -> "already bound existential" | ||
| | Bind_not_in_scope -> "previously defined existential" | ||
| | Bind_non_locally_abstract -> "non-locally-abstract type" |
There was a problem hiding this comment.
I think that this error message could be more user-friendly. This corresponds to the common case where the user tries to bind an "old" type. "non-locally-abstract" is a correct description but it will be confusing for users.
For example I wouldn't mind an error message saying something like:
The local name ... can only be given to an existential variable bound to this GADT constructor. The type annotation tries to bind it to the name ... defined outside this constructor.
There was a problem hiding this comment.
Done as suggested
typing/typecore.ml
Outdated
| in | ||
| Location.errorf ~loc | ||
| "@[<hov0>This type annotation attempts to bind@ \"%s\"@ %s%s@ \"%a\".@]" | ||
| (Ident.name id) "to the " reason Printtyp.type_expr ty |
There was a problem hiding this comment.
@Octachron changed the way error messages are supposed to refer to inline code (including inline types), see the occurrences of Style.inline_code (Ident.name id) and (Style.as_inline_code Printtype.type_expr) ty in the rest of the file. You should match that new style instead of using manual quoting here.
| let ids = List.map (fun x -> x.txt) ids in | ||
| let rem = | ||
| (* First process the existentials introduced by this constructor. | ||
| Just need to make their definitions abstract. *) |
There was a problem hiding this comment.
Naive question: why do we need to do two passes, one on the constructor existential instances and another on the bound existential names? Couldn't we do a single pass on the bound existential names, with logic that would also work well on the purely existential instances?
There was a problem hiding this comment.
While this would be possible, this would be less robust. Namely, for internal existentials we delay the creation of the abstract type, so that we will never show a strange internal name for the user. For existentials that were introduced by unification, we cannot do that, so we just create alias names, hoping that the strange internal name will not show up later.
| let env = | ||
| Env.add_type ~check:false id | ||
| {decl with type_manifest = Some (correct_levels tv')} !!penv | ||
| in |
There was a problem hiding this comment.
The logic between the two loops (on existential instances, then on remaining user-defined names) is fairly different: in one case you create a new abstract (for the user-provided type name) and you unify, and in the other case you define the user-provided name as a synonym to the typer-generated existential. Can you explain why there is this difference? Will it make a difference for error messages? (In particular, for not-purely-existential names, are they shown in error messages in the right-hand-side of the clause with the name given by the user?)
There was a problem hiding this comment.
As I explained above, in the latter case we do not replace the original name by the one given by the user, but just define an alias, so we cannot guarantee that the internal name will never show up. In short-path mode we could tune the heuristics to show the user-defined name.
| ^^^^^^^^^^^ | ||
| Error: This type annotation attempts to bind "b" | ||
| to the previously defined existential "$0". | ||
| |}] |
There was a problem hiding this comment.
I would be curious to see an example with a type error in the right-hand-side of the pattern clause, to check that the types mentioned there are the one given explicitly by the user. For example what do we get with the following?
let rec example : type a . a ty -> a = function
| Int -> 0
| Pair (type b c) (x, y : b ty * c ty) -> (example x, example (*error*)x)There was a problem hiding this comment.
I have added the example, and indeed this does not work well:
3 | | Pair (type b c) (x, y : b ty * c ty) -> (example x, example (*error*)x)
^^^^^^^^^^^^^^^^^^
Error: This expression has type "b" = "$0" but an expression was expected of type
"$1"
There was a problem hiding this comment.
I also checked using -short-paths, and indeed this gives the message you wanted.
3 | | Pair (type b c) (x, y : b ty * c ty) -> (example x, example (*error*)x);;
^^^^^^^^^^^^^^^^^^
Error: This expression has type "b" but an expression was expected of type "c"
typing/typecore.ml
Outdated
| @@ -778,21 +791,66 @@ let solve_constructor_annotation | |||
| | _ -> assert false | |||
| in | |||
| if ids <> [] then ignore begin | |||
There was a problem hiding this comment.
I wonder why this ignore is here and if it is still necessary.
There was a problem hiding this comment.
Indeed it was spurious; removed it.
| ^ | ||
| Error: This type does not bind all existentials in the constructor: "type b. a" | ||
| Error: This type annotation attempts to bind "b" | ||
| to the non-locally-abstract type "'a". |
There was a problem hiding this comment.
I think you could actually check whether b is used in the pattern argument annotations without too much difficulty. For example, when you populate the initial environment with the name_list to get the ids, you could remember the identity of the flexible dummy variables for each id. Then on line 821 in the current PR, at the place where you raise the Bind_non_locally_abstract error, you can check whether the current identity of the type is still that flexible dummy variable, or if it was unified to something else. If it was not unified (it is still the dummy), you could raise a different error: the problem is not that the name is not being bound to a non-locally-abstract type, it is not being bound at all.
(I guess that this situation could occur in cases where the name does occur syntactically in the annotation, but those positions vanish during expansion. I don't think that this is a problem.)
|
(For some reason GitHub does not allow discussing an outdated chunk)
Unfortunately, unification guarantees that types will become equal, but it does not say anything about which one will be redirected to the other. So it could very well be the case that we have unified with a variable that was redirected to point to the newly introduced abstract type, which cannot be detected if that variable had the same level (or higher). This can happen for instance with code like match [] with
| (Pair (type a b) (x, y : a ty * b ty)) :: _ -> ()
| None -> ()This said, one might consider that this code is too meaningless for caring about this false positive. |
gasche
left a comment
There was a problem hiding this comment.
I can tell that this is broadly reasonable, but I don't understand the details (and there are places where I don't understand why it cannot be simpler). I don't think that I will be able to understand the code much better in a reasonable amount of time, and I think that the feature is important and well-tested, so I propose to move forward by approving.
a6d6a30 to
1b5dca9
Compare
In #11891 it was pointed that, for constructor type annotations, the current approach of only allowing to name existentials that do not appear in the return type was too restrictive, as existentials could also appear through unification of the return type.
This PR relaxes this restriction, allowing to name any closed type appearing in the type of the arguments.
This seems to answer the request in #11891, however this means that we are now using local type definitions for two different things: existentials introduced by constructors, and aliases written in the annotation.
Note that while it could be possible to restrict those aliases to existentials, I'm not sure this would be a stable property: according to the order of unification, an existential could have been already refined. So I would prefer to stick to this relaxed definition. Note that there is still a scope restriction, so that one can only alias already known closed types.