pattern aliases do not ignore type constraints by trefis · Pull Request #1655 · ocaml/ocaml

trefis · 2018-03-12T11:20:50Z

build_as_type currently ignores type constraints. While this doesn't pose any soundness issue, it can lead to surprising results, e.g.

# let f = function ([] : int list) as x -> x;;
val f : int list -> 'a list = <fun>

The fix consists simply in reusing the pattern type if a constraint is present.
I added a few test cases to the testsuite and I believe the new behavior is an improvement.

trefis · 2018-03-12T15:05:50Z

To give more context regarding why I stumbled upon this: I am currently experimenting with a patch allowing a limited use of GADTs under or-patterns (for which I'll give more details in another PR soon), and the issue appeared with the following code:

type _ t =
  | IntLit : int t
  | BoolLit : bool t

let f (type a) (t : a t) a =
  match t, a with
  | IntLit, ((3 : a) as x)
  | BoolLit, ((true : a) as x) -> ignore x
  | _, _ -> ()

Without the patch proposed in this GPR, the compiler rejects the example above with the following error message:

Error: The variable x on the left-hand side of this or-pattern has type 
       a but on the right-hand side it has type bool

The reason being that for constant patterns (in this case 3) build_as_type will directly use the type of the pattern (which in our case is a thanks to the constraint), whereas for constructor patterns (here true) it will produce a new type based on the constructor, i.e. bool.
If we change build_as_type to always use the pattern type when a constraint is present (which is the change proposed by this GPR) then the code above is accepted and given the type 'a t -> 'a -> unit.

garrigue · 2018-03-13T06:22:51Z

I understand the problem you point at, but I do not agree with this semantics.
First, the original behavior is not wrong, so you should argue for the change.
I suppose the main argument is to allow the GADT code in your second comment, but still remember that GADTs are not yet supported explicitly in or-patterns.

Now, going back to a possibly acceptable semantics that does not ignore type constraints in patterns, the natural approach would be to translate the constraint twice, once for the pattern and once for the generated type. This matters particularly if there are some wild cards in the type, but it may also matter for principality reasons.

trefis · 2018-03-15T16:08:12Z

Now, going back to a possibly acceptable semantics that does not ignore type constraints in patterns, the natural approach would be to translate the constraint twice, once for the pattern and once for the generated type. This matters particularly if there are some wild cards in the type,

Indeed.
We had arrived at the same conclusion with Leo when discussing my changes to allow GADT under or-patterns, and I will update this PR to do that shortly.

it may also matter for principality reasons.

If I understand what you mean correctly, I believe that you are correct, and that there are already are some principality issues in the current implementation (related to the handling of mutable record fields):

# module M = struct type r = { lbl : int } end;;                
module M : sig type r = { lbl : int; } end
# let f = function { M.lbl = _ } as x -> x.lbl;;                
val f : M.r -> int = <fun>
# let f = function { contents = { M.lbl = _ } } as x -> !x.lbl;;
Characters 57-60:
  let f = function { contents = { M.lbl = _ } } as x -> !x.lbl;;
                                                           ^^^
Warning 18: this type-based field disambiguation is not principal.
val f : M.r ref -> int = <fun>

trefis · 2018-03-19T15:08:01Z

I rebased and did (in 857b38c) the change you proposed. At least I believe so.

This doesn't handle the issue pointed at in my previous message (about mutable record fields), I think I'll handle that in another PR.

garrigue · 2018-03-22T02:16:02Z

No, this is not what I meant. I was talking about the semantics, not the implementation.
I.e. it should behave as if the syntactic type annotation had been duplicated.
The example I have in mind is:

# function (`A, _ : _ * int) as x -> x;;
- : [< `A ] * int -> [> `A ] * int = <fun>

But there are various difficulties:

the algorithm needs to be modified to take the syntactic type into account
the semantics itself needs to be clarified. Following my first idea we would end up with

# function (`A, _ : 'a * int) as x -> x;;
- : [`A ] * int -> [`A ] * int = <fun>

because we end up forcing unification between [< `A] and [> `A].

Can you open a bug report, so that we can think a bit more about this problem?

xavierleroy · 2019-10-14T17:55:08Z

@trefis @garrigue : 18 months later, did you come to an agreement about what needs to be done here?

trefis · 2019-10-14T19:42:30Z

We haven't had time to properly discuss it yet. I still believe something should be done, though I'm not saying the approach proposed here is necessarily the right one.

I'm hoping we'll get some time to discuss this in the beginning of next year, in the meantime, I'd like to keep this PR open.
(I could also close this and open an issue instead if you prefer)

mbouaziz · 2020-06-17T09:00:56Z

In Tezos codebase we stumbled upon a similar issue with GADTs under or-patterns when using bisect_ppx (2.4.1 as of today) which introduces an as in the patterns:

type _ t = A : unit t | B : bool t

let f : type a. a t -> int =
  fun x -> match x with A | B -> 0

is instrumented as

let f : type a. a t -> int =
  fun x ->
    ___bisect_visit___ 2;
    (match x with
     | A|B as ___bisect_matched_value___ ->
         ((((match ___bisect_matched_value___ with
             | A -> (___bisect_visit___ 0; ())
             | B -> (___bisect_visit___ 1; ())
             | _ -> ()))
          [@ocaml.warning "-4-8-9-11-26-27-28"]);
          0))

rejected by the typechecker:

File "bin/main.ml", line 4, characters 28-29:
4 |   fun x -> match x with A | B -> 0
                                ^
Error: This pattern matches values of type bool t
       but a pattern was expected which matches values of type unit t
       Type bool is not compatible with type unit

I tried adding type annotations, e.g. ((A | B) : a t), without success but I guess this PR would solve it.

garrigue · 2020-06-18T10:06:16Z

Looking again at this PR, wouldn't it make sense to use a different syntax:

(A | B) as (x : a t)

Namely, what we want to choose here is the type of x, not the type of the expected value.

trefis · 2020-06-19T09:47:05Z

Note that that wouldn't help @mbouaziz, at least not straightaway, as the as-pattern is being generated by ppx_bisect. And it's not obvious that the ppx would ever be able to generate constraints that make sense.

I don't mind enriching what is accepted on as' rhs, I believe @gasche would like to go in that direction as well (actually, I think he'd like to go further and turn as into a conjunction).
But I still think it would be worth it to clarify (and possibly change) the semantics of as-pattern; in particular in the presence of constraints on the lhs.

gasche · 2020-06-19T15:12:25Z

What is the typing rule for p as x?

When we write match e0 with p0 -> ..., and p0 contains a subpattern p as x, a priori a valid typing rule would be to give to x the type of the sub-value of e0 at the access path corresponding to p within p0. For example, match (e : int list * bool) with ([] as x, _) -> ... would have x : int list.

This is fairly conservative, and OCaml uses a more flexible typing rule (and our programs rely on it), in particular for polymorphic variants: match (`A : [ `A ]) with `A as x -> ... gives a more "open" type [> `A ] that the scrutinee type (this is key for the construct | #foo as x -> ....

Ideally, the type inferred for x in p as x would not depend on the matched value, but it would be the most general type for values that may match p. I think that this is more or less what build_as_type is trying to do, except that it ignores annotations (which were unavailable before and are encoded in a baroque way through the extra field), leading to the surprising result of ([] : int list) giving the most general type 'a list instead of int list, as pointed out by @trefis.

Of course, this "most general type of p" business is made more complex by our use of directional type inference to propagate type annotations: for GADTs and also for type-based constructor disambiguation we need an expected type. So I guess that the specification would be of the form: given an expected type expected, what is the most general type ty for the pattern p that has expected as an instance?

(For equations introduced by GADTs, I think this suggests keeping the local type constructor as it is in the expected type (if there is one; otherwise we check the subpatterns without the equation), as it is more general than any ambiguous type that would use the equation.)

For the example of @garrigue function (`A, _ : _ * int) as x, we have no expected type. The most general type of (p : ty) would be the most general instance of ty that can type-check p, so this precisely corresponds to the most general type of p with expected type ty (with _ instantiated by a flexible inference variable). So here it's about the most general type for (`A, _) that is an instance of some 'a in ('a * int). (I can't reproduce the reasoning for why we consider that [> `A ] is the general type for the pattern `A, but this is somewhat orthogonal to the treatment of type annotations.)

gasche · 2020-06-24T13:07:29Z

I had a meeting this morning with @trefis and @garrigue. We arrived at a specification for the feature we liked, which suggests an implementation approach, which @trefis volunteered to implement.

The general idea to type p as x is that we want to "generalize" the types of the "structural parts" the pattern, that are written out explicitly by the user, and "share" the parts that directly refer to the scrutinee. For example, the inferred type for a sub-pattern of p that is [] or `A needs not be bound to the type of this part of the scrutinee, but on the other hand a wildcard pattern _ or z reuses exactly the scrutinee's type in that part. The question is how to say what we mean by "generalize" here, in such a way that we can tell if an implementation is correct or not.

The specification we propose uses a feature that does not exist in OCaml patterns today, namely p where x = e as a way to add a new binding to a pattern. With this feature we can express the idea that the value bound to x is not actually the sub-value of the scrutinee when p is reached within the pattern, but an eta-expansion of this value along the "structural parts" of p. We do this by saying that p as x should type-check as the following "desugaring" of this eta-expansion (applied recursively):

K (p1, .., pn) as x
==>
K (p1 as x1, .., pn as xn) where x = K (x1, .., xn)

(p | q) as x
==>
(p as x | q as x)

z as x
==>
z with x = z

_ as x
==>
x

For example, ([], _) as x desugars into ([] with x1 = [], x2) where x = (x1, x2), with type ('a list * bar) if the scrutinee as type (foo list * bar), `A as x desugars into `A where x = `A, with type [> `A ], and (A | B z) as x desugars into (A with x = A | B z with x = B z), where the type of z is "shared" as expected between the type of the scrutinee and the type of x.

The natural desugaring of type annotations is as follows:

(p : ty) as x
==>
(p as x' : ty) where x = (x' : ty)

where we see that two different fresh instances of ty will be taken during typing (if ty contains flexible variables, for example from occurrences of _ in the type), one to type the pattern and one to constrain the as variable. They do not need to be unified in any way: communication between the two instances will occur through the natural "sharing" of types in the leaves/variables of p. For example, if we match on a (foo list * bar), then writing

(([], z) : (_ * _)) as x

will give (([] where x1 = [], z where x2 = z) : _ * _) where x = ((x1, x2) : _ * _), so we get the type ('a list * bar) inferred as expected.

Note: as @trefis explained to me, in the implementation we cannot just take two instances of the pattern type as found in the typedtree at this point, as the type has already been "polluted" by unifications occurring during type-checking. We should either keep the syntactic type around (maybe it's already there?) or, during type-checking, proactively take two independent instances of the syntactic type, which @garrigue describes as "one for the input, one for the output" (one for the scrutinee's type, one in the environment exported by the pattern), or "for the inside, for the outside". The latter approach is what @trefis would like to implement.

Finally, @trefis and @garrigue discussed what happens with mutable fields: it is interesting as there the desugaring of as as I demonstrated above is not quite correct. Consider

type ('a, 'b) t = { mutable mut: 'a; pure : 'b }

then the pattern { mut = p; pure = q } as x would naturally desugar into { mut = (p as x1); pure = (q as x2) } where x = { mut = x1; pure = x2 }, but this is not what happens at runtime: at runtime x is really bound to the mutable record value, not to a shallow copy of it (and we can observe the difference). This desugaring allows to generalize what happens in the mut pattern, and this is unsound, as it would give to match { mut = []; pure = () } with { mut = [] } as x -> x the type ('a list, 'b) t which is unsound. Generalizing on mut is forbidden by the implementation (as it should), but generalizing on pure is allowed (as it can): so we get a type of the form ('_a list, 'b) t, with the second variable generalized. This corresponds to the type that would be given to a desugaring of the form

{ mut1 = p1; mut2 = p2; pure1 = q1; pure2 = q2; } as x
==>
{ x' with mut = p1; mut = p2; pure1 = q1 as x1; pure2 = q2 as x2 } where x = { x' with pure1 = x1; pure2 = x2 }

even though this desugaring is (not valid OCaml syntax and) incorrect from a runtime/operational/dynamic point of view, as it loses sharing in x. In the real implementation, we keep the sharing, and we "generalize" the type of the pure fields, but we are careful to not generalize impure fields.

trefis · 2020-06-24T15:45:27Z

We should either keep the syntactic type around (maybe it's already there?) or, during type-checking, proactively take two independent instances of the syntactic type, which @garrigue describes as "one for the input, one for the output" (one for the scrutinee's type, one in the environment exported by the pattern), or "for the inside, for the outside". The latter approach is what @trefis would like to implement.

I don't think the second approach you describe would work, if you have ((p : ct) as x1, None) as x shouldn't ct be "translated" thee times?

I do find it slightly distasteful to keep some parsetree inside the typedtree (even if it's an extra node), and I had considered instead introducing val refresh_type : Env.t -> Typedtree.core_type -> Typedtree.core_type in Typetexp, which would basically be a slightly simpler version of transl_type that operates on the typedtree instead of the parsetree.
Doing just that would result in a lot of code duplication, so I thought that we could split transl_type into two steps:

the first one that just builds the Typedtree.core_type skeleton by putting a dummy value inside the ctyp_type fields
the second one that would (recursively) fix up the ctyp_type fields by generating the correct type based on the typedtree.
This second stage lining-up exactly with refresh_type.

But this is more involved than simply storing the parsetree, and I'm not sure building a skeleton and then fixing it is a good idea even though no one should ever be able to observe this partially built core type (unless there is a bug in refresh_type that is).

Any opinion?

gasche · 2020-06-24T15:51:57Z

I don't know much about this part, but instead of having two flexible instances generated during typechecking, could we store a "rigid" version that we can later take instances of (and then one instance maybe would be taken immediately, if this is what typechecking needs)? This sounds related to your "refresh" proposal but with a slightly different perspective.

(I was hoping to get compliments on my rewording of Jacques "view patterns" formulation as a simpler/nicer recursive expansion of as.)

trefis · 2020-06-26T14:43:43Z

could we store a "rigid" version that we can later take instances of (and then one instance maybe would be taken immediately, if this is what typechecking needs)? This sounds related to your "refresh" proposal but with a slightly different perspective.

Indeed it does sound related. However, I went ahead with the "refresh" strategy, rather than the "rigid translation" one simply because it didn't require any thinking (on my part): one just need to do the same thing that is already being done.
For your proposal, one needs to decide which type variables should be made rigid, and which one shouldn't (I guess only variables corresponding to _ in the source are rigid).
Feel free to push back if you want this changed.

There is one question though that exists with all the approaches described here, which is: should the "globalisation" of variables be delayed when refreshing the constraint? I guess to answer that question one needs to know why it is delayed in the first place when typing p : ct. My guess is that it's used to delay some errors, which will make a difference in situations like this:

(* delayed *)
# let foo (x : 'a) = if true then x + 1 else let ('c' : string as 'a) = "" in 3;;
Line 1, characters 48-51:
1 | let foo (x : 'a) = if true then x + 1 else let ('c' : string as 'a) = "" in 3;;
                                                    ^^^
Error: This pattern matches values of type char
       but a pattern was expected which matches values of type string

(* eager *)
# let foo (x : 'a) = if true then x + 1 else let ('c' : string as 'a) = "" in 3;;
Line 1, characters 54-66:
1 | let foo (x : 'a) = if true then x + 1 else let ('c' : string as 'a) = "" in 3;;
                                                          ^^^^^^^^^^^^
Error: This type string should be an instance of type int

Of course there is nothing exercising this behavior in the testsuite, and no explanation in the commit which introduced it. So we will need @garrigue to explain that one.

trefis · 2020-06-29T12:04:40Z

I guess to answer that question one needs to know why it is delayed in the first place when typing p : ct.

I discussed this with @garrigue this morning, and this is actually done to prevent using external type information while typing patterns.
For instance if type variables in constraints were globalized straight away, the following example would be accepted:

let f (x : 'a) y =
  if true then
    ignore (x : M.t)
  else
    match y with (A : 'a) -> ()

even though it's clearly not principal (I believe #1931 would help in this case, but #1931 has its own issues).

Independently of principality concerns, this can also impact the type produced in presence of polymorphic variants. @garrigue said he would provide an example of this (in the form of a testsuite addition).

Going back to the "refresh type" vs "rigid constraint" discussion, @garrigue solution is closer to the "rigidify"-version, except there is no need to make the variables rigid, they just need to be quantified, i.e. they just need to be introduced at the generic level. As for the "which variables?" question, we both agree that only the ones coming from underscores in the source should be generalized.

I'll try to implement that.

PS: looking at globalize_used_variables we saw:

if try unify env v ty; true with _ -> Btype.backtrack snap; false
then try
  r := (loc, v,  TyVarMap.find name !type_variables) :: !r
with Not_found ->
  ...

And @garrigue wasn't sure in which situations the unification could fail, and why it was then ok to ignore that faire and not globalize the variable.

It sure would be nice if an explanation emerged.

trefis · 2020-06-30T08:26:35Z

I updated the content of the PR, and it is now ready for review.

garrigue

LGTM

trefis · 2020-06-30T13:43:48Z

Rebased and changelog entry added.

@gasche: do you want to have a look before I merge?

gasche

I made a review for code readability, but haven't looked yet at whether I understand the actual change in the Tpat_constraint case.

testsuite/tests/typing-misc/build_as_type.ml

gasche · 2020-07-03T09:17:50Z

testsuite/tests/typing-misc/disambiguate_principality.ml

-3 |     !x.lbl
-           ^^^
-Warning 18: this type-based field disambiguation is not principal.
-val u : M.r ref -> int = <fun>


I'm slightly confused here, why is it that !x.lbl does not warn about principality anymore but x := { lbl = 4 } does? Basically I don't understand why the test above still fails in principal mode, given that I would expect (_ : t) as x and (x : t) to be equivalent patterns when t contains no flexible inference variable.

I would expect (_ : t) as x and (x : t) to be equivalent patterns when t contains no flexible inference variable.

And indeed they are, if you look at examples r and s a few lines above, you'll notice the warning is emitted in the same situations:

let r arg = match arg with | (x : M.r ref) -> !x.lbl ;; [%%expect{| val r : M.r ref -> int = <fun> |}] let s arg = match arg with | (x : M.r ref) -> x := { lbl = 4 } ;; [%%expect{| val s : M.r ref -> unit = <fun> |}, Principal{| Line 4, characters 9-20: 4 | x := { lbl = 4 } ^^^^^^^^^^^ Warning 18: this type-based record disambiguation is not principal. val s : M.r ref -> unit = <fun> |}]

The warning here has nothing to do with the type of the pattern, it's imputable to the way applications are typed. Indeed the following don't warn:

let s' arg = match arg with | (x : M.r ref) -> x.contents <- { lbl = 4 } ;; [%%expect{| val s' : M.r ref -> unit = <fun> |}] let s'' arg = match arg with | (x : M.r ref) -> ((:=) : _ -> M.r -> unit) x { lbl = 4 } ;; [%%expect{| val s'' : M.r ref -> unit = <fun> |}]

but this does:

let s''' (arg : M.r ref) = arg := { lbl = 4 } ;; [%%expect{| val s''' : M.r ref -> unit = <fun> |}, Principal{| Line 2, characters 9-20: 2 | arg := { lbl = 4 } ^^^^^^^^^^^ Warning 18: this type-based record disambiguation is not principal. val s''' : M.r ref -> unit = <fun> |}]

typing/typecore.ml

typing/typetexp.ml

typing/typecore.ml

gasche

I think that the code would be nicer if transl_simpl_type also returned a tuple with cty.cty_type as well, but I have the impression that you didn't want to do this refactoring as it would touch more code not affected by this PR. Okay. In any case, the new code reads nicer than the first second iteration, thanks! If @garrigue reviewed the core logic for correctness, I'm happy to also approve.

trefis mentioned this pull request Mar 15, 2018

more precise principality warning regarding record fields disambiguation #1661

Merged

trefis force-pushed the build-as-type branch from 4212c95 to 857b38c Compare March 19, 2018 15:04

mbouaziz mentioned this pull request Jun 17, 2020

Typechecking issue with GADTs under or-patterns aantron/bisect_ppx#325

Closed

trefis force-pushed the build-as-type branch 2 times, most recently from 3d7b090 to ec0b02f Compare June 26, 2020 14:33

trefis force-pushed the build-as-type branch from ec0b02f to a819418 Compare June 30, 2020 08:28

garrigue approved these changes Jun 30, 2020

View reviewed changes

trefis force-pushed the build-as-type branch from a819418 to e05b805 Compare June 30, 2020 13:42

trefis changed the title ~~Make Typecore.build_as_type respect constraints~~ pattern aliases do not ignore type constraints Jun 30, 2020

gasche reviewed Jul 3, 2020

View reviewed changes

trefis added 2 commits July 8, 2020 10:37

add test showing constraints are ignored

5c2f029

build_as_type tests: add Jacques' example

0557307

trefis added 2 commits July 8, 2020 10:37

transl_type_delayed: return a type scheme

4fab40e

build_as_type: do not ignore constraints

cc369cd

trefis force-pushed the build-as-type branch 2 times, most recently from d0393c7 to 487ebd1 Compare July 8, 2020 15:42

trefis added 7 commits July 8, 2020 17:46

Changes

f772424

freak change

3196e3d

review comments in testsuite/../build_as_type.ml

7ac885d

transl_simple_type_delayed: changes suggested by @gasche

9201230

add an example inspired by reviewer comments

ba075c3

build_as_type: look at all the constraints

577187f

Typecore.build_as_type: add a reference to PR

01c40ed

trefis force-pushed the build-as-type branch from 026b54b to 01c40ed Compare July 8, 2020 15:47

gasche approved these changes Jul 8, 2020

View reviewed changes

check-typo

a5ca3a1

trefis merged commit 4edc4b9 into ocaml:trunk Jul 9, 2020

trefis deleted the build-as-type branch July 9, 2020 10:20

gasche mentioned this pull request Feb 3, 2021

unexpected interaction between or-patterns, as-patterns and polymorphic variants #10193

Closed

gasche mentioned this pull request Jun 19, 2023

Fix interaction of or-patterns, GADTs and as-patterns #11799

Open

Conversation

trefis commented Mar 12, 2018

Uh oh!

trefis commented Mar 12, 2018

Uh oh!

garrigue commented Mar 13, 2018

Uh oh!

trefis commented Mar 15, 2018

Uh oh!

trefis commented Mar 19, 2018

Uh oh!

garrigue commented Mar 22, 2018

Uh oh!

xavierleroy commented Oct 14, 2019

Uh oh!

trefis commented Oct 14, 2019

Uh oh!

mbouaziz commented Jun 17, 2020

Uh oh!

garrigue commented Jun 18, 2020

Uh oh!

trefis commented Jun 19, 2020

Uh oh!

gasche commented Jun 19, 2020

Uh oh!

gasche commented Jun 24, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

trefis commented Jun 24, 2020

Uh oh!

gasche commented Jun 24, 2020

Uh oh!

trefis commented Jun 26, 2020

Uh oh!

trefis commented Jun 29, 2020

Uh oh!

trefis commented Jun 30, 2020

Uh oh!

garrigue left a comment

Choose a reason for hiding this comment

Uh oh!

trefis commented Jun 30, 2020

Uh oh!

gasche left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gasche Jul 3, 2020

Choose a reason for hiding this comment

Uh oh!

trefis Jul 8, 2020

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gasche left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

gasche commented Jun 24, 2020 •

edited

Loading