Righteous ambivalence by garrigue · Pull Request #9767 · ocaml/ocaml

garrigue · 2020-07-15T03:47:24Z

This is an alternative fix to #9765 .

Rather than calling correct_levels in more branches, it does the right thing: ensure that the generic parts of the expected type are not modified by calls to type_expect.

It is built on top of #9765 but doesn't really reuse it (except for the example).
The 3 new commits are actually orthogonal

the first one allows to compile stdlib in -principal mode by adding (required!) type annotations in camlinternalFormat.ml
the second one fixes Fix #9759: Typing without -principal is broken in 4.11 and trunk #9765 by doing the right thing
the last one disables completely propagation between branches; this changes the behaviour for two examples.

The PR is larger, and probably shouldn't go in 4.11.
Since the last commit changes the behaviour, it is worth discussing the merits of this approach.

garrigue · 2020-07-15T04:52:59Z

#9765 is now a strict subset of this PR, so that it is no longer needed to fix #9759.
The role of this PR is rather to enforce more invariants.

Note that a review of the semantics of propagation is still needed.
Currently some cases of type_expect are careful of returning the inferred type rather than an instance of the expected type (required to stick to the approach in the PolyML paper), the expected type being only used as a hint, but some other cases just return an instance of the expected type.
It might be clearer to adopt some form of bidirectional inference, where the return type is always an instance of the expected type (possibly with some extra unsharing for the parts that could be inferred).

garrigue · 2020-07-15T04:54:30Z

@trefis, @lpw25, @diremy can be interested by this PR, and the discussion on propagation.

gasche · 2020-07-21T12:52:34Z

Minor remark:

It is built on top of #9765 but doesn't really reuse it (except for the example).

What do you mean by this? As far as I can tell, the one code change in #9765 (the use instance ty_record in the record case) is reused in this PR.

gasche · 2020-07-21T13:13:40Z

Naive questions.

Why do we need the begin_def () .. end_def () machinery here? As far as I vaguely understand, these blocks raise the current level, they were initially used for let-definitions which get generalized / introduced polymorphism, and are sprinkled in a few places related to polymorphism, and a couple more in -principal mode. Why are we now performing a generalization on each tuple, array or lazy expression in the program?

There are already def blocks in the pattern-matching code for those constructs, but those look slightly different from the ones introduced in this PR. In the array case for example, we have:

  | Ppat_array spl ->
      let ty_elt = newgenvar() in
      begin_def ();
      let expected_ty = instance expected_ty in
      end_def ();
      generalize_structure expected_ty;
      unify_pat_types ~refine
        loc env (Predef.type_array ty_elt) expected_ty;
      [...]

and now the PR proposes

  | Pexp_tuple sexpl ->
      begin_def ();
      let subtypes = List.map (fun _ -> newvar ()) sexpl in
      let to_unify = newty (Ttuple subtypes) in
      with_explanation (fun () ->
        unify_exp_types loc env to_unify (instance ty_expected));
      end_def ();
      List.iter generalize_structure subtypes;
      [...]

There are several differences here, for example:

in the pattern case, the inference variable is created at the generic level,
in the expression case at the local level (and then generalize_structure is called explicitly on it)
the unification occurs at the local level in the expression case, but only once back at the current level in the pattern case
we are always taking an instance of the expected type at the local level, but we only call generalize_structure ont it in the pattern case

Do these different approaches result in different behavior? My naive understanding is that the two approaches are essentially equivalent (in both cases the type parameter of the array constructor is lowered from generic to current level), so I'm not sure why we are writing things in fairly different ways. (The pattern approach look simpler, especially if we encapsulated taking-an-instance-at-a-local-level-then-lowering as a helper function).

Note: there is no instance of expected_ty taken in the Ppat_lazy case, is this a bug?

lpw25 · 2020-07-21T15:43:50Z

I had thought these cases were wrong when @trefis and I were adding the levels stuff to type_pat. That's why e.g. the Ppat_tuple case does something equivalent to the new version of Pexp_tuple here. But I had been unable to produce any actual buggy behaviour -- in the end I just left myself a todo item to come back later and try to find a bug. So I'm in favour of these changes, but I haven't read the code in detail yet.

garrigue · 2020-07-22T07:18:44Z

Minor remark:

It is built on top of #9765 but doesn't really reuse it (except for the example).

What do you mean by this? As far as I can tell, the one code change in #9765 (the use instance ty_record in the record case) is reused in this PR.

As explained in the second message, this sentence is no longer true: this PR now truly extends #9765.

garrigue · 2020-07-23T05:35:59Z

Taking a fresh instance of ty_expected is morally better that what was done in type_expect before, because it ensures that all the generic parts of the type scheme will be duplicated before unification. This may not be easily observable, but there is something wrong in unifying a type scheme, particularly one that we received, potentially modifying its representation. One could even imagine cases where this would cause the type scheme to become ambivalent, and cause errors later on; apparently there is already instantiation/duplication in other places so that it doesn't seem to really occur.

In you example, type_pat basically does the same thing as I propose for type_exp, in a subtly different way.
For uniformity, let's look at the tuple case:

      let spl_ann = List.map (fun p -> (p,newgenvar ())) spl in
      let ty = newgenty (Ttuple(List.map snd spl_ann)) in
      begin_def ();
      let expected_ty = instance expected_ty in
      end_def ();
      generalize_structure expected_ty;
      unify_pat_types ~refine loc env ty expected_ty;

It creates a generalized copy of the expected type, and unifies it with a fully general pattern type.
Since the expected type was copied, one will actually get the same structure and levels in expected_ty and subtypes as in the approach where both types were instantiated.
The difference is mostly a question of style, and being careful that while it is ok to unify directly with Predef.type_array ty_elt (which creates a new generic array node), one shouldn't unify directly with Predef.type_bool (which is shared).

Note also that the approach in type_pat can be slightly optimized:

      ...
      begin_def ();
      init_def generic_level;
      let expected_ty = instance expected_ty in
      end_def ();
      unify_pat_types ~refine loc env ty expected_ty;

That is, by setting the level to generic_level, one avoids to call generalize_structure.
This relies on the fact that the type variables in expected_ty are not at generic_level.

Finally, is there any advantage in doing the unification on types at current_level rather than generic_level?
While it is legal to do unification at generic_level, and levels will be lowered appropriately, this is partly the cause of the bug in #9759 (not seeing that only one side was a type scheme).
If we switch to a representation where nodes at generic_level are distinguished from shared nodes, then direct unification on type schemes may become illegal.

lpw25 · 2020-07-23T08:10:17Z

Finally, is there any advantage in doing the unification on types at current_level rather than generic_level?

I don't think there is any particular advantage to it. If you want to change the code in type_pat to match the code in typ_exp, that would be fine by me. It would be great to one day ban unification on type schemes.

gasche · 2020-07-23T08:27:52Z

Would it be possible to abstract away some of the machinery here in a helper function?

This is easier to do with the pattern approach, because basically it suffices to abstract

let generic_instance ty =
  begin_def ();
  init_def generic_level;
  let inst = instance expected_ty in
  end_def ();
  inst

[...]
| Ppat_array spatl ->
  let ty_elt = newgenvar () in
  unify_pat_types ~refine loc env (Predef.type_array ty_elt) (generic_instance expected_ty);
  [...]

| Pexp_array sargl ->
  let ty_elt = newgenvar () in
  with_explanation (fun () ->
    unify_exp_types loc env (Predef.type_array ty_elt) (generic_instance expected_ty));
  [...]

But I understand that unifying schemes (that is, types with some parts remaining at the generic level) is frowned upon. The expression approach proposed in this PR is a bit harder to factor out. Maybe something like:

let def_and_generalize f =
  begin_def ();
  let vars = Queue.create () in
  let gen_var () =
    let v = newvar () in
    Queue.add vars v; v
  let ty = f gen_var in
  end_def ();
  Queue.iter generalize_structure vars;
  ty

  | Ppat_array(spatl) ->
      let elt_ty =
        def_and_generalize @@ fun gen_var ->
	let ty = gen_var () in
	unify_pat_types ~refine loc env (Predef.type_array ty) (instance ty_expected);
        ty
      in

  | Pexp_array(sargl) ->
      let elt_ty =
        def_and_generalize @@ fun gen_var ->
	let ty = gen_var () in
	with_explanation (fun () ->
	  unify_exp_types loc env (Predef.type_array ty) (instance ty_expected));
	ty
      in

trefis · 2020-07-23T13:37:17Z

testsuite/tests/typing-gadts/didier.ml

+Error: This expression has type t = bool
+       but an expression was expected of type bool
+       This instance of bool is ambiguous:
+       it would escape the scope of its equation


The comments above and below these test are now stale.

garrigue · 2020-07-24T01:59:12Z

OK, up to now we have discussed implementation details.
I agree that it is probably better to have a uniform approach for type_exp and type_pat.
@gasche has a point that the generic instance approach is more easily factorizable, even if it may have to go in the future.

However, my main concern is not here, but rather the last commit, which makes ambiguity detection stricter, by removing the clutch that made camlinternalFormat.ml compilable, but only in non-principal mode.
For me, this is reasonable, as the vocation of the non-principal mode is not to compile more programs.
Yet, this may come at the cost of breaking some programs in the wild (requiring more type annotations).

gasche · 2020-07-24T05:25:05Z

One way to go would be to split the PR in two, quickly converge on the part that is consensual and fixes the bug, and then for the second commit with the last commit do opam-wide testing to have an idea of how much code is broken in the wild.

trefis · 2020-07-24T08:10:29Z

FTR: I'm quite keen to see the last commit being merged!

But @gasche suggestion also seems reasonable.

gasche · 2020-07-24T09:24:25Z

I'm also okay with deciding to go ahead and break some programs that were not annotated enough. They are easy to fix in a backward-compatible way.

This reverts commit 40cd089.

garrigue · 2020-07-24T10:56:14Z

I have done as announced:

use generic_instance (which was already defined) where possible
temporarily reinstate propagation between branches

Once we converge and merge this PR, I will make a separate PR removing propagation between branches.

gasche · 2020-07-24T12:22:17Z

I'm looking at the code now, and it looks much nicer, thanks!

Why is there no generic instance taken in the Ppat_lazy case?

gasche

Note: generic_instance could also be used in the build_as_type definition.

I went through the file and I believe that the uses of unify_pat_types and unify_exp_types are consistent with my expectations: either the two types have inference variables at the current level (typically by calling instance), or the two types have variables at the generic level (typically by calling newgenvar () on one side and generic_instance on the other).

I would say that the patch as proposed is good to merge.

garrigue · 2020-07-24T14:09:38Z

Why is there no generic instance taken in the Ppat_lazy case?

Seems I missed it. Or more precisely, it was missing in the original code.

Note: generic_instance could also be used in the build_as_type definition.

Actually, no. It can only be used if the variables of the original type are not at generic_level, and build_as_type requires them to be at generic_level. If we use generic_instance we lose the sharing between (instance ty) and ty.

gasche · 2020-07-24T14:50:01Z

This seems good to go when the CI is green. (You should probably add @trefis and myself as reviewers.)

trefis · 2020-07-24T14:51:42Z

I haven't had time to give this a proper review yet. I'd be happy to do so, but it will probably have to wait until Monday.

garrigue · 2020-07-25T05:36:43Z

I haven't had time to give this a proper review yet. I'd be happy to do so, but it will probably have to wait until Monday.

I'll wait for that. No hurry.

trefis

LGTM.

trefis · 2020-07-27T14:06:34Z

stdlib/Compflags

  stdlib__scanf.cmx) echo ' -inline 9';;
  *Labels.cm[ox]) echo ' -nolabels -no-alias-deps';;
  stdlib__float.cm[ox]) echo ' -nolabels -no-alias-deps';;
+  stdlib__oo.cmi) echo ' -no-principal';;


I'm slightly surprised by this.

If you remove it, the type of Oo.copy loses abbreviations (i.e. applying it to an object returns a non-abbreviated type).
I admit that this tuning is disturbing, but losing the abbreviation because of a change in the standard library seems a nuisance.

It would be nice to leave a comment here so that we don't remove (or consider removing) the flag in the future because we forgot the reason for having it.

(Is there a different type signature that one could use to enforce sharing even in -principal mode?)

Unfortunately, I see no way to enforce the sharing of an alias inside a signature in principal mode. The unsharing of aliases is part of the design.
There may be a way by using a different way to introduce sharing, but it is going to be subtle. Even something like

type 'a identity = 'a -> 'a val copy : < .. > identity

mysteriously doesn't seem to work.

* Fix ocaml#9759: Typing without -principal is broken in 4.11 and trunk * compile stdlib in -principal mode * never modify generic part of ty_expected_explained * use generic_instance where possible * add comment for -no-principal in stdlib__oo.cmi

garrigue mentioned this pull request Jul 23, 2020

Fix #9759: Typing without -principal is broken in 4.11 and trunk #9765

Closed

trefis reviewed Jul 23, 2020

View reviewed changes

garrigue added 5 commits July 24, 2020 18:21

Fix ocaml#9759: Typing without -principal is broken in 4.11 and trunk

31154bb

add missing instantiation

283c648

compile stdlib in -principal mode

82943d3

never modify generic part of ty_expected_explained

5c601d4

remove propagation of expected type from previous branches

40cd089

garrigue added 3 commits July 24, 2020 19:26

warning output format changed

1ed8e92

use generic_instance where possible

f5bebb8

Revert "remove propagation of expected type from previous branches"

bcf80c2

This reverts commit 40cd089.

garrigue force-pushed the righteous_ambivalence branch from ba43227 to bcf80c2 Compare July 24, 2020 10:50

move log to 4.12 section

ebfedd6

gasche approved these changes Jul 24, 2020

View reviewed changes

add forgotten generic_instance for Ppat_lazy

377c8ed

add reviewers

5f00e68

trefis approved these changes Jul 27, 2020

View reviewed changes

add comment for -no-principal in stdlib__oo.cmi

2d35ea7

garrigue merged commit 302d735 into ocaml:trunk Jul 29, 2020

garrigue mentioned this pull request Jul 29, 2020

Remove propagation from previous branches #9811

Merged

trefis mentioned this pull request Oct 20, 2020

typecore: one more use of generic_instance #9982

Closed

Conversation

garrigue commented Jul 15, 2020

Uh oh!

garrigue commented Jul 15, 2020

Uh oh!

garrigue commented Jul 15, 2020

Uh oh!

gasche commented Jul 21, 2020

Uh oh!

gasche commented Jul 21, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lpw25 commented Jul 21, 2020

Uh oh!

garrigue commented Jul 22, 2020

Uh oh!

garrigue commented Jul 23, 2020

Uh oh!

lpw25 commented Jul 23, 2020

Uh oh!

gasche commented Jul 23, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

trefis Jul 23, 2020

Choose a reason for hiding this comment

Uh oh!

garrigue commented Jul 24, 2020

Uh oh!

gasche commented Jul 24, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

trefis commented Jul 24, 2020

Uh oh!

gasche commented Jul 24, 2020

Uh oh!

garrigue commented Jul 24, 2020

Uh oh!

gasche commented Jul 24, 2020

Uh oh!

gasche left a comment

Choose a reason for hiding this comment

Uh oh!

garrigue commented Jul 24, 2020

Uh oh!

gasche commented Jul 24, 2020

Uh oh!

trefis commented Jul 24, 2020

Uh oh!

garrigue commented Jul 25, 2020

Uh oh!

trefis left a comment

Choose a reason for hiding this comment

Uh oh!

trefis Jul 27, 2020

Choose a reason for hiding this comment

Uh oh!

garrigue Jul 28, 2020

Choose a reason for hiding this comment

Uh oh!

gasche Jul 28, 2020

Choose a reason for hiding this comment

Uh oh!

garrigue Jul 28, 2020

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

gasche commented Jul 21, 2020 •

edited

Loading

gasche commented Jul 23, 2020 •

edited

Loading

gasche commented Jul 24, 2020 •

edited

Loading