Skip to content

Introduce wrapper functions for level management#11536

Merged
gasche merged 23 commits intoocaml:trunkfrom
COCTI:wrap_def
Jan 14, 2023
Merged

Introduce wrapper functions for level management#11536
gasche merged 23 commits intoocaml:trunkfrom
COCTI:wrap_def

Conversation

@t6s
Copy link
Copy Markdown
Contributor

@t6s t6s commented Sep 6, 2022

This PR introduces wrapper functions for level management (Ctype.wrap_def, etc)
and for type variable scoping (Typetexp.wrap_type_variable_scope).
The older API (Ctype.{begin_def,end_def}, Typetexp.{narrow,widen}, etc.) is removed.

The goal is to clarify the structure of generalization inside the type inference algorithm
so as to be able to introduce more abstraction later.

  • The scope of some level management is refined (in particular for type_application).
  • The body of Pexp_send is separated into type_send.
  • The detection of unused let-declarations is separated into a wrapper.
  • type_unpacks is changed to use one recursion rather than two List.fold_lefts.

@bluddy
Copy link
Copy Markdown
Contributor

bluddy commented Sep 6, 2022

Very nice! Those functions always bothered me.

@stedolan
Copy link
Copy Markdown
Contributor

While refactoring, could we also rename wrap_def to e.g. with_level? The word "def" always confused me: it is used only in the names of the functions begin_def / end_def / init_def / init_class_def for what the rest of the code calls a "level".

(I suspect there was a time when begin_def / end_def were only used around "definitions" (let bindings), but that is absolutely not the case today)

@lpw25
Copy link
Copy Markdown
Contributor

lpw25 commented Dec 13, 2022

It's perhaps worth discussing here a slightly larger change that I think would be a huge improvement: store the current level as part of the environment. Then instead of with_ operations you get operations that operate on environments. The current level is semantically part of the environment: levels represent pointers into the inferred type environment and the current level is the pointer to the head of the environment.

@garrigue
Copy link
Copy Markdown
Contributor

@stedolan
I see your point, but with_level is not correct either, since we are talking about raising the level. Maybe with_local_level ? I didn't check that all the declinations work with that.

@lpw25
I am not sure of the meaning of this statement. I agree that there is a relation between level and environment, i.e. a raised level characterizes things that are not shared with the environment, but I'm not sure that the current level should be part of the environment. For instance, in a let definition, we first type the definition in the original environment just raising the current level, and then type the body at the original level in an extended environment. We would end up with two distinct (non-linear) extensions of the same environment in the same function, which sounds confusing.
I suppose there is a way to reconcile the two views, for instance by having a way of enforcing that types added to the environment do not contain type variables with a level between the level of this environment and generic_level, but this becomes a much bigger change.
Another problem is that the wrapper nicely encapsulate the nesting of scopes, in a kind of monadic way. However, at least currently, we need to manipulate the environment explicitly (sometimes we work with multiple environments simultaneously), so that such encapsulation would be difficult.

@gasche
Copy link
Copy Markdown
Member

gasche commented Dec 14, 2022

I thought about with_new_level or with_next_level.

We would end up with two distinct (non-linear) extensions of the same environment in the same function, which sounds confusing.

I don't find this confusing. In let <def> in <body>, def is checked in a different context than the current one (it has an extra inference level), and body is also checked in a different context (with current level but an extra variable).

@lpw25
Copy link
Copy Markdown
Contributor

lpw25 commented Dec 14, 2022

We would end up with two distinct (non-linear) extensions of the same environment in the same function, which sounds confusing.

Rather than confusing I would describe this as just semantically correct. If you write the declarative typing rules out and don't unaccountably forget to include your type variables in your typing environment then it is clear that the definition and body of a let are indeed typed in different environments:

 Γ; α ⊢ e₁ : τ₁     Γ; x : ∀α.t ⊢ e₂ : τ₂
──────────────────────────────────────────
       Γ ⊢ let x = e₁ in e₂ : τ₂

Another problem is that the wrapper nicely encapsulate the nesting of scopes, in a kind of monadic way. However, at least currently, we need to manipulate the environment explicitly (sometimes we work with multiple environments simultaneously), so that such encapsulation would be difficult.

This is an interesting question. You basically go from something like this, in the current PR:

let ty =
  wrap_def_iter ~post: generalize (fun () ->
    ...
  )
in

to something like:

let ty
  let env = Env.add_level env in
  ...
in
generalize env ty

although you could also do something like:

let ty =
  wrap_new_def env ~post:generalize (fun env ->
    ...
  )
in

with the env-based version too.

Of these I personally prefer the env-based versions because they have a value which explicitly represents scopes rather than using global state.

I don't want to side-track this PR too much, I just wanted to mention this possibility in case you were keen to try it out now.

@gasche
Copy link
Copy Markdown
Member

gasche commented Dec 14, 2022

Personally I think that tracking the level in the environment has a lot of merit, but I think we could do these sort of refactorings incrementally, leaving this suggestion to a later PR. We had our type-system-refactoring meeting in early November, we bikeshed over the API a bit during the meeting, but our consensus conclusion was that we should limit bike-shedding (I only asked to unify the proc and post labels), review for correctness, merge, and iterate with more good stuff from COCTI. My opinion on the general approach has not changed, I just haven't had the time to do the review-for-correctness step.

@garrigue
Copy link
Copy Markdown
Contributor

@lpw25
My opinion is still that, if we are to include the level in the environment, then we need to do more about it. Looking at your typing rule, it is clear that the level does not represent the type variable but its scope. More precisely, all variables in the environment must be lower than that. If we can enforce that, this gets more interesting.

So I would like to think carefully about that.

@garrigue
Copy link
Copy Markdown
Contributor

Actually, there may be a way to use the environment meaningfully.
Since it relies on existential types it requires some experimentation:

type 'a Env.t  (* add a phantom parameter *)
type new_env = Env : 'a Env.t -> new_env
type 'a type_scheme
val raise_level: 'a Env.t -> new_env
val generalize: 'a Env.t -> type_expr -> 'a type_scheme
val add_value: Ident.t -> 'a type_scheme -> 'a Env.t -> 'a Env.t

let ty =
  let Env env = raise_level env in ...
in
let ts = generalize env ty in
let env = add_valud id ts env in
...

This way we really get added safety.

@garrigue
Copy link
Copy Markdown
Contributor

Just adding to this hypothetical design (not sure this is the right place...)

  • type definitions, class definitions, module type definitions are always at generic level, so the types they contain should be of type closed_scheme = generic type_scheme with generic the index for generic_level.
  • module definitions may contain non-generalized variables, so they need an index; it feels like some index subtyping is eventually required.
  • we would need functions for simultaneous generalization/instantiation.
  • if we keep in-place generalization, it would be nice to have a notion of invalid type_expr, i.e. when it contains nodes at generic level; how can we detect that efficiently ?
  • what about uses of init_def, in particular create_scope ?

@garrigue
Copy link
Copy Markdown
Contributor

with_local_level approved in discussion.
with_local_level_for_class
with_raised_nongen_level
....

@t6s
Copy link
Copy Markdown
Contributor Author

t6s commented Dec 16, 2022

I have just finished renaming according to the last meeting.
Now this should be ready to merge.

begin_def ();
let (args, ty_res) = type_application env funct sargs in
end_def ();
unify_var env (newvar()) funct.exp_type;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is going on here? It looks like the behavior is changed.

This seems related to #11519 but the change is different from the one in #11519.

Unless you are very confident that the change is correct, I think we could stick to "just a refactoring" (no change in behavior) in this current PR to make it easier to review, and keep small behavior changes for smaller independent PRs.

Copy link
Copy Markdown
Contributor

@garrigue garrigue Dec 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to scratch my head for a while to understand why this extra level was needed. It is actually there to warn on partial applications, so I moved the with_local_level to the end of type_application, where it is needed.
Note that type_application is a long sequence of functions, called from the wrapped code, so there is no semantical change.
I also removed the call to unify_var as there is no need to lower the levels of funct.exp_type, since it does not appear in the return type. I don't know why I included this line in 2004, probably did not think much about it.

@garrigue
Copy link
Copy Markdown
Contributor

@gasche Thanks for the detailed reviews with many good suggestions. I used most of them, and tried to explain why when I chose a different way.

@gasche
Copy link
Copy Markdown
Member

gasche commented Dec 28, 2022

and tried to explain why when I chose a different way.

Yes, (of course) feel free to disagree with some of the suggestions and explain so.

Thanks! I will try to continue my review -- but probably not much in the next few hours.

generalize_structure ty;
let cty, ty, force =
with_local_level ~post:(fun (_,ty,_) -> generalize_structure ty)
(fun () -> Typetexp.transl_simple_type_delayed !env sty)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a naive question about the usage of Typetexp.transl_simple_type_delayed in typecore.

Many of its occurrences are wrapped in a new level, with generalize_structure called on the ty comment (second element of the output triple). (The only two exceptions are the calls in solve_Ppat_poly_constraint and in the Pexp_coerce case.) But looking at the code of transl_simple_type_delayed, the middle component is the result of calling instance on cty.ctyp_type. What is the point of taking an instance and then re-generalizing right after ?

(Should this common behavior be wrapped in a new function of Typetexp, to simplify the logic in Typecore?)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I don't have a full answer yet, but a first remark is that the generalization function used inside transl_simple_type_delayed is not the same as the one used outside. The internal one generalizes everything, but the external one doesn't generalize type variables, so it would be incorrect to just use the type inside ctyp. I assume that the reason to generalize inside transl_simple_type_delayed is to keep a generalized version of the type inside ctyp, but I have not yet found where it is used. The goal might just be to be able to print the annotation independently of types inferred from outside.

~post: begin fun (exp,vars) ->
generalize_and_check_univars env "method" exp ty_expected vars
end
in
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is correct, but I think that the code would be easier to read with a higher-order post function as I suggested during one of our earlier review discussions.

            let exp =
              with_local_level @@ fun post ->
              let vars, ty'' =
                with_local_level_if_principal @@ fun post ->
                let vars, ty'' = instance_poly true tl ty' in
                post generalize_structure ty'';
                vars, ty''
              in
              let exp = type_expect env sbody (mk_expected ty'') in
              post (generalize_and_check_univars env "method" exp ty_expected) vars;
              exp

You were unconvinced at the time, and I think that it is more useful to make progress on the present PR without trying to change the API too much, but I still think this would be a nicer approach in the complex cases, and probably fine in the smiple cases. Maybe we can revisit this option later.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we are rather thinking of making things more first-order, and exploit typing to enforce invariants. But we have to see what flies.

let ety = Subst.type_expr Subst.identity body.exp_type in
replace ety;
(body, ety)
end
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would find more natural to return { body with exp_type = ety } here, given that the previous body.exp_type is invalid outside the local level/scope.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried the suggested change, but get a different output in typing-poly.
Still wondering what could be the cause; i.e. how can rebuilding a record twice rather than once can be observed.

end_def ();
lower_contravariant env arg.exp_type;
begin_def ();
let arg = {arg with exp_type = instance arg.exp_type} in
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this instance call useful/necessary? We just type-checked arg at the same level (since the call to type_exp we called end_def followed by begin_def again), and we did not generalize in between, only lowered contravariant variables.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This indeed does not seem to be necessary.
This may be a leftover from a previous approach to lower_contravariant that simultaneously generalized the structure of the type.
An alternative (tested) version of the code fixing this would be:

      let arg, _ =
        with_local_level (fun () -> type_exp env sarg, instance ty_arg)
        ~post: begin fun (arg, ty_arg) ->
          lower_contravariant env arg.exp_type;
          unify_exp env arg ty_arg;
          generalize_and_check_univars env "field value" arg label.lbl_arg vars
        end;
      in
      {arg with exp_type = instance arg.exp_type}

This is orthogonal to the current PR, but do you think it should be included?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the code you show above to the one in the PR, and I agree that it should be equivalent. If you want to include it in the PR, please feel free; if you want to submit it as a later PR, that is also okay with me. But I think that we want the change in trunk in the medium term.

@@ -4417,15 +4358,19 @@ and type_label_exp create env loc ty_expected
with exn when maybe_expansive arg -> try
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic here is very tricky, and I'm not fully satisfied with the proposal in the PR:

  1. it leaves some logic out of the wrapper that clearly belongs to a post, and
  2. it performs the snapshotting in a way that does not respect the wrapper nesting.

Sketch of alternative proposal (untested):

and type_label_exp create env loc ty_expected
          (lid, label, sarg) =
  (* Here also ty_expected may be at generic_level *)
  let separate = !Clflags.principal || Env.has_local_constraints env in
  let (vars, ty_arg) =
    with_local_level begin fun () ->
      let (vars, ty_arg) =
        with_local_level_iter_if separate begin fun () ->
          let (vars, ty_arg, ty_res) =
            with_local_level_iter_if separate ~post:generalize_structure
              begin fun () ->
                let ((_, ty_arg, ty_res) as r) = instance_label true label in
                (r, [ty_arg; ty_res])
              end
          in
          begin try
            unify env (instance ty_res) (instance ty_expected)
          with Unify err ->
            raise (Error(lid.loc, env, Label_mismatch(lid.txt, err)))
          end;
          (* Instantiate so that we can generalize internal nodes *)
          let ty_arg = instance ty_arg in
          ((vars, ty_arg), [ty_arg])
        end
        ~post:generalize_structure
      in

      if label.lbl_private = Private then
        if create then
          raise (Error(loc, env, Private_type ty_expected))
        else
          raise (Error(lid.loc, env, Private_label(lid.txt, ty_expected)));

      let arg = if vars = [] then Either.Left 

      (vars, ty_arg)
    end
  in
  let arg =
    let exp_instance arg =
      { arg with exp_type = instance arg.exp_type } in
    let exception Retry of exn in
    try
      let snap = if vars = [] then None else Some (Btype.snapshot ()) in
      exp_instance @@ with_local_level
        begin fun () ->   
          type_argument env sarg ty_arg (instance ty_arg)
        end
        ~post:begin fun arg ->
          if vars <> [] then try
            if maybe_expansive arg then
              lower_contravariant env arg.exp_type;
            generalize_and_check_univars env "field value" arg label.lbl_arg vars;
          with exn when maybe_expansive arg ->
            (* Try to retype without propagating ty_arg, cf PR#4862 *)
            Option.iter Btype.backtrack snap;
            raise (Retry exn)
        end
    with Retry exn -> try
      let arg = with_local_level (fun () -> type_exp env sarg)
          ~post:(fun arg -> lower_contravariant env arg.exp_type)
      in
      exp_instance @@ with_local_level
        begin fun () ->
          let arg = exp_instance arg in
          unify_exp env arg (instance ty_arg);
          arg
        end
        ~post: begin fun arg ->
          generalize_and_check_univars env "field value" arg label.lbl_arg vars
        end
    with Error (_, _, Less_general _) as e -> raise e
    | _ -> raise exn    (* In case of failure return the first error *)
  in
  (lid, label, arg)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too excited by this change just now: while I agree that this makes sense, the code is much changed, and the need to introduce a local exception to be sure that we are not catching an exception coming from type_argument is hairy.
I don't think we really gain in complexity as a result.
It seems that you are trying to fix smells in the original code rather than in this PR, which tries to be minimal.
I agree that the structure of the code is not so nice, but can can we also differ this one for later ?
To answer your points:

  1. what belongs to post is not always easy to determine; this is is a first attempt
  2. since snapshotting is only about unification, it is not directly related to levels (but I agree that your proposal looks better ultimately)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #11900 for a proposal to merge this change now that the current PR is merged.

Copy link
Copy Markdown
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One downside of this PR is that the higher-order wrappers do not play nicely with let-binding scopes. In many cases the previous code looked like

begin_def ();
let x = ... in
let y = ... in
let z = .... in
end_def ();
... (* uses x, y, z *)

and the new code now has to look like the more convoluted

let x, y, z =
  with_local_level @@ fun () ->
  let x = ... in
  let y = ... in
  let z = ... in
  x, y, z
in
...

Notice the two repetitions of x, y, z that redundantly expose the environment defined inside the wrapper to the outside.

When several wrappers are nested, this can result in boilerplate-y code. For example:

  let (vars, ty_arg, snap, arg) =
    with_local_level begin fun () ->
      let (vars, ty_arg) =
        with_local_level_iter_if separate begin fun () ->
          let (vars, ty_arg, ty_res) =
            with_local_level_iter_if separate ~post:generalize_structure
              begin fun () ->
               ....

i am not sure how to do better, and I do think that the benefits of the PR outweigh the cost in readability that it incurs.

If we had some type inference in the module language, we could write something like this:

(* declaration *)
module With_local_level {module type S} (F : () -> S) () : S = ..

(* user code *)
let open With_local_module(functor () -> struct
  let x = ...
  let y = ...
)() in

but this proposal is made impractical by the fact that the signature S cannot be inferred, it has to be passed explicitly, introducing a lot of redundancy again.

This could also be attacked at the term level, if we had lightweight extensible records with an open construct, but we currently do not have those.

@garrigue
Copy link
Copy Markdown
Contributor

Concerning the interaction between let-binding and scope, I'm not sure it is a bad thing to have to make things more explicit.
Anything that leaves a scope might require some form of generalization, so making it explicit is actually good.

@gasche
Copy link
Copy Markdown
Member

gasche commented Jan 11, 2023

Being explicit about the things we take out of the level/scope is good, but right now this involves a lot of redundancy. The "telescope" (the environment exported from within the level to the outside) is duplicated in 2-3 places:

  1. the let binding receiving the result of the wrapper call (let (x, y, z) = ...)
  2. the return value within the wrapper ((x, y, z))
  3. the input arguments of the post function (except in iter style): fun (x, y, z) -> ...

The lists in (1) and (3) sometimes omit some of the bindings, but they still duplicate the arity and this makes changing the number or ordering of exported values tedious. Most of the solutions I considered to avoid this redundancy (none of which work in current OCaml) support being explicit, but without the redundancy.

For example we could consider writing:

let type r = {x: _; y: _; z: _} in
let t =
  with_local_level @@ fun () ->
  let x = ... in
  let y = ... in
  let z = ... in
  {x; y; z}
in
let open t in
...

(If let open t does not exist, instead replace x by t.x in the rest of the expression: less pleasant but still less redundant than today.)

This assumes that we allow local type definitions to leave some of their type components inferred as weak inference variables, which is definitely not supported today. Having to provide explicit type information here would be too painful.

With this approach, we reduce the duplication slightly, as we repeat the telescope at most twice (once in the type declaration, once in the return value; never in the post argument), and reordering or insertion of new values is facilitated. With structural records it would be even nicer, only one occurrence of the telescope.

let t =
  with_local_level @@ fun () ->
  let x = ... in
  let y = ... in
  let z = ... in
  {< x; y; z >}
in
let open t in
...

We still have an explicit list of all exported values, but only in one place.

@lpw25
Copy link
Copy Markdown
Contributor

lpw25 commented Jan 11, 2023

I don't think those features would really make the code easier to read. They make the binding structure of the code more complicated, and so it requires more thinking to understand. If you use records and punning:

let { x; y; z } =
  with_local_level @@ fun () ->
  let x = ... in
  let y = ... in
  let z = ... in
  { x; y; z }
in

then there is no genuine repetition here in the sense that a machine is checking that you wrote the same thing in both places so you know there is no need to look deeply at both of them.

@gasche
Copy link
Copy Markdown
Member

gasche commented Jan 11, 2023

The problem here is that you have to declare the record type in advance, and in particular give the type of the fields, and this is so syntactically heavy that no one is doing it.

@lpw25
Copy link
Copy Markdown
Contributor

lpw25 commented Jan 11, 2023

Fair enough, but then I think the language extension you are looking for is just structural record types that support punning. There is no need for some kind of open construct -- which is the part of your suggestion that I was critiquing.

then widen so as to not introduce any new constraints *)
let z = narrow () in
(* narrow and widen are now invoked through wrap_type_variable_scope *)
with_local_type_variable_scope begin fun () ->
Copy link
Copy Markdown
Member

@gasche gasche Jan 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@goldfirere: you mentioned that you looked at Typetexp.{narrow,widen}; it might be the change to a wrapper combinator proposed in this PR (exemplified in the diff here) conflicts with the refactoring you had in mind. (In that case, I would say submit anyway, we can worry about rebasing once something gets merged.)

Copy link
Copy Markdown
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually done going through this very large PR. I left a few minor comments again, but I definitely haven't found any apparent correctness issue. (And I agree that the code is generally nicer, with my reservation about the telescope redundancy.)

I would like to give @garrigue the opportunity to respond to the last batch of remarks/suggestions/questions, but then I think this should be good to merge.

Copy link
Copy Markdown
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @t6s and @garrigue for this work. It's a very invasive change, but I think that overall it is a nice improvement. I have reviewed the PR and we have discussed every part of the change in details -- if there are any errors left, at least they escaped serious scrutiny. I am approving the PR and I think we should merge.

Comments:

  • I still believe that the wrapper style brought by this PR generates hard-to-read / boilerplate-y code when levels get nested locally, due to "telescope redundancy" ( #11536 (review) ). I don't see any solution to this problem with the current features of OCaml -- anonymous/structural records would be nice to have here.
  • The current post style is an improvement over what we had before, but it is frequently inflexible and irritating. I mentioned using higher-order functions to "delay" calls placed inside the wrapper instead in #11536 (comment) , and @garrigue mentioned different ideas using typing to keep track of levels. I think this is an area for improvement. I'm not planning to work on it mysel, and I don't know if this should take priority over other work in the COCTI project (there sure is a lot to do in the type-checker codebase), but I am curious to see if something can be done.
  • I made a few proposals for code changes during my review that @garrigue preferred to defer to later. I think we should consider revisting them soon -- after this is merged. To help with that, I marked all minor comments "resolved"; the comments that remain visible directly in the discussion are those that I think remain relevant now -- including these code-change suggestions.

Thanks again. This was really a lot of work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants