Skip to content

Remove arity-interrupting elaboration of module unpacks#12117

Merged
gasche merged 11 commits intoocaml:trunkfrom
ncik-roberts:remove-typedtree-unpack-elaboration
Apr 5, 2023
Merged

Remove arity-interrupting elaboration of module unpacks#12117
gasche merged 11 commits intoocaml:trunkfrom
ncik-roberts:remove-typedtree-unpack-elaboration

Conversation

@ncik-roberts
Copy link
Copy Markdown
Contributor

Remove the elaboration of module patterns, where typechecking would insert let-module expressions for each unpack. For example, this source program:

fun (module M : S) y -> M.x + y

would be elaborated to this typedtree AST:

fun (m : (module S)) ->
  let module M = (val m) in
  fun y -> M.x + y

That is, the only way that modules could be unpacked in typedtree was via Texp_letmodule. Now, we perform no such elaboration, and instead treat the Tpat_var pattern as an unpack when it has pat_extra = Tpat_unpack.

It would be best to review this PR commit-by-commit. The main commit that removes the elaboration is 78d59d6, but this makes available a cleanup in a later commit.

The ideas are the same as a Jane Street version of this patch (ocaml-flambda/ocaml-jst#146 and ocaml-flambda/ocaml-jst#150), if you would like to refer to the discussion there when reviewing this.

Rationale

The old elaboration interrupted the arity of functions, as seen in the example above. Removing this elaboration allows us to simplify the code that tried to recover the arity later. (See 64907dd.) This is in service of a PR I'm working on to make function arity syntactic; see ocaml/RFCs#32. But, even without the larger PR, I think this patch is a simplification.

Though the old elaboration was mainly problematic for functions and not so much for match/let, I remove the elaboration everywhere for consistency.

Approach

To type-check in the new scheme, we:

  • Treat module patterns as introducing a module identifier at an elevated scope
  • Take care to check that the module's types don't escape via the overall expression type (or the bound variable types, in the case of recursive let definitions)

The mechanism we use for this is similar to gadt_equations_level, in that we wrap the type-checking of patterns and their extent in its own level, and store the elevated scope in a global ref that pattern type-checking refers to. (In particular, we repurpose the allow_modules bool to something like int option, where the int payload communicates the scope to enter the module identifier at.)

It's important to think about not only how this PR avoids scope escape, but also how future constructs involving patterns will avoid scope escape. Previously, the way this was accomplished was via the type_unpacks function, which type-checked a body expression within the scope of a list of unpacked modules, and confirmed the modules' types didn't escape via the body's type. In this world, it was easy to see that scope escape was checked anywhere a module was used. Now, the responsibility is more diffuse. But we can still state the matter simply. The responsibility to check scope escape falls to whoever creates the scope such that pattern type-checking sets allow_modules = Modules_allowed { scope }. You can see this responsibility in action in both the type_cases and type_let functions. This is summarized in a comment by the module_patterns_restriction type.

@goldfirere
Copy link
Copy Markdown
Contributor

Given that I reviewed the internal patch at ocaml-flambda/ocaml-jst#146, I'm happy to review this one, too. I should get a chance to do so Monday (meant to today, but time flies). The key piece I would want from others here is feedback about the overall design: is this a movement in the right direction, and is the general approach the right one? I'm happy with it, but others here have much more context than I do. Thanks!

@gasche
Copy link
Copy Markdown
Member

gasche commented Mar 18, 2023

This is for @garrigue : could you comment on whether you find the overall design acceptable?

@goldfirere
Copy link
Copy Markdown
Contributor

@garrigue, does this refactoring work for you? It removes the type_unpacks function, in favor of adding module variables on the fly. No need for a detailed correctness review -- I'm happy to do that -- but it would be great to get your opinion about the high-level direction.

@garrigue
Copy link
Copy Markdown
Contributor

garrigue commented Mar 28, 2023

I was going to look into that today.
My main concern is that it adds a new construct to the language, rather than just using syntactic sugar. So there is a need to check (and maintain) the soundness independently.
In particular, the scope of the added modules changes: one common scope by pattern (like for GADTs) rather than one scope by module.
This looks correct but I wanted to assess the extra cost of maintaining the soundness in the future.

Copy link
Copy Markdown
Contributor

@garrigue garrigue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks OK to me, but I'm still wondering how this interacts with principality checks.
Also it seems clear that the concept of full-fledge module pattern variable introduces extra expressive power and complexity, so the review cannot be only about conservativeness.

let allow_modules = ref false
let module_variables = ref ([] : module_variable list)
let allow_modules = ref Modules_rejected
let module_variables = ref ([] : to_unpack list)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nicer to avoid using global variables for flags that are propagated top-down, but this can be done later.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, getting rid of this global state sounds appealing. I've started another patch to do this. I agree we shouldn't block this PR on this.

...] We learn the signature [S] from the type of [m] in the RHS of the
second let, and we need that knowledge for [type_module] to succeed. If
we type-checked expressions before patterns, then we could call
[add_module_variables] here.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow this discussion. In your example, the type of [m] is known by the time you typecheck the second let. Are talking about recursive let's ?

Copy link
Copy Markdown
Contributor Author

@ncik-roberts ncik-roberts Mar 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the comment is meant to describe a case that isn't limited to recursive lets.

It's true that the type of m is known when checking the second let. But the type-checker needs to first propagate the fact of m's type to the (module M) pattern before the pattern can successfully be checked. That propagation is done by this line:

type_expect exp_env sexp (mk_expected pat.pat_type))
. So, the type-checker only succeeds in checking the pattern (and therefore in adding module variables) after that point.

@gasche
Copy link
Copy Markdown
Member

gasche commented Mar 29, 2023

I haven't looked at the details of this work, but here is an opinion in case it can be of help:

  • In the past I encountered the place of parmatch.ml which reverse-engineers the translation of module unpack patterns. I would be very happy to see this ugly hack go away.
  • In general I agree that the simplification is nice.
  • We do need @garrigue to review the new type-checking code for correctness. My impression is that the result will be nicer than the current one, and may help get a more regular / better-structured implementation if we later add other things that need to be scoped inside patterns (we already do with named existential type variables in GADT patterns, and we might eventually rework those as well).
  • But I'm not fond of the design where the Tpat_unpack constructor, which now has an important role and is not merely a decoration anymore, remains in the pat_extra part of the typedtree. I think it would be better to move as a real typedtree constructor. (Unfortunately that only adds weight to the patch, given that currently no changes are required on the typedtree.)

@ncik-roberts
Copy link
Copy Markdown
Contributor Author

ncik-roberts commented Mar 29, 2023

The reason that I did not promote the Tpat_unpack constructor to a constructor in pattern_desc, at the same level as Tpat_var, was to avoid the need to scour the repo for matches on Tpat_var. Tpat_var and Tpat_unpack are handled similarly in many cases — e.g. alpha-conversion of patterns necessary in or-patterns.

In fact, it's a common to see a match on just Tpat_var and Tpat_any, as these are catch-all patterns. If we promoted Tpat_unpack, current and future code would need to instead match on Tpat_var, Tpat_any, and Tpat_unpack.

My original approach did involve promoting Tpat_unpack. At first, I missed adding a case to some matches, which introduced bugs, and so I decided instead to keep Tpat_unpack in pat_extra to avoid this kind of bug altogether.

I'm open to promoting Tpat_unpack to a constructor in pattern_desc, but I wanted to point out there are some drawbacks.

@gasche
Copy link
Copy Markdown
Member

gasche commented Mar 29, 2023

The sense I get from your informative reply (thanks!) is that we should consider the PR in its current state, without widening the scope to make Tpat_unpack non-extra, and then discuss doing that as a follow-up PR to be discussed and reviewed separately.

Maybe instead of making Tpat_unpack its own constructor we would want to annotate Tpat_var on whether it binds a term or module-level variable, which would also be very invasive (probably even more) but less fragile.

@ncik-roberts
Copy link
Copy Markdown
Contributor Author

Yes, I think your assessment matches my feeling that we should review this PR as-is without making Tpat_unpack non-extra.

Some of these tests' error messages change in this PR. Some of them
failed at various points when working on this PR.
There are two pieces of now-dead code we can remove:

  * A way of suppressing unused module warnings in `when` guards
    when the modules were introduced via module unpackings
  * A way of ensuring that first-class module unpacks are pushed
    under optional defaulting
This is a refactor to avoid the need to wrap typechecking the
extent of a module unpack in the `type_unpacks` function.
Now you call `add_module_variables` to add the (typechecked)
module bindings to the environment before calling `type_expect`
on the extent; this is more similar to how we add variable bindings
to the environment.
This makes the similarity between `pattern_variable` and
unpacked module variables more apparent. The code is already
parallel; this change makes the names more parallel, too.
Stick closer to the old behavior by checking type escape
via `E1` in `let P = E1 in E2`. This patch previously just
checked via the variables bound in `P`. It's closer to
the old implementation to check `E1`.
@ncik-roberts ncik-roberts force-pushed the remove-typedtree-unpack-elaboration branch from ac170fc to 6f54177 Compare March 30, 2023 18:43
@goldfirere
Copy link
Copy Markdown
Contributor

@gasche

  • We do need @garrigue to review the new type-checking code for correctness. My impression is that the result will be nicer than the current one, and may help get a more regular / better-structured implementation if we later add other things that need to be scoped inside patterns (we already do with named existential type variables in GADT patterns, and we might eventually rework those as well).

It's great if @garrigue can review this (as he already has!). I was just trying to reduce the review burden, but I agree that @garrigue has a much better perspective on this than I do.

unify_exp new_env body (newvar ());
if rec_flag = Recursive then
List.iter
(fun vb -> unify_exp new_env vb.vb_expr (newvar ()))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A problem with unifying the type of the let-bound expressions is that, since generalization was done in place, it is going to also lose the generality of types that were added to the environment. This is ok since this environment is no longer used for typing, but this may make using it for other things later on more complex. I suppose that the correct thing to do would be to first call generalize_structure, to ensure that nodes whose descendant is generalized are themselves generalized, and then call unify_exp_types on the result of Ctype.instance.
Note that the body is not generalized at this, point, so there is no need to do that for it (and in that case it is important that the unification be on the actual type, as it may trigger expansion).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted, thanks. I am going to double-check a detail of my updated approach with my London colleagues before pushing the updated implementation for your review.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've implemented the approach you suggested and it seems like it works (now pushed) [1]. But, taking a step back, the code and reasoning behind it is rather complex for a feature that probably is not widely used, namely recursive first-class module unpacks:

let rec (module M) = e1 in e2

I agree that your suggestion is better than the approach I originally took. But, in discussion with @lpw25 , we wondered if it would be better to instead disallow recursive unpacks. This lowers the maintenance burden of this PR (we get to delete [1]). Leo also points out that it's a bit odd that unpacks are allowed as P in let rec P = ... when basically nothing else is.

I think we should probably not block this PR on the decision of whether to remove recursive unpacks. Depending on what you think, we could explore making another PR to do this.

[1]

          if rec_flag = Recursive && may_contain_modules then begin
            List.iter
              (fun vb ->
                 (* [type_let] already generalized bound expressions' types
                    in-place. We first take an instance before checking scope
                    escape at the outer level to avoid losing generality of
                    types added to [new_env].
                 *)
                let bound_exp = vb.vb_expr in
                generalize_structure_exp bound_exp;
                let bound_exp_type = Ctype.instance bound_exp.exp_type in
                let loc = proper_exp_loc bound_exp in
                let outer_var = newvar2 outer_level in
                unify_exp_types loc new_env bound_exp_type outer_var)
              pat_exp_list
          end;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think unpacks in let rec patterns are currently allowed:

# let rec (module M) = (module struct end);;
Error: Modules are not allowed in this pattern.

I get the same result in both 4.14 and 5.0 (haven't tried trunk, but I don't remember any change related to that). I think we'd all rather keep them forbidden, as let rec is already complicated enough as it is.

Copy link
Copy Markdown
Contributor

@lpw25 lpw25 Apr 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're allowed for let rec ... in ...:

# let rec (module M : S) = (module struct end) in ();;
- : unit = ()

but I really think they shouldn't be.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that maintenance is a burden, and that I don't see much practical use for them.
One can always use the old approach expanded as syntactic sugar anyway.

However, if we do not allow them we need a good rationale for it. Otherwise this is yet another irregularity in the language.
Since I don't see one myself (the only restrictions on recursion are supposed to be about initialization, and this is detected independently), I would tend to keep the support for the sake of regularity. The point is that the feature is not "recursive module unpack" but "module pattern", which is orthogonal to recursion.

BTW, there is a reasonably good rationale for disallowing module patterns in toplevel let definitions, but this still confuses people.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(the only restrictions on recursion are supposed to be about initialization, and this is detected independently)

No, there is also a restriction that the only allowed patterns for recursive bindings are variables. That's why I was surprised that module patterns were allowed.

And the feature in its current state is very limited: if you have to insert a coercion between the actual structure and its declared signature, then all definitions that are actually recursive are forbidden (I suspect that the coercion makes Rec_check classify the definition as Dynamic instead of Static, but I haven't tried to print the intermediate typedtree to see why). So you don't have access to more features than just unpacking after the definition:

(* This is allowed, including at toplevel: *)
let rec m =
  let module M = (val m : S) in
  (module struct
    let x = 0
    let y () =
      M.x
  end: S)
module M : S' = (val m : S)

I'll ping @gasche, since he may have some more insights to share.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is indeed the case that we currently only allow variables as the pattern of a let rec. I think that the fact that module patterns were accepted before is an unintended side-effect of their desugaring implementation. I see no problem with rejecting them for this reason, in fact this is arguably the more regular thing to do.

(Note that if we had a Tpat_unpack construct as a proper pattern-former, then they would be rejected naturally by the Illegal_letrec_pat check.)

@garrigue
Copy link
Copy Markdown
Contributor

The approach looks correct to me. While the previous approach had advantages from the point of view of soundness, it created difficulties in the match compiler, so I agree with need for something more integrated. Also the code mimics enough the previous approach that we can be confident of its correctness.
I just added a few comments about things that do not seems outright incorrect, but may cause problems in the future.

@gasche
Copy link
Copy Markdown
Member

gasche commented Mar 31, 2023

It's great if @garrigue can review this (as he already has!). I was just trying to reduce the review burden, but I agree that @garrigue has a much better perspective on this than I do.

@goldfirere: I realize that my comment may be interpreted as a vote-of-no-confidence on your own offer to review the PR. It certainly wasn't intending to suggest that. There are parts of the codebase that are clearly the turf of someone in particular, and usually I try as part of the review process to get their opinion unless it is a really minor change. For the backend or runtime system, the locals like to give their opinion on the high-level design in particular. For the type-system, I like to ask @garrigue's opinion about the type-system change, but (according to my understanding) Jacques generally prefers other people to review for code quality, clarity, etc., so he is generally not the "main" reviewer. So quite often we have someone to say "I reviewed the code and I think the quality is good -- but the details are always a bit tricky so I'm not 100% confident that the type-checking changes are correct", and then Jacques to say something about the type-checking changes themselves (which doesn't give us 100% confidence, but at least we followed due process).

@gasche
Copy link
Copy Markdown
Member

gasche commented Apr 4, 2023

@garrigue I don't know why module patterns are not allowed for toplevel lets. Is the explanation for this restriction written down somewhere, can you point to it?

@garrigue
Copy link
Copy Markdown
Contributor

garrigue commented Apr 4, 2023

I'm not sure the fact module patterns are not allowed for top-level lets is documented. It would mean that a value binding could define new modules, which requires modifying a number of places. This is not impossible to do, but whether this is desirable is open to debate.
My point is that even in this relatively clear cut case, many people think like you that the natural behavior would be to allow them in top-level lets, even if this requires more work.

@gasche
Copy link
Copy Markdown
Member

gasche commented Apr 4, 2023

It is documented, but without a justification.

The pattern @'(' 'module' module-name ':' package-type ')'@ matches a
package with type @package-type@ and binds it to @module-name@.
It is not allowed in toplevel let bindings.

Your point, if I understand correctly, is that toplevel let definition introduce names that are included/exported in the output structure, so they require slightly more work than local definitions.

@ncik-roberts
Copy link
Copy Markdown
Contributor Author

Thanks @garrigue for your review. It's my impression that I've responded to your comments, and that the remaining points of discussion (module patterns in let rec; top-level module patterns; your code comment about global variables) could be addressed in separate PRs, depending on what conclusion we arrive to. Let me know if you're waiting on me for anything else for this PR.

(I know you clicked "Approve", but I've pushed some changes according to your comments since then, so I wanted to check.)

@garrigue
Copy link
Copy Markdown
Contributor

garrigue commented Apr 5, 2023

I checked your last commit a0450f9 and it looks fine.

Concerning module unpacks in recursive definitions, @gasche 's point sounds valid.
It is not completely true that allowing them was unintended in the first place, as some code was added to allow it, but this addition was just a low hanging fruit with not much thought in it, and I would be very surprised if anybody is using it.

It might be worth thinking about allowing module unpacks in toplevel lets too, but this is another story.

@gasche
Copy link
Copy Markdown
Member

gasche commented Apr 5, 2023

This has been previously reviewed by @goldfirere, and now approved by @garrigue as well. I believe that this is good to merge. I am planning to "squash" given that there is no clear candidate for separate commits here -- the change in checking abstract type escape stands on its own, but squashing it with the rest actually reduces and simplifies the diff.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants