Skip to content

Reimplement let open, let module and let exception in terms of a single construct#13839

Merged
nojb merged 14 commits intoocaml:trunkfrom
nojb:let_str_item-0
Apr 30, 2025
Merged

Reimplement let open, let module and let exception in terms of a single construct#13839
nojb merged 14 commits intoocaml:trunkfrom
nojb:let_str_item-0

Conversation

@nojb
Copy link
Copy Markdown
Contributor

@nojb nojb commented Feb 28, 2025

This PR is extracted from #13835 for easier reviewing. The synopsis below is adapted from the one in that PR.

It consists in a pure refactoring that reimplements the three constructs let module, let open and let exception in terms of a single general construct. In doing so, the typechecking and code-generation logic for these constructs is fully shared with the one used for ordinary structure items.

The PR consists of, then, three separate pieces:

  • A new kind of expression is introduced in the abstract syntax: let structure-item in expr (AST node: Pexp_stritem/Texp_stritem). This generalizes the existing let open, let exception and let module. Typechecking and code-generation of this new construct is shared with that used for module-level structure items. This part consists of the first four commits, up to and including Add Pexp_stritem, Texp_stritem. (The first commit should be read modulo whitespace.)

  • Reimplement compiler support for let open, let exception and let module in terms of this one construct. These commits only touch the parser to make it use the new AST node, but indirectly they are switching the typechecking and code-generation logic for these three constructs to the one shared with all structure items. It consists of the three commits named Reimplement ....

  • Finally, we remove all the (now dead) code related to the implementation of the ad-hoc let module, let open and let exception. This is the last commit: Remove reimplemented ....

What needs special attention

  • The changes to typecore.ml dealing with the typechecking for the new construct let structure-item in expr. There is some logic to make sure that the type of expr makes sense outside of the scope of the structure-item. However, I do not have a good understanding of it, so an expert will need to take a look and explain it to me :)
  • The changes to translcore.ml dealing with the code-generation for the new construct. I am fairly confident about this part, but I would appreciate an expert pair of eyes here as well.
  • One test shows that the "shape indexer" seems to detect more stuff than before (namely, local modules). This looks like a bug fix to me, but confirmation would be nice (cc @voodoos).

Potential downsides

This PR will involve a fair amount of churn on the ppxlib side and/or compiler-libs users, but it should be fairly manageable, I think (especially because these local forms are infrequently used).

@nojb nojb added refactoring parsetree-change Track changes to the parsetree for that affects ppxs labels Feb 28, 2025
@gasche
Copy link
Copy Markdown
Member

gasche commented Feb 28, 2025

A very bike-sheddy comment, apologies: I find stritem a bit hard to read (str can be many things in general, typically string rather than structure) and I would have preferred struct_item instead -- as you are naturally doing for the new functions that you introduce. I realize that the naming convention for Texp_* constructs is biased towards unreadable collations already (Texp_setinstvar), but I would still have a preference for Texp_struct_item over Texp_stritem. Do you have a feeling about this?

@nojb
Copy link
Copy Markdown
Contributor Author

nojb commented Feb 28, 2025

Do you have a feeling about this?

No, I think I am becoming more and more desensitized to naming issues as I get older :)

Renaming to Pexp_struct_item/Texp_struct_item seems perfectly fine to me. I will wait for a little bit before doing the renaming to avoid too much back and forth in case other opinions come up.

@gasche
Copy link
Copy Markdown
Member

gasche commented Feb 28, 2025

I have looked at the PR, I think that this is a good idea overall, and I agree with what you wrote in the "Special attention" comments: the key changes to review are the [TP]exp_struct_item cases in typecore.ml and translcore.ml. One can review the new definitions for correctness, but we also need to compare with the special-cased definitions that get removed in the last commit (typing and code-production for each of the subsumed construct) to see if they somehow implemented special simplifications that were missing in the general structure-item case and may matter in practice.

@voodoos
Copy link
Copy Markdown
Contributor

voodoos commented Feb 28, 2025

One test shows that the "shape indexer" seems to detect more stuff than before (namely, local modules). This looks like a bug fix to me, but confirmation would be nice (cc @voodoos).

It is not really a bug fix since the indexer primarily aims at getting items that might be exposed via the module's interface. However there is no strict enforcement of that rule right now and we don't need to prevent the additional items to be indexed. (In fact, for renaming, we will probably need to include non-exposed items in the index too.)

@nojb
Copy link
Copy Markdown
Contributor Author

nojb commented Feb 28, 2025

One test shows that the "shape indexer" seems to detect more stuff than before (namely, local modules). This looks like a bug fix to me, but confirmation would be nice (cc @voodoos).

It is not really a bug fix since the indexer primarily aims at getting items that might be exposed via the module's interface. However there is no strict enforcement of that rule right now and we don't need to prevent the additional items to be indexed. (In fact, for renaming, we will probably need to include non-exposed items in the index too.)

OK, thanks for the explanation.

@nojb
Copy link
Copy Markdown
Contributor Author

nojb commented Mar 4, 2025

@lthls: would you be interested in reviewing the changes to translcore.ml and translmod.ml?

@lthls
Copy link
Copy Markdown
Contributor

lthls commented Mar 4, 2025

Eventually, yes. I've got a few other things piled up on my todo list at the moment, but when I have a bit more time I will look at this.

@nojb
Copy link
Copy Markdown
Contributor Author

nojb commented Mar 4, 2025

Eventually, yes. I've got a few other things piled up on my todo list at the moment, but when I have a bit more time I will look at this.

Thanks!

Copy link
Copy Markdown
Contributor

@voodoos voodoos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a look at the parser and typer changes (not in lambda's translcore) and found no obvious issues. It's seducing to be able to share code from Typemod to type these expressions, but I am not expert enough to attest that the behavior is exactly the same, especially for the let module ... in case.

(** [let module M = ME in E] *)
| Pexp_letexception of extension_constructor * expression
(** [let exception C in E] *)
| Pexp_stritem of structure_item * expression
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to document the new constructor.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, will do.

| _ ->
assert false
end
| Pexp_letmodule(name, smodl, sbody) ->
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With these changes the typing of let module ... in is delegated to Typemod.type_str_item as any other Pstr_module. These two implementations are slightly different and this might require a more careful review.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except the part about local level

| Pexp_letmodule(name, smodl, sbody) ->
let lv = get_current_level () in
let (id, pres, modl, _, body) =
with_local_level_generalize begin fun () ->
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In particular, this level increase is not done in the corresponding section in Typemod.type_str_item.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the comment at the call of with_local_level_generalize_if in the Pexp_let case I think that it is problematic to remove the with_local_level_generalize here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment in question:

ocaml/typing/typecore.ml

Lines 3823 to 3827 in 8f6365f

(* If the patterns contain module unpacks, there is a possibility that
the types of the let body or bound expressions mention types
introduced by those unpacks. The below code checks for scope escape
via both of these pathways (body, bound expressions).
*)

refers to a situation involving module unpacks in patterns (ie let (module P) = ... in ...). No such forms are allowed by the new construct let <stritem> in <expr>, so it is not clear to me that with_local_level_generalize is needed here (nothing is being generalized here as far as I can see).

Does this match your understanding @samsa1?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two things that looks problematic for me here :

  • we can have a module unpack : let module M = (val m) in ...,
    which is expected to work like let (module M) = m in ...

    • when typing if c then (let module M = ... in ...) else e2 than the level used for typing (let module M = ... in ...) is not the same a the one for e2 because the typing of Pstr_module increased the level in a destructive way.

My undestanding is that adding a

let ty = newvar() in
with_local_level_generalize_if cond begin fun () ->
      ...
end ~before_generalize:(fun (new_env, e) -> unify new_env ty e.exp_type)

here can only prevent a problem from happening here.

This comment was written by @ncik-roberts maybe him or @garrigue could help us here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @samsa1 for the comment. I am happy to add that if I can obtain an actual example of things going "wrong". However, I have not been able to so far. Are you able to find one?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@samsam1 I'm not sure about the need_scope_protection business (first of all, is it need or needs?). Is there any downside to unconditionally using a fresh inference region for all instances of this construct?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nojb, @samsa1: I still have the impression that we should unconditionally create an inference region here, so maybe we should discuss whether we want to tweak it before merging.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for not reacting earlier. The conditional is indeed not necessary and here to mimic previous behavior. I'm totally fine to change it to creating an inference region unconditionally.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nojb can you change this in your patch? Then I would be happy to approve it. (We haven't had a codegeneration-specific review, but oh well.) If you don't feel confident tweaking it I can look at it and propose a modified patch.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, I will rebase the patch (+ this change) shortly.

Comment on lines -4561 to -4639
~before_generalize: begin fun (_id, _pres, _modl, new_env, body) ->
(* Ensure that local definitions do not leak. *)
(* required for implicit unpack *)
enforce_current_level new_env body.exp_type
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, is it safe to remove this ? (I guess so since the related test passes)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those lines are not exactly removed. The

   let tv = newvar () in
   ...
   unify_var new_env tv exp.exp_type

In the Pexp_stritem case does the same thing

@nojb
Copy link
Copy Markdown
Contributor Author

nojb commented Mar 27, 2025

Thanks @voodoos for the review (and apologies for the lag in responding!). Indeed, it is not obvious that the new typechecking code is equivalent to the old one, but the testsuite passes, and morally I see no reason for the typechecking for local structure items to differ from that of module-level structure items.

Incidentally, @samsa1 told me today offline that in fact the refactoring here seems to fix a bug in the typechecking of let exception.

@samsa1: would you be interested in doing a review of the typechecking part of the patch?

@nojb nojb force-pushed the let_str_item-0 branch 2 times, most recently from ab87a17 to 7e889ad Compare March 27, 2025 21:02
@nojb
Copy link
Copy Markdown
Contributor Author

nojb commented Mar 27, 2025

Do you have a feeling about this?

No, I think I am becoming more and more desensitized to naming issues as I get older :)

Renaming to Pexp_struct_item/Texp_struct_item seems perfectly fine to me. I will wait for a little bit before doing the renaming to avoid too much back and forth in case other opinions come up.

I had to rebase the PR on top of the current trunk so I did the renaming while at it.

Copy link
Copy Markdown
Contributor

@samsa1 samsa1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a whole this PR looks good to me, except a small mismatch in refactoring in typing/typecore.ml that seems problematic.

It also removes a bug in depent.ml and reduces the risk of introducing bugs in other parts of the compiler. Thus, I'm totally in favor of this PR.

This PR says that we refactor 3 cases, however it seems it allows any local structure item (using PPX rather than the parser). Wouln't it be better to modify typing/ast_invariants.ml to prevent such cases from happening ?

I don't consider myself competent for the part of the code in the lambda folder

| Some id -> String.Map.add id b bv
in
add_expr bv e
| Pexp_letexception(_, e) -> add_expr bv e
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a bug here that is being corrected by this PR.
Missed add_extension_constructor on the first argument of Pexp_letexception

for modules that are present. *)
let size = classify_module_expression env mexp in
let env = Ident.add mid size env in
classify_expression env e
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it okay to stop updating env in this case ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only forbids more cases, so it can't cause soundness issues.
You have to be quite creative to find an example where this matters:

module type T = sig type t val x : t end
type m = T of (module T with type t = m)
module type Tm = T with type t = m

let f () =
  let rec v : (module Tm) =
    let module M = struct
      type t = m
      let x = T v
    end in
    (module M)
  in
  v

If I'm not mistaken, the code above will be rejected with this PR, but in practice there are likely no such occurrences in the wild.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that becoming more restrictive on let module is probably okay. If we find example of breakage in the wild during release testing, I will volunteer to extend the classification as desired.

Comment on lines -4561 to -4639
~before_generalize: begin fun (_id, _pres, _modl, new_env, body) ->
(* Ensure that local definitions do not leak. *)
(* required for implicit unpack *)
enforce_current_level new_env body.exp_type
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those lines are not exactly removed. The

   let tv = newvar () in
   ...
   unify_var new_env tv exp.exp_type

In the Pexp_stritem case does the same thing

let tv = newvar () in
let (si, newenv) = !type_str_item env si in
let exp = type_expect newenv e ty_expected_explained in
unify_var newenv tv exp.exp_type;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why calling unify_var here and not unify_exp ?
This could lead to slightly better error messages for examples such as :

let module M = struct type t = A end in M.A

Copy link
Copy Markdown
Contributor Author

@nojb nojb Mar 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. I took this code from what is done currently for let open ... in .... I tried replacing unify_var newenv tv exp.exp_type with unify_exp ~sexp newenv exp tv, but the error message did not change:

# let module M = struct type t = A end in M.A;;
Error: The constructor M.A has type M.t
       but an expression was expected of type 'a
       The type constructor M.t would escape its scope

(This is also the same error message as in trunk.)

| Pexp_letmodule(name, smodl, sbody) ->
let lv = get_current_level () in
let (id, pres, modl, _, body) =
with_local_level_generalize begin fun () ->
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the comment at the call of with_local_level_generalize_if in the Pexp_let case I think that it is problematic to remove the with_local_level_generalize here.

| _ ->
assert false
end
| Pexp_letmodule(name, smodl, sbody) ->
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except the part about local level

@lthls
Copy link
Copy Markdown
Contributor

lthls commented Mar 28, 2025

Note: on the semantics side, we mentioned in yesterday's meeting that most of the features can already be done using let open struct <stritem> end in expr, but for a number of structure items this is not equivalent (typically let open M in expr is not equivalent to let open struct open M end in expr).
However, I believe there is a more accurate way to emulate this that should give people the right intuition about the semantics: let <stritem> in <expr> is equivalent to let module M = struct <stritem> let body = <expr> end in M.body, but of course without the overhead of creating an actual module.
This gives the right intuition for trickier items such as floating module attributes ([@@@warning "-4"] for example).

@nojb
Copy link
Copy Markdown
Contributor Author

nojb commented Mar 28, 2025

However, I believe there is a more accurate way to emulate this that should give people the right intuition about the semantics: let <stritem> in <expr> is equivalent to let module M = struct <stritem> let body = <expr> end in M.body, but of course without the overhead of creating an actual module.

Indeed, thanks!

@gasche gasche added the typing label Mar 28, 2025
@nojb
Copy link
Copy Markdown
Contributor Author

nojb commented Mar 29, 2025

Thanks @samsa1 for the review!

This PR says that we refactor 3 cases, however it seems it allows any local structure item (using PPX rather than the parser). Wouln't it be better to modify typing/ast_invariants.ml to prevent such cases from happening ?

Indeed. In #13835 we expose the possibility of using arbitrary structure items locally in the surface syntax. But you are right that as far as this PR is concerned, this should not be allowed. I will do as suggested.

@nojb
Copy link
Copy Markdown
Contributor Author

nojb commented Mar 29, 2025

This PR says that we refactor 3 cases, however it seems it allows any local structure item (using PPX rather than the parser). Wouln't it be better to modify typing/ast_invariants.ml to prevent such cases from happening ?

Indeed. In #13835 we expose the possibility of using arbitrary structure items locally in the surface syntax. But you are right that as far as this PR is concerned, this should not be allowed. I will do as suggested.

This is now done edb9d8f.

@gasche
Copy link
Copy Markdown
Member

gasche commented Apr 10, 2025

@lthls would you be able to look at the changes in translmod and translcore introduced by this PR? I haven't looked at them myself, but structurally there is a risk that the previous definitions would generate better code than the general structure-item definitions.

@nojb
Copy link
Copy Markdown
Contributor Author

nojb commented Apr 14, 2025

@Octachron I would like to add this to the 5.4 milestone; is that OK?

@Octachron
Copy link
Copy Markdown
Member

It would be certainly nice to review the PR as soon as possible. However, since this first PR is an internal change, I am not sure how helpful it would be to push for its integration into 5.4?

Nevertheless if you wish, I wouldn't mind adding it to the 5.4 features milestone to extend the review period, but with the understanding that I will close the extended review period in the beginning of May.

@nojb nojb force-pushed the let_str_item-0 branch 2 times, most recently from 5873128 to b3fe41e Compare April 14, 2025 13:57
@nojb
Copy link
Copy Markdown
Contributor Author

nojb commented Apr 30, 2025

@gasche @samsa1 I rebased the PR with a fix making the creationg of the type region unconditional. I am planning to merge once the CI passes.

@Octachron we had discussed including this for 5.4, but there is no reason to rush things, so let us leave this on trunk to give us more time to road test it.

@nojb nojb merged commit aaab90d into ocaml:trunk Apr 30, 2025
23 checks passed
@t6s
Copy link
Copy Markdown
Contributor

t6s commented May 12, 2025

I have noticed several occurrences of Pexp_letmodule remaining in comments:

~/ocaml/trunk$ grep -Hnr exp_letmodule
typing/typecore.ml:2283:      (* This code is parallel to the typing of Pexp_letmodule. However we
typing/typecore.ml:2285:         For Pexp_letmodule, the call to [type_module] is done in a raised
typing/typemod.ml:3075:  (* Same as Pexp_letmodule *)
tools/eqparsetree.ml:748:  | (Pexp_letmodule (a0, a1, a2), Pexp_letmodule (b0, b1, b2)) ->
~/ocaml/trunk$ grep -Hnr exp_letexception
~/ocaml/trunk$ grep -Hnr exp_open
tools/eqparsetree.ml:760:  | (Pexp_open (a0, a1), Pexp_open (b0, b1)) ->

eqparsetree.ml looks outdated, but this would be a different issue.

@nojb
Copy link
Copy Markdown
Contributor Author

nojb commented May 12, 2025

Thanks @t6s! I will open a PR with the fix for this documentation issue.

@nojb
Copy link
Copy Markdown
Contributor Author

nojb commented May 12, 2025

Thanks @t6s! I will open a PR with the fix for this documentation issue.

See #14028

@gasche
Copy link
Copy Markdown
Member

gasche commented Feb 16, 2026

( #14554 is a regression on let module typing that is caused by the present PR, and is under investigation. )

| Pexp_sequence (_, e) | Pexp_open (_, e) -> is_inferred e
| Pexp_sequence (_, e) -> is_inferred e
| Pexp_ifthenelse (_, e1, Some e2) -> is_inferred e1 && is_inferred e2
| _ -> false
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change introduced a regression, as we should have replaced the old Pexp_open by a corresponding case for Pexp_struct_item. See #14638.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parsetree-change Track changes to the parsetree for that affects ppxs refactoring typing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants