Remove arity-interrupting elaboration of module unpacks#12117
Remove arity-interrupting elaboration of module unpacks#12117gasche merged 11 commits intoocaml:trunkfrom
Conversation
|
Given that I reviewed the internal patch at ocaml-flambda/ocaml-jst#146, I'm happy to review this one, too. I should get a chance to do so Monday (meant to today, but time flies). The key piece I would want from others here is feedback about the overall design: is this a movement in the right direction, and is the general approach the right one? I'm happy with it, but others here have much more context than I do. Thanks! |
|
This is for @garrigue : could you comment on whether you find the overall design acceptable? |
|
@garrigue, does this refactoring work for you? It removes the |
|
I was going to look into that today. |
garrigue
left a comment
There was a problem hiding this comment.
This looks OK to me, but I'm still wondering how this interacts with principality checks.
Also it seems clear that the concept of full-fledge module pattern variable introduces extra expressive power and complexity, so the review cannot be only about conservativeness.
typing/typecore.ml
Outdated
| let allow_modules = ref false | ||
| let module_variables = ref ([] : module_variable list) | ||
| let allow_modules = ref Modules_rejected | ||
| let module_variables = ref ([] : to_unpack list) |
There was a problem hiding this comment.
It would be nicer to avoid using global variables for flags that are propagated top-down, but this can be done later.
There was a problem hiding this comment.
Yes, getting rid of this global state sounds appealing. I've started another patch to do this. I agree we shouldn't block this PR on this.
| ...] We learn the signature [S] from the type of [m] in the RHS of the | ||
| second let, and we need that knowledge for [type_module] to succeed. If | ||
| we type-checked expressions before patterns, then we could call | ||
| [add_module_variables] here. |
There was a problem hiding this comment.
Not sure I follow this discussion. In your example, the type of [m] is known by the time you typecheck the second let. Are talking about recursive let's ?
There was a problem hiding this comment.
No, the comment is meant to describe a case that isn't limited to recursive lets.
It's true that the type of m is known when checking the second let. But the type-checker needs to first propagate the fact of m's type to the (module M) pattern before the pattern can successfully be checked. That propagation is done by this line:
Line 5258 in 0542762
|
I haven't looked at the details of this work, but here is an opinion in case it can be of help:
|
|
The reason that I did not promote the In fact, it's a common to see a match on just My original approach did involve promoting I'm open to promoting |
|
The sense I get from your informative reply (thanks!) is that we should consider the PR in its current state, without widening the scope to make Tpat_unpack non-extra, and then discuss doing that as a follow-up PR to be discussed and reviewed separately. Maybe instead of making Tpat_unpack its own constructor we would want to annotate Tpat_var on whether it binds a term or module-level variable, which would also be very invasive (probably even more) but less fragile. |
|
Yes, I think your assessment matches my feeling that we should review this PR as-is without making |
Some of these tests' error messages change in this PR. Some of them failed at various points when working on this PR.
There are two pieces of now-dead code we can remove:
* A way of suppressing unused module warnings in `when` guards
when the modules were introduced via module unpackings
* A way of ensuring that first-class module unpacks are pushed
under optional defaulting
This is a refactor to avoid the need to wrap typechecking the extent of a module unpack in the `type_unpacks` function. Now you call `add_module_variables` to add the (typechecked) module bindings to the environment before calling `type_expect` on the extent; this is more similar to how we add variable bindings to the environment.
This makes the similarity between `pattern_variable` and unpacked module variables more apparent. The code is already parallel; this change makes the names more parallel, too.
Stick closer to the old behavior by checking type escape via `E1` in `let P = E1 in E2`. This patch previously just checked via the variables bound in `P`. It's closer to the old implementation to check `E1`.
ac170fc to
6f54177
Compare
It's great if @garrigue can review this (as he already has!). I was just trying to reduce the review burden, but I agree that @garrigue has a much better perspective on this than I do. |
typing/typecore.ml
Outdated
| unify_exp new_env body (newvar ()); | ||
| if rec_flag = Recursive then | ||
| List.iter | ||
| (fun vb -> unify_exp new_env vb.vb_expr (newvar ())) |
There was a problem hiding this comment.
A problem with unifying the type of the let-bound expressions is that, since generalization was done in place, it is going to also lose the generality of types that were added to the environment. This is ok since this environment is no longer used for typing, but this may make using it for other things later on more complex. I suppose that the correct thing to do would be to first call generalize_structure, to ensure that nodes whose descendant is generalized are themselves generalized, and then call unify_exp_types on the result of Ctype.instance.
Note that the body is not generalized at this, point, so there is no need to do that for it (and in that case it is important that the unification be on the actual type, as it may trigger expansion).
There was a problem hiding this comment.
Noted, thanks. I am going to double-check a detail of my updated approach with my London colleagues before pushing the updated implementation for your review.
There was a problem hiding this comment.
I've implemented the approach you suggested and it seems like it works (now pushed) [1]. But, taking a step back, the code and reasoning behind it is rather complex for a feature that probably is not widely used, namely recursive first-class module unpacks:
let rec (module M) = e1 in e2I agree that your suggestion is better than the approach I originally took. But, in discussion with @lpw25 , we wondered if it would be better to instead disallow recursive unpacks. This lowers the maintenance burden of this PR (we get to delete [1]). Leo also points out that it's a bit odd that unpacks are allowed as P in let rec P = ... when basically nothing else is.
I think we should probably not block this PR on the decision of whether to remove recursive unpacks. Depending on what you think, we could explore making another PR to do this.
[1]
if rec_flag = Recursive && may_contain_modules then begin
List.iter
(fun vb ->
(* [type_let] already generalized bound expressions' types
in-place. We first take an instance before checking scope
escape at the outer level to avoid losing generality of
types added to [new_env].
*)
let bound_exp = vb.vb_expr in
generalize_structure_exp bound_exp;
let bound_exp_type = Ctype.instance bound_exp.exp_type in
let loc = proper_exp_loc bound_exp in
let outer_var = newvar2 outer_level in
unify_exp_types loc new_env bound_exp_type outer_var)
pat_exp_list
end;There was a problem hiding this comment.
I don't think unpacks in let rec patterns are currently allowed:
# let rec (module M) = (module struct end);;
Error: Modules are not allowed in this pattern.I get the same result in both 4.14 and 5.0 (haven't tried trunk, but I don't remember any change related to that). I think we'd all rather keep them forbidden, as let rec is already complicated enough as it is.
There was a problem hiding this comment.
They're allowed for let rec ... in ...:
# let rec (module M : S) = (module struct end) in ();;
- : unit = ()but I really think they shouldn't be.
There was a problem hiding this comment.
I agree that maintenance is a burden, and that I don't see much practical use for them.
One can always use the old approach expanded as syntactic sugar anyway.
However, if we do not allow them we need a good rationale for it. Otherwise this is yet another irregularity in the language.
Since I don't see one myself (the only restrictions on recursion are supposed to be about initialization, and this is detected independently), I would tend to keep the support for the sake of regularity. The point is that the feature is not "recursive module unpack" but "module pattern", which is orthogonal to recursion.
BTW, there is a reasonably good rationale for disallowing module patterns in toplevel let definitions, but this still confuses people.
There was a problem hiding this comment.
(the only restrictions on recursion are supposed to be about initialization, and this is detected independently)
No, there is also a restriction that the only allowed patterns for recursive bindings are variables. That's why I was surprised that module patterns were allowed.
And the feature in its current state is very limited: if you have to insert a coercion between the actual structure and its declared signature, then all definitions that are actually recursive are forbidden (I suspect that the coercion makes Rec_check classify the definition as Dynamic instead of Static, but I haven't tried to print the intermediate typedtree to see why). So you don't have access to more features than just unpacking after the definition:
(* This is allowed, including at toplevel: *)
let rec m =
let module M = (val m : S) in
(module struct
let x = 0
let y () =
M.x
end: S)
module M : S' = (val m : S)I'll ping @gasche, since he may have some more insights to share.
There was a problem hiding this comment.
It is indeed the case that we currently only allow variables as the pattern of a let rec. I think that the fact that module patterns were accepted before is an unintended side-effect of their desugaring implementation. I see no problem with rejecting them for this reason, in fact this is arguably the more regular thing to do.
(Note that if we had a Tpat_unpack construct as a proper pattern-former, then they would be rejected naturally by the Illegal_letrec_pat check.)
|
The approach looks correct to me. While the previous approach had advantages from the point of view of soundness, it created difficulties in the match compiler, so I agree with need for something more integrated. Also the code mimics enough the previous approach that we can be confident of its correctness. |
@goldfirere: I realize that my comment may be interpreted as a vote-of-no-confidence on your own offer to review the PR. It certainly wasn't intending to suggest that. There are parts of the codebase that are clearly the turf of someone in particular, and usually I try as part of the review process to get their opinion unless it is a really minor change. For the backend or runtime system, the locals like to give their opinion on the high-level design in particular. For the type-system, I like to ask @garrigue's opinion about the type-system change, but (according to my understanding) Jacques generally prefers other people to review for code quality, clarity, etc., so he is generally not the "main" reviewer. So quite often we have someone to say "I reviewed the code and I think the quality is good -- but the details are always a bit tricky so I'm not 100% confident that the type-checking changes are correct", and then Jacques to say something about the type-checking changes themselves (which doesn't give us 100% confidence, but at least we followed due process). |
|
@garrigue I don't know why module patterns are not allowed for toplevel lets. Is the explanation for this restriction written down somewhere, can you point to it? |
|
I'm not sure the fact module patterns are not allowed for top-level lets is documented. It would mean that a value binding could define new modules, which requires modifying a number of places. This is not impossible to do, but whether this is desirable is open to debate. |
|
It is documented, but without a justification. ocaml/manual/src/refman/extensions/firstclassmodules.etex Lines 50 to 52 in 8746939 Your point, if I understand correctly, is that toplevel |
|
Thanks @garrigue for your review. It's my impression that I've responded to your comments, and that the remaining points of discussion (module patterns in (I know you clicked "Approve", but I've pushed some changes according to your comments since then, so I wanted to check.) |
|
I checked your last commit a0450f9 and it looks fine. Concerning module unpacks in recursive definitions, @gasche 's point sounds valid. It might be worth thinking about allowing module unpacks in toplevel lets too, but this is another story. |
|
This has been previously reviewed by @goldfirere, and now approved by @garrigue as well. I believe that this is good to merge. I am planning to "squash" given that there is no clear candidate for separate commits here -- the change in checking abstract type escape stands on its own, but squashing it with the rest actually reduces and simplifies the diff. |
Remove the elaboration of module patterns, where typechecking would insert let-module expressions for each unpack. For example, this source program:
would be elaborated to this typedtree AST:
That is, the only way that modules could be unpacked in typedtree was via
Texp_letmodule. Now, we perform no such elaboration, and instead treat theTpat_varpattern as an unpack when it haspat_extra = Tpat_unpack.It would be best to review this PR commit-by-commit. The main commit that removes the elaboration is 78d59d6, but this makes available a cleanup in a later commit.
The ideas are the same as a Jane Street version of this patch (ocaml-flambda/ocaml-jst#146 and ocaml-flambda/ocaml-jst#150), if you would like to refer to the discussion there when reviewing this.
Rationale
The old elaboration interrupted the arity of functions, as seen in the example above. Removing this elaboration allows us to simplify the code that tried to recover the arity later. (See 64907dd.) This is in service of a PR I'm working on to make function arity syntactic; see ocaml/RFCs#32. But, even without the larger PR, I think this patch is a simplification.
Though the old elaboration was mainly problematic for functions and not so much for
match/let, I remove the elaboration everywhere for consistency.Approach
To type-check in the new scheme, we:
The mechanism we use for this is similar to
gadt_equations_level, in that we wrap the type-checking of patterns and their extent in its own level, and store the elevated scope in a global ref that pattern type-checking refers to. (In particular, we repurpose theallow_modulesbool to something likeint option, where the int payload communicates the scope to enter the module identifier at.)It's important to think about not only how this PR avoids scope escape, but also how future constructs involving patterns will avoid scope escape. Previously, the way this was accomplished was via the
type_unpacksfunction, which type-checked a body expression within the scope of a list of unpacked modules, and confirmed the modules' types didn't escape via the body's type. In this world, it was easy to see that scope escape was checked anywhere a module was used. Now, the responsibility is more diffuse. But we can still state the matter simply. The responsibility to check scope escape falls to whoever creates the scope such that pattern type-checking setsallow_modules = Modules_allowed { scope }. You can see this responsibility in action in both thetype_casesandtype_letfunctions. This is summarized in a comment by themodule_patterns_restrictiontype.