Conversation
|
In thinking about this PR, I'm unsure how best to help with review. I'm quite happy to review this -- but I imagine someone with more experience within the type checker should also review... but maybe can do a higher-level pass? In the end, we know review is a bottleneck and wish to lower the burden of dealing with large patches like these. So: how can I best help here? (Full disclosure: I helped choose this as a task for @ncik-roberts to work on, and I offered feedback on aspects of the implementation. I have not done any detailed code review here, though, and I have not contributed any of the code.) |
|
I don't follow the motivation for the backward-compatibility-breaking example: # type (_, _) eq = Eq : ('a, 'a) eq
# let bad : type a. ?opt:(a, int -> int) eq -> unit -> a =
fun ?opt:(Eq = assert false) () x -> x + 1
# let x : 'a. 'a = bad ()Here the syntactic arity of |
|
"Syntactic arity" as I describe in this PR is a property of the expression language, not the type language. So the syntactic arity of The benefit of basing arity on the expression language is that effect ordering and runtime arity is completely predictable. You're right that the current version of OCaml avoids unsoundness by computing an arity of 2 here, but that arity is not syntactically apparent. The number of arrows in the type can't be used to figure out the arity in general (this statement is true for both trunk and for this PR): (* this has arity 2 in both trunk and this PR *)
let arity2 : type a. a -> a lazy_t -> bool -> a =
fun x (lazy y) ->
(fun b -> if b then x else y)That's a clarification of what syntactic arity means but doesn't respond to your point. I'll mention that it's a small change to rewrite # let good : type a. ?opt:(a, int -> int) eq -> unit -> a =
fun ?opt:(Eq = assert false) () -> (fun x -> x + 1)
val good : ?opt:('a, int -> int) eq -> unit -> 'a = <fun>Requiring this rewrite in these cases is certainly a trade-off against the benefits of arity. But "these cases" are limited to ones where the user is hiding an arrow in the type of the function being defined using a GADT pattern bound by that same function's parameters. Your judgement is helpful in gauging the impact of this. My feeling is that this would be very rare. |
|
@Octachron asked me for my opinion on this specific matter (the unification with a function type to guard against unsoundness). Here is my understanding/opinion right now. The unification can be understood as the fact that we are in fact type-checking an eta-expanded version of the program, that first takes several arguments (according on the syntactic arity of the function) and then applies the patterns to them. (This expansion is explained in the RFC document.) (* before *)
let f ?opt:(p1 = def1) p2 p3 = body
(* after expansion *)
let f ?opt:arg1 arg2 arg3 =
let p1 = match arg1 with
| None -> assert false
| Some x -> x
in
let p2 = arg2 in
let p3 = arg3 in
bodyAnother way to prevent unsoundness would be to perform the expansion and type-check the result normally. Is it equivalent to the approach that was implemented, with a unification after the fact? (This is not obvious to me in presence of Pexp_newtype abstractions etc.) Otherwise, would it maybe be possible to formulate the check in a way that would more clearly relate to the obviously-sound expansion? Then there is the question of what we think of the expansion and the restriction it imposes on the typability of OCaml programs -- it rejects more programs. My current thoughts:
The An example of (2) would be: type (_, _) eqtest = Equal : ('a, 'a) eqtest | Nope : ('a, 'b) eqtest
let ex2 : type a . (a, unit -> unit) eqtest -> a = fun Equal () -> ()An example of (3) was also given: type (_, _) eq = Equal : ('a, 'a) eq
let ex3 : type a . (a, unit -> unit) eq -> a = fun Equal () -> ()Personally I certainly agree that rejecting examples in category (1) is fair game, and I would agree that rejecting (2) is also reasonable, but I do find rejecting (3) more frustrating (I suspect this is the category that @Octachron finds problematic). I can see arguments going both ways:
A weakness of the "accepting (3)" position is that the "clear criterion" very much depends on typing, we cannot tell if a pattern-matching is exhaustive or not just from the definition. The RFC as proposed is purely syntactic, which has the force of simplicity. |
I would say that the implementation is extremely close to just doing the expansion, but to be exhaustive, I would add two caveats:
The caveats seem minor to me; I discuss them below. My response here doesn't help decide when we should do this expansion, but it does support a notion that the restriction on typability is a question about the expansion as opposed to a question about the current implementation.
|
|
I think this design point is a good one to debate. While I tend to lean toward simplicity as an important factor, it is annoying that simplicity and backward compatibility pull in opposite directions, and others are better suited than I in judging the weight of this backward incompatibility. It strikes me, though -- and in conversation with @ncik-roberts -- that review on the correctness of this patch can be done even without a decision on this particular design point. Nick esitmates that changing the behavior in @gasche's case (3) would just add complexity, meaning that time spent reviewing the patch as-is would all be well spent. I am happy to be this reviewer (as I offered above), but I wanted to check that my review would be seen as sufficiently independent here before putting the time in. What do you think? |
|
I think that applying the expansion (or an equivalent type-checking transformation) unconditionally, at the cost of breaking typability in case (3) (as currently proposed in this PR) is the better choice here. Indeed, I find the degree of sophistication involved in accepting (3) unreasonable; my understanding is that we would have to type-check each pattern in
I find this too complex, and it has a backtracking flavor that is unpleasant. (Also, checking whether a pattern is exhaustive is complex and we occasionally get it wrong. Having this complex check determine the desugaring / operational behavior of the code does not feel great.) Unfortunately, I also have the impression that detecting this situation -- a function constructor is ill-typed because the current type is not known to be a function type but would be if we had not delayed the introduction of some GADT equations -- is also difficult, so I am pessimistic about the fact that we could write a targeted error message in this situation. |
goldfirere
left a comment
There was a problem hiding this comment.
This is a partial review -- I stopped in type_function. More next week.
1cc4484 to
7258a15
Compare
@gasche The current approach proceeds as follows for an n-ary function:
So, if I am understanding you correctly, it seems to me that we can detect the situation you describe and write a targeted error message in the case that the unification in Step 2 fails. |
2f17f8e to
295214a
Compare
goldfirere
left a comment
There was a problem hiding this comment.
A little bit more review complete. More in an in-person meeting tomorrow.
fc5e3af to
bb3ddf7
Compare
goldfirere
left a comment
There was a problem hiding this comment.
OK. I have reviewed everything here save the files in lambda and in ocamldoc, saving those for someone more expert in those areas of the compiler.
828ee90 to
8288593
Compare
|
@lpw25 volunteered to read the |
lpw25
left a comment
There was a problem hiding this comment.
Approving the parts in lambda/, which look good to me.
goldfirere
left a comment
There was a problem hiding this comment.
This looks like it's in good shape, modulo the little concern about Merlin (which I believe @ncik-roberts is looking into). All the files have been reviewed for correctness. What's left for getting this merged?
|
I agree that it is probably time to merge this PR. Two questions:
|
|
For record's sake, my final opinion on the backward compatibility issue is that the change will be fine with a specialized and understandable error message for this exotic case. Moreover I agree with @ncik-roberts that this is can be done in a straightforward way on top of this PR. For the remaining review, I don't know if @gasche has reviewed the |
|
I've just added the Changes entry. I will now look at these things:
|
parsing/parsetree.mli
Outdated
| [C] represents a type constraint or coercion placed immediately | ||
| before the arrow, e.g. [fun P1 ... Pn : t1 :> t2 -> ...] |
There was a problem hiding this comment.
OCaml doesn't accept this syntax at present. Rather inconsistently, while we currently allow simple constraints on both let and fun forms:
let f x : t = e (* valid *)
let f = fun x : t -> e (* valid *)we allow the let form with a simple or full subtyping constraint
let f x :> s = e (* valid *)
let f x : t :> s = e (* valid *)but don't allow the corresponding constrained fun forms:
let f = fun x :> s -> e (* invalid *)
let f = fun x : t :> s -> e (* invalid *)It would be good to eventually allow these last two forms, too, but for the moment the comment needs to be updated.
There was a problem hiding this comment.
Thanks for catching this. I've updated the example to use a simple constraint with fun.
Tests from: - ocaml/ocaml#12236 (and the corresponding updates to outputs found in ocaml/ocaml#12386 and ocaml/ocaml#12391) - ocaml/ocaml#12496 (not merged)
…en-typing-constraints Restore a call to `instance` that was dropped in #12236
Tests from: - ocaml/ocaml#12236 (and the corresponding updates to outputs found in ocaml/ocaml#12386 and ocaml/ocaml#12391) - ocaml/ocaml#12496 (not merged)
Tests from: - ocaml/ocaml#12236 (and the corresponding updates to outputs found in ocaml/ocaml#12386 and ocaml/ocaml#12391) - ocaml/ocaml#12496 (not merged)
Tests from: - ocaml/ocaml#12236 (and the corresponding updates to outputs found in ocaml/ocaml#12386 and ocaml/ocaml#12391) - ocaml/ocaml#12496 (not merged)
Tests from: - ocaml/ocaml#12236 (and the corresponding updates to outputs found in ocaml/ocaml#12386 and ocaml/ocaml#12391) - ocaml/ocaml#12496 (not merged)
Tests from: - ocaml/ocaml#12236 (and the corresponding updates to outputs found in ocaml/ocaml#12386 and ocaml/ocaml#12391) - ocaml/ocaml#12496 (not merged)
|
I don't think it was mentioned in the comments, why is the |
|
Yes, this is a fun (type a) -> (module struct
type t = a
end : S with type t = a)There are many such examples; the RHS of the I suppose it could be reasonable to parse these as |
* Newtypes * Constraint/coercion * Add map_half_typed_cases * Implement type-checking/translation This also promotes tests whose output changes. * Add upstream tests Tests from: - ocaml/ocaml#12236 (and the corresponding updates to outputs found in ocaml/ocaml#12386 and ocaml/ocaml#12391) - ocaml/ocaml#12496 (not merged) * Fix ocamldoc * Update chamelon minimizer * Respond to requested changes to minimizer * update new test brought in from rebase * Fix bug in chunking code * `make bootstrap` * Add Ast_invariant check * Fix type-directed disambiguation of optional arg defaults * Minor comments from review * Run syntactic-arity test, update output, and fix printing bug * Remove unnecessary call to escape * Backport changes from upstream to comparative alloc tests * Avoid the confusing [Split_function_ty] module * Comment [split_function_ty] better. * [contains_gadt] as variant instead of bool * Calculate is_final_val_param on the fly rather than precomputing indexes * Note suboptimality * Get typecore typechecking * Finish resolving merge conflicts and run tests * make bootstrap * Add iteration on / mapping over locations and attributes * Reduce diff and fix typo in comment: * promote change to zero-alloc arg structure * Undo unintentional formatting changes to chamelon * Fix minimizer * Minimize diff * Fix bug with local-returning method * Fix regression where polymorphic parameters weren't allowed to be used in same parameter list as GADTs * Fix merge conflicts and make bootstrap * Apply expected diff to zero-alloc test changed in this PR
…n (#1817) * Newtypes * Constraint/coercion * Add map_half_typed_cases * Implement type-checking/translation This also promotes tests whose output changes. * Add upstream tests Tests from: - ocaml/ocaml#12236 (and the corresponding updates to outputs found in ocaml/ocaml#12386 and ocaml/ocaml#12391) - ocaml/ocaml#12496 (not merged) * Fix ocamldoc * Update chamelon minimizer * Respond to requested changes to minimizer * update new test brought in from rebase * Fix bug in chunking code * `make bootstrap` * Add Ast_invariant check * Fix type-directed disambiguation of optional arg defaults * Minor comments from review * Run syntactic-arity test, update output, and fix printing bug * Remove unnecessary call to escape * Backport changes from upstream to comparative alloc tests * Avoid the confusing [Split_function_ty] module * Comment [split_function_ty] better. * [contains_gadt] as variant instead of bool * Calculate is_final_val_param on the fly rather than precomputing indexes * Note suboptimality * Get typecore typechecking * Finish resolving merge conflicts and run tests * make bootstrap * Add iteration on / mapping over locations and attributes * Reduce diff and fix typo in comment: * promote change to zero-alloc arg structure * Undo unintentional formatting changes to chamelon * Fix minimizer * Minimize diff * Fix bug with local-returning method * Fix regression where polymorphic parameters weren't allowed to be used in same parameter list as GADTs * Fix merge conflicts and make bootstrap * Apply expected diff to zero-alloc test changed in this PR
CHANGES: - Make the `unprocessed` alert fatal by default ([melange-re/melange#1135](melange-re/melange#1135)) - support `-H` for hidden include dirs in OCaml 5.2 ([melange-re/melange#1137](melange-re/melange#1137)) - support `[@mel.*]` attributes in uncurried externals ([melange-re/melange#1140](melange-re/melange#1140)) - add Worker types to `melange.dom` ([melange-re/melange#1147](melange-re/melange#1147)) - Add support for dynamic `import()` ([melange-re/melange#1164](melange-re/melange#1164)) - Fix code generation of custom `true` / `false` constructors ([melange-re/melange#1175](melange-re/melange#1175)) - Fix code generation of OCaml objects that refers to an init variable in scope ([melange-re/melange#1183](melange-re/melange#1183)) - Support `[@mel.as "string"]` in variant definitions ([melange-re/melange#884](melange-re/melange#884)) - BREAKING: remove `[@@deriving jsConverter]` for variant definitions ([melange-re/melange#884](melange-re/melange#884)). Use `[@mel.as "string here"]` instead. - Support OCaml 5.3 ([melange-re/melange#1168](melange-re/melange#1168)) - Upgrade Stdlib to the OCaml 5.3 Stdlib ([melange-re/melange#1182](melange-re/melange#1182)) - Support `[@mel.tag "the_tag"]` in variant definitions ([melange-re/melange#1189](melange-re/melange#1189)) - combined with `[@mel.as ..]` in variant definitions and inline record payloads, Melange can now express types for discriminated union object shapes. - melange-ppx: don't silence warning 20 (`ignored-extra-argument`) for `%mel.raw` application ([melange-re/melange#1166](melange-re/melange#1166)). - This change reverts the behavior introduced in ([melange-re/melange#915](melange-re/melange#915)). - The new recommendation is to annotate `%mel.raw` functions or disable warning 20 at the project level. - ppx,core: propagate internal FFI information via attributes instead of adding marshalled data in the native primitive name ([melange-re/melange#1222](melange-re/melange#1222)) - melange-ppx: allow `@mel.unwrap` polyvariants not to have a payload ([melange-re/melange#1239](melange-re/melange#1239)) - `melange.node`: fix `Buffer.fromString` and add `Buffer.fromStringWithEncoding` ([melange-re/melange#1246](melange-re/melange#1246)) - `melange.node`: bind to all supported Node.js `Buffer` encodings in `Node.Buffer` ([melange-re/melange#1246](melange-re/melange#1246)) - `melange.js`: Add `Js.FormData` with bindings to the [FormData](https://developer.mozilla.org/en-US/docs/Web/API/FormData) API ([melange-re/melange#1153](melange-re/melange#1153), [melange-re/melange#1270](melange-re/melange#1270), [melange-re/melange#1281](melange-re/melange#1281) - `melange.js`: Add `Js.Blob` and `Js.File` with bindings to the [Blob](https://developer.mozilla.org/en-US/docs/Web/API/Blob) and [File](https://developer.mozilla.org/en-US/docs/Web/API/File) APIs ([melange-re/melange#1218](melange-re/melange#1218)) - `melange.js`: add `TypedArray` types at the toplevel in the `Js` module ([melange-re/melange#1248](melange-re/melange#1248)) - BREAKING: remove `--mel-g` ([melange-re/melange#1234](melange-re/melange#1234)) - runtime(`melange.js`): port `[@mel.send.pipe]` functions to `[@mel.send]`, taking advantage of the `@mel.send` + labeled argument improvement (see above) ([melange-re/melange#1260](melange-re/melange#1260), [melange-re/melange#1264](melange-re/melange#1264), [melange-re/melange#1265](melange-re/melange#1265), [melange-re/melange#1266](melange-re/melange#1266), [melange-re/melange#1280](melange-re/melange#1280), [melange-re/melange#1278](melange-re/melange#1278)) - core: fix a crash related to finding constructor names in pattern matching triggered by dune's earlier implementation of `(implicit_transitive_deps false)` ([melange-re/melange#1238](melange-re/melange#1238), [melange-re/melange#1262](melange-re/melange#1262)) - core: pre-compute the closure param map for functions inlined with `--mel-cross-module-opt` ([melange-re/melange#1219](melange-re/melange#1219)) - BREAKING: ppx: print the `deprecated` alert for `@@deriving abstract` at the declaration site rather than at (all) usages ([melange-re/melange#1269](melange-re/melange#1269)) - JS generation: prettify `for` loops ([melange-re/melange#1275](melange-re/melange#1275)) - JS generation: improve formatting for `throw` and `return` statements, JS objects ([melange-re/melange#1286](melange-re/melange#1286), [melange-re/melange#1289](melange-re/melange#1289)) - JS generation: improve formatting for empty return and continue statements ([melange-re/melange#1288](melange-re/melange#1288)) - JS generation: remove trailing spaces before commas in `export` ([melange-re/melange#1287](melange-re/melange#1287)) - JS generation: remove redundant switch cases branches ([melange-re/melange#1295](melange-re/melange#1295)) - JS generation: move space before comma inside `for` definition ([melange-re/melange#1296](melange-re/melange#1296)) - JS generation: add space before while loop condition ([melange-re/melange#1297](melange-re/melange#1297)) - JS generation: improve indentation of parenthesized blocks ([melange-re/melange#1293](melange-re/melange#1293)) - JS generation: add space after constructor comments ([melange-re/melange#1294](melange-re/melange#1294)) - JS generation: improve identation of `switch` cases ([melange-re/melange#1299](melange-re/melange#1299)) - JS generation: don't generate empty `default:` cases in `switch` ([melange-re/melange#1300](melange-re/melange#1300)) - JS generation: emit `module.exports` in CommonJS instead of `exports.x` ([melange-re/melange#1314](melange-re/melange#1314)) - JS generation: remove trailing newline after `switch` ([melange-re/melange#1313](melange-re/melange#1313)) - ffi: allow annotating `@mel.send` FFI with `@mel.this` to specify which parameter should represent the "self" argument. ([melange-re/melange#1303](melange-re/melange#1285), [melange-re/melange#1310](melange-re/melange#1310)) - This improvement to the FFI allows expressing more FFI constructs via labeled and optionally labeled arguments, e.g. `external foo: value:string -> (t [@mel.this]) -> unit = "foo" [@@mel.send]` will now produce `t.foo(value)` instead of `value.foo(t)`. - It also allows removing usages of `[@mel.send.pipe: t]` in favor of `[@mel.send]` with `[@mel.this]`, including when used with `@mel.variadic`. - ppx: deprecate `[@mel.send.pipe]` ([melange-re/melange#1321](melange-re/melange#1321)) - core: fix missed optimization on OCaml versions 5.2 and above, caused by [ocaml/ocaml#12236](ocaml/ocaml#12236) generating multiple function nodes for `fun a -> fun b -> ...` in the Lambda IR.
CHANGES: - Make the `unprocessed` alert fatal by default ([melange-re/melange#1135](melange-re/melange#1135)) - support `-H` for hidden include dirs in OCaml 5.2 ([melange-re/melange#1137](melange-re/melange#1137)) - support `[@mel.*]` attributes in uncurried externals ([melange-re/melange#1140](melange-re/melange#1140)) - add Worker types to `melange.dom` ([melange-re/melange#1147](melange-re/melange#1147)) - Add support for dynamic `import()` ([melange-re/melange#1164](melange-re/melange#1164)) - Fix code generation of custom `true` / `false` constructors ([melange-re/melange#1175](melange-re/melange#1175)) - Fix code generation of OCaml objects that refers to an init variable in scope ([melange-re/melange#1183](melange-re/melange#1183)) - Support `[@mel.as "string"]` in variant definitions ([melange-re/melange#884](melange-re/melange#884)) - BREAKING: remove `[@@deriving jsConverter]` for variant definitions ([melange-re/melange#884](melange-re/melange#884)). Use `[@mel.as "string here"]` instead. - Support OCaml 5.3 ([melange-re/melange#1168](melange-re/melange#1168)) - Upgrade Stdlib to the OCaml 5.3 Stdlib ([melange-re/melange#1182](melange-re/melange#1182)) - Support `[@mel.tag "the_tag"]` in variant definitions ([melange-re/melange#1189](melange-re/melange#1189)) - combined with `[@mel.as ..]` in variant definitions and inline record payloads, Melange can now express types for discriminated union object shapes. - melange-ppx: don't silence warning 20 (`ignored-extra-argument`) for `%mel.raw` application ([melange-re/melange#1166](melange-re/melange#1166)). - This change reverts the behavior introduced in ([melange-re/melange#915](melange-re/melange#915)). - The new recommendation is to annotate `%mel.raw` functions or disable warning 20 at the project level. - ppx,core: propagate internal FFI information via attributes instead of adding marshalled data in the native primitive name ([melange-re/melange#1222](melange-re/melange#1222)) - melange-ppx: allow `@mel.unwrap` polyvariants not to have a payload ([melange-re/melange#1239](melange-re/melange#1239)) - `melange.node`: fix `Buffer.fromString` and add `Buffer.fromStringWithEncoding` ([melange-re/melange#1246](melange-re/melange#1246)) - `melange.node`: bind to all supported Node.js `Buffer` encodings in `Node.Buffer` ([melange-re/melange#1246](melange-re/melange#1246)) - `melange.js`: Add `Js.FormData` with bindings to the [FormData](https://developer.mozilla.org/en-US/docs/Web/API/FormData) API ([melange-re/melange#1153](melange-re/melange#1153), [melange-re/melange#1270](melange-re/melange#1270), [melange-re/melange#1281](melange-re/melange#1281) - `melange.js`: Add `Js.Blob` and `Js.File` with bindings to the [Blob](https://developer.mozilla.org/en-US/docs/Web/API/Blob) and [File](https://developer.mozilla.org/en-US/docs/Web/API/File) APIs ([melange-re/melange#1218](melange-re/melange#1218)) - `melange.js`: add `TypedArray` types at the toplevel in the `Js` module ([melange-re/melange#1248](melange-re/melange#1248)) - BREAKING: remove `--mel-g` ([melange-re/melange#1234](melange-re/melange#1234)) - runtime(`melange.js`): port `[@mel.send.pipe]` functions to `[@mel.send]`, taking advantage of the `@mel.send` + labeled argument improvement (see above) ([melange-re/melange#1260](melange-re/melange#1260), [melange-re/melange#1264](melange-re/melange#1264), [melange-re/melange#1265](melange-re/melange#1265), [melange-re/melange#1266](melange-re/melange#1266), [melange-re/melange#1280](melange-re/melange#1280), [melange-re/melange#1278](melange-re/melange#1278)) - core: fix a crash related to finding constructor names in pattern matching triggered by dune's earlier implementation of `(implicit_transitive_deps false)` ([melange-re/melange#1238](melange-re/melange#1238), [melange-re/melange#1262](melange-re/melange#1262)) - core: pre-compute the closure param map for functions inlined with `--mel-cross-module-opt` ([melange-re/melange#1219](melange-re/melange#1219)) - BREAKING: ppx: print the `deprecated` alert for `@@deriving abstract` at the declaration site rather than at (all) usages ([melange-re/melange#1269](melange-re/melange#1269)) - JS generation: prettify `for` loops ([melange-re/melange#1275](melange-re/melange#1275)) - JS generation: improve formatting for `throw` and `return` statements, JS objects ([melange-re/melange#1286](melange-re/melange#1286), [melange-re/melange#1289](melange-re/melange#1289)) - JS generation: improve formatting for empty return and continue statements ([melange-re/melange#1288](melange-re/melange#1288)) - JS generation: remove trailing spaces before commas in `export` ([melange-re/melange#1287](melange-re/melange#1287)) - JS generation: remove redundant switch cases branches ([melange-re/melange#1295](melange-re/melange#1295)) - JS generation: move space before comma inside `for` definition ([melange-re/melange#1296](melange-re/melange#1296)) - JS generation: add space before while loop condition ([melange-re/melange#1297](melange-re/melange#1297)) - JS generation: improve indentation of parenthesized blocks ([melange-re/melange#1293](melange-re/melange#1293)) - JS generation: add space after constructor comments ([melange-re/melange#1294](melange-re/melange#1294)) - JS generation: improve identation of `switch` cases ([melange-re/melange#1299](melange-re/melange#1299)) - JS generation: don't generate empty `default:` cases in `switch` ([melange-re/melange#1300](melange-re/melange#1300)) - JS generation: emit `module.exports` in CommonJS instead of `exports.x` ([melange-re/melange#1314](melange-re/melange#1314)) - JS generation: remove trailing newline after `switch` ([melange-re/melange#1313](melange-re/melange#1313)) - ffi: allow annotating `@mel.send` FFI with `@mel.this` to specify which parameter should represent the "self" argument. ([melange-re/melange#1303](melange-re/melange#1285), [melange-re/melange#1310](melange-re/melange#1310)) - This improvement to the FFI allows expressing more FFI constructs via labeled and optionally labeled arguments, e.g. `external foo: value:string -> (t [@mel.this]) -> unit = "foo" [@@mel.send]` will now produce `t.foo(value)` instead of `value.foo(t)`. - It also allows removing usages of `[@mel.send.pipe: t]` in favor of `[@mel.send]` with `[@mel.this]`, including when used with `@mel.variadic`. - ppx: deprecate `[@mel.send.pipe]` ([melange-re/melange#1321](melange-re/melange#1321)) - core: fix missed optimization on OCaml versions 5.2 and above, caused by [ocaml/ocaml#12236](ocaml/ocaml#12236) generating multiple function nodes for `fun a -> fun b -> ...` in the Lambda IR.
i.e. when clients of the letrec library generate bindings for functions of the form fun x -> fun y -> e the letrec library now converts them into functions of the form: fun x y -> e The following PR describes the change to OCaml that motivated this change to letrec: Syntactic function arity ocaml/ocaml#12236
This PR implements syntactic function arity as described in this RFC.
This is a big PR! To help everyone understand what's going on, I've written this (longish) introductory post. It starts by summarizing the related RFC, offers some changes to design that came up during implementation, includes details to help a reviewer understand the changes, and ends with a few warts in the current implementation that seem unnecessary to solve now. Readers interested in language design but not implementation details can read just through the design details. The rest of this post is relevant only to someone providing a detailed review and may make the most sense read in parallel with the review.
The motivation is twofold:
The RFC has more detail, but I’ll summarize briefly. A function has a notion of “arity”, which is how many arguments it will accept before running effects for optional argument defaults and pattern matching. (Pattern matching can have effects, e.g. lazy patterns.) Currently the parser treats all functions as unary and the arity is only determined later based on the typedtree. Effectful patterns interrupt arity, so
let f (lazy x) y = x + yis treated as a unary function returning a unary function. However, this changes with this PR.After this PR, arity is a syntactic notion, so
let f (lazy x) y = x + yis a 2-ary function.f (lazy (assert false))won’t raise, butf (lazy (assert false)) 3will.There is a related, but subtly different, notion of “runtime arity” of a function. If a function application is fully saturated (i.e. the function has runtime arity n and is applied to n arguments), there is an optimization to make application fast in native code. This PR makes syntactic arity almost always the same notion as runtime arity. The one exception is when the syntactic arity is greater than the max allowed runtime arity (127), in which case the application optimization stops being useful.
Design Decisions and Clarifications to RFC
There are a few details implicit in or missing from the RFC that are worth spelling out here.
Optional argument defaults
The RFC describes how pattern matching works with syntactic arity, but doesn’t give a precise spec for optional argument defaults. The intuition is the same. Pattern matches and optional argument defaults are evaluated from left-to-right, with the relative order preserved from the parameter order.
The more general characterization is as follows:
Any active function parameter in the list of parameters between
funand->(or betweenlet function_nameand=) is evaluated in left-to-right order before entering the body of a function (or before performing the match that is part of a function body). An active function parameter is any of the following:[| 1.0, x |]. Matching against such a pattern can involve allocation, because x must refer to a boxed float but the array is flat. Allocation can run the GC. Running the GC can run finalizers, which runs arbitrary code.Coercions and constraints
The parsetree spec given in the RFC mentions constraints, but not coercions. This should be a 2-ary function:
stedolan tells me this was an unintentional omission. Coercions should be treated just like normal type constraints, not interrupting the arity of the declared function.
Class functions
Class functions still aren’t n-ary:
I wanted to get input on this design before doing the same thing for class functions.
Avoiding unsoundness
This definition would be unsound:
In the current world, unsoundness is avoided by not pushing the evaluation of
assert falsepast any GADT patterns. (Sobad ()would raise.) But this “pushing the evaluation” is the whole point of syntactic arity. So the approach I take is to stipulate that an n-ary function definition must unify with an n-ary arrow type:This approach has two warts. One, it rules out some previously-accepted (if weird) definitions:
Two, it interacts with locally abstract types in a non-obvious way. The unification happens after unification variables have been substituted for any locally abstract type. So a definition might typecheck, but with
'a -> 'bsubstituted for what was written as a locally-abstract type:(This definition is ok because it will assert false if
optis omitted and it is applied up to the'aargument.)This type-checking behavior is consistent with how locally-abstract types behave in other situations, like recursion and the value restriction, and so is perhaps acceptable.
If this is a sticking point, I’m happy to discuss alternatives.
Question
This is a substantial change to a widely-used AST construct. How does the OCaml community deal with such changes that warrant (e.g.) updating all PPXes?
A guide to review
It’s likely best to review directory-by-directory. Here’s a quick summary of the per-directory changes:
parsing: Add the new function construct, replacing the two old constructs. The new construct includes multiple function parameters,newtypeparameters, coercions/constraints on the function body, and the function body itself (whether that’s function cases or a standard expression body).typing: Though the diff is large, this is semantically a refactor of existing functionality. The new function construct involves several language features (newtypes, coercions, constraints, function cases), and so this patch factors out pieces of type-checking code for those language features to be reused in the typing of the new construct.lambda: Some tricky code is deleted. I get rid ofpush_defaultsand the currying transformation, instead preserving the syntactic arity from type-checking. I add the translation of optional default arguments – this used to be done in type-checking, but this interacts poorly with syntactic arity.utils: I add a helper function,List.chunks_of, that I found useful in enforcing the max runtime arity of a function.tools: Changes to ocamlprof implied by the change in parsetree.ocamldoc: Changes to ocamldoc implied by the change in typedtree.There is more detail below.
The parsetree/typedtree changes imply some changes outside of the parsing/typing directories. The first few commits split up the parsing and typechecking changes, if you’re interested in narrowing in on the changes implied by one or the other.
Parsing
This is mostly a straightforward implementation of the RFC, with a few things to note:
fun (type a b c) -> eis still interpreted as nestedPexp_newtype’s, and not as a zero-argumentPexp_function.arg_label * expression option * patternto keep the diff smaller. This is instead of adopting theArg_simple of pattern | Arg_labelled of string * pattern | Arg_optional of string * pattern * expression optiondistinction described in the RFC. This could easily be changed later.Pconstraint/Pcoercevariant instead of representing coercions/constraints as a pair of optional types. I found this type to be advantageous in type-checking, as it allows us to keep the code that handles coercions/constraints separate from the code that handles the absence of a coercion/constraint. It’s parallel toPexp_constraint/Pexp_coerce, which is a nice bonus.fun x y -> function p1 -> e1 | p2 -> e2, the function should be parsed as aTfunction_casesand not as aTfunction_body $exprin order to get the arity right. The PR attempts to avoid shift-reduce conflicts here by creating a new class of expression,fun_expr, which can appear on the RHS of a lambda/function definition, and which excludes function constructs. To force the interpretation of the RHS asTfunction_body $expr, the programmer can wrap the function in parentheses.Typing
Typing code for three constructs is significantly refactored, in increasing order of invasiveness:
(type a) …fun/functionconstructs)A source of complexity common to all of these is the fact that typing the “argument” is no longer uniform. I address this by parameterizing the typing of these constructs over a function that takes care of typing the argument.
Let me make that clearer with an example. Take constraints. Previously the only constraint-like construct was
Pexp_constraint (e, t), which represents things like(e : t). Constructs likelet f x : t = function <cases>were desugared aslet f x = (function <cases> : t). This desugaring is no longer possible, as the function body is parsed asPfunction_cases, i.e. not an expression. So, I took the code that handledPexp_constraint, bubbled it up to a top-level function, and parameterized it over the argument to the constraint; that allows the argument to be an expression (as in(e : t)) or “function cases” (as inlet f x : t = function <cases>).It is relatively straightforward to see how this technique was applied with
Pexp_newtypeandPexp_constraint/Pexp_coerce. It is somewhat more difficult to see how this worked withtype_cases, so I have a dedicated section on that below.map_half_typed_casesandtype_casesThis section describes the refactoring I did in the typing of function cases. It’s ultimately a no-op refactor, but the approach is more complex than the other pieces of this PR.
There used to be a single function,
type_cases, which handled the typing for fun parameters, match cases, function cases, letop cases, and try cases. In this PR, I pull out a large amount of that code intomap_half_typed_cases– the typing of fun parameters now calls that instead oftype_cases. The “half” in “half typed cases” refers to the fact that the pattern, but not the body, is type-checked. (This terminology is used already intypecore.)Take an example like
fun x y -> function <cases>. You can think ofmap_half_typed_casesas the code in common to the typing ofx,y, and<cases>. It’s the common prelude and postlude that we always want to run when typing a parameter, including (say) the typing of the pattern. But it’s now parameterized over what it means to type the body of the parameter. Forxandy, that’s a recursive call totype_functionto check the rest of the parameters and the body; for<cases>, that’s checking the expression bodies of each of the cases.GitHub’s diff is actually very helpful here. It shows how
type_caseswas refactored intomap_half_typed_cases. The big deleted hunk of code was just moved into the newtype_cases.type_casesis now only called on things that are syntactically cases, so it does some extra things:Typing a function
This is now structured as a fold over the parameters.
type_functiondoes much of the same work as the oldtype_function, except now it also calls the parameterized versions of typing constraints, newtypes, and cases.A guide to review: better diffs
Refactoring newtypes/coercions involved moving a fair amount of code. This bash function helps you see that, with color:
type_newtypemy_diff '/Pexp_newtype/,/Pexp_pack/p' '/and type_newtype/,/and type_ident/p'type_coercemy_diff '/Pexp_coerce(sarg/,/Pexp_send/p' '/and type_coerce/,/and type_constraint/p'Warts in the implementation
Methods
There is a small ugly function in translation to give methods the right arity.
Take the method
mof this class as an example:This method is parsed roughly as:
Pexp_function ([ “self” ], Pexp_poly (Pexp_function ([ “x”, “y” ], … (* x + y *)))). But we’d like formto have an arity of 3 (not because this can change the semantics of the program, but so that we get a runtime arity of 3 and can benefit from the native code optimization). The approach I take is to recover the arity in translation to lambda, by fusing together function bodies when the inner one is aTexp_poly. This is possible because methods are the only places thatPexp_polyis introduced.It seems more principled to have an AST node that encodes this more directly. (Perhaps
Pexp_polyshould take an extra argument,self?) But, I didn’t do this as part of this PR to avoid yet another invasive change.Attributes
Some attributes continue to be dropped. E.g.:
Here, the inline attribute is dropped in translation as it’s attached to the
Pfunction_casespiece of an n-ary function, and not a proper expression. (This function can’t be inlined!) This doesn’t change anything. Previously this attribute was silently dropped in the currying translation.Attributes relevant to type-checking continue to be respected, e.g. this is OK: