New implementations of the static check for recursive definitions#1942
New implementations of the static check for recursive definitions#1942gasche merged 16 commits intoocaml:trunkfrom
Conversation
|
(cc @yallop) |
|
Do you think this might bring us closer to producing, as a side-effect of the check, a data-structure which would drive the compilation scheme and make the check "correct by construction"? |
|
I would say that this would only be marginally easier than with the previous implementation; the big conceptual jump was @yallop's idea of using a type-system-like check, this one is easier to reason about than the one it replaces, but in both cases the idea would be to look at the typed derivations as evidence. One thing that is rather easy to do in both cases is to get a sort of dependency order on the mutual declarations. This would allow accepting more things when a topological order is just not one big cycle. But I am not sure that it helps us make the current compilation scheme "more correct". @yallop, @Cemoixerestre and myself are thinking of writing a short paper / extended abstract on the new system, which could give me better intuitions on these things. Thinking about compilation was on the list of things to do "if there is time" for Alban's internship, but this is a six-weeks end-of-bachelor internship, and there was no time. |
typing/rec_check.ml
Outdated
| (* If x is used with the mode m in e[x], and e[x] is used with mode | ||
| m' in e'[e[x]], then x is used with mode m'[m] in e'[e[x]]. | ||
|
|
||
| Return is neutral for composition: m[Neutral] = m = Neutral[m]. |
There was a problem hiding this comment.
Good catch, I'll fix this.
typing/rec_check.ml
Outdated
| (inspect (expression env e2))) | ||
| (* | ||
| G1 |- e1: m[Dereference] | ||
| G2 |- e2: m[Dereference] (*TODO: why is this a dereference?*) |
There was a problem hiding this comment.
There are a couple TODO remaining, is it on purpose?
There was a problem hiding this comment.
Yes, I wanted @yallop to have a chance to comment on them, because he wrote the original implementation. I believe that using Guard here would be correct, but I wanted his opinion before making the check more liberal than the one we replace. (Other opinions are also welcome, of course.)
There was a problem hiding this comment.
It's a dereference because writing a value to a mutable field involves accessing the value. Here's a program that fails with a segmentation fault if Dereference is weakened to Return here:
type t = { mutable f: float }
let g = { f = 0.0 }
let rec x = (g.f <- y; ()) and y = 2.0(You could add the program to the test suite.)
0a955c7 to
ed39a58
Compare
| (* We have carefully placed `u` before `t` here, | ||
| so that the copy { t with .. }, if accepted, | ||
| is evaluated before 't' is initialized -- making | ||
| the assertion below fail, typically abording |
typing/rec_check.ml
Outdated
| match Env.unguarded ty idlist, Env.dependent ty idlist, | ||
| classify_expression expr with | ||
| | _ :: _, _, _ (* The expression inspects rec-bound variables *) | ||
| | _, _ :: _, Dynamic -> (* The expression depends on rec-bound variables |
There was a problem hiding this comment.
The first underscore can be turned into a [] which saves a few brain cycles to the reader.
Armael
left a comment
There was a problem hiding this comment.
This is a first superficial review, I mostly read the comments but not the actual implementation yet.
typing/rec_check.ml
Outdated
|
|
||
| (* Returns the most conservative mode of the two arguments. | ||
|
|
||
| Deref < Return < Guard < Delay < Unused |
typing/rec_check.ml
Outdated
| | Delay, _ -> Delay | ||
| | m', Return -> m' | ||
| | Return, m | ||
| | Guard, m -> m |
There was a problem hiding this comment.
This last case confused me for a moment, because I read it as Guard[m] = m, which is not the case.
Because of the earlier m', Return -> m' innocuous-looking case we have Guard[Return] = Guard, and Guard[m] = m otherwise.
I wonder if it would be worth adding an explicit (redundant) case for Guard[Return], to showcase more explicitly the role of the Guard mode.
typing/rec_check.ml
Outdated
| (** Combine the access information of two expressions *) | ||
| val unguarded : t -> Ident.t list -> Ident.t list | ||
| (** unguarded e l: the list of all identifiers in e that are unguarded of | ||
| dereferenced in the environment e. *) |
There was a problem hiding this comment.
identifiers in e -> identifiers in l
Also, ungarded of dereferenced -> dereferenced or returned ?
typing/rec_check.ml
Outdated
| (** Combine the access information of two expressions *) | ||
| val dependent : t -> Ident.t list -> Ident.t list | ||
| (** unguarded e l: the list of all identifiers in e that are used in e. *) | ||
|
|
There was a problem hiding this comment.
unguarded e l -> dependent e l, identifiers in e -> identifiers in l
typing/rec_check.ml
Outdated
| (** No variables are accessed in an expression; it might be a | ||
| constant or a global identifier *) | ||
| val join : t -> t -> t | ||
| val join_list : t list -> t |
There was a problem hiding this comment.
It would maybe nice to have a comment indicating that environments sharing bindings is supported, and that in that case the right thing happens (the most conservative mode is picked).
typing/rec_check.ml
Outdated
|
|
||
| Deref < Return < Guard < Delay < Unused | ||
| *) | ||
| let prec m m' = |
typing/rec_check.ml
Outdated
| compute the environment of a subterm from its mode, | ||
| so the corresponding function has type [... -> mode -> Env.t]. | ||
|
|
||
| We write ['a -> term_judg] in this case. |
There was a problem hiding this comment.
Shouldn't 'a be ... instead, to match the convention used in the line above?
typing/rec_check.ml
Outdated
| correspond to binding constructs that have both an | ||
| exterior environment (the environment of the whole term) | ||
| and an interior environment (the environment after the binding | ||
| construct has introduced new names in scope). |
There was a problem hiding this comment.
Which one of Gamma and Gamma' is the exterior (resp interior) environment?
typing/rec_check.ml
Outdated
|
|
||
| G |- e : m + m' | ||
| ------------------------------- | ||
| G |- (let x = e : m) -| x:m', G |
There was a problem hiding this comment.
There's an issue with this rule I think (name collision on G; and the final output of the rule is likely not just one of the two Gs in the scope).
Would the correct rule be something like as follows?
G |- e : m + m'
-----------------------------------
x:m', G, G' |- (let x = e : m) -| x:m', G'
There was a problem hiding this comment.
(also, I assume m + m' is prec m m'? whatever it is it should be made explicit)
typing/rec_check.ml
Outdated
| fun f inner_mode -> fun outer_mode -> f (compos outer_mode inner_mode) | ||
|
|
||
| (* A binding judgment [binder] expects a mode and an outer environment, | ||
| and returns an inner environment. [binder >> judg] computes |
There was a problem hiding this comment.
This is in contradiction with the comment above? (which says that bind_judg takes an interior environment and returns an exterior environment)
|
In the description of the PR, you give a list of 4 modes (Dereference, Unguarded, Guard, Delay), but this does not seem to match the modes before the PR (3 modes, Dereferenced, Guarded, Unguarded) or in the PR (5 modes, Dereference, Return, Guard, Delay, Unused). Could you clarify? |
Armael
left a comment
There was a problem hiding this comment.
Added a couple comments while reading the rules
typing/rec_check.ml
Outdated
|
|
||
| G |- e : m + m' | ||
| ------------------------------- | ||
| G |- (let x = e : m) -| x:m', G |
There was a problem hiding this comment.
(also, I assume m + m' is prec m m'? whatever it is it should be made explicit)
055fadc to
06fef59
Compare
typing/rec_check.ml
Outdated
|
|
||
| G |- e : m + m' | ||
| ----------------------------------- | ||
| G+G' |- (let x = e : m) -| x:m', G' |
There was a problem hiding this comment.
The closing parenthesis is off there.
yallop
left a comment
There was a problem hiding this comment.
I think this an improvement -- thanks! The setup feels a bit more complex, but the rules are simpler, which seems like a reasonable tradeoff.
typing/rec_check.ml
Outdated
| the evaluation of the whole program. This is the mode of | ||
| a variable in an expression in which it does not occur. *) | ||
|
|
||
| (* Returns the most conservative mode of the two arguments. |
There was a problem hiding this comment.
Minor: more conservative, not most.
I think "most conservative" could be clarified here: it means something like "the more demanding context".
typing/rec_check.ml
Outdated
| | Unused | ||
| (** [Unused] is for subexpressions that are not used at all during | ||
| the evaluation of the whole program. This is the mode of | ||
| a variable in an expression in which it does not occur. *) |
There was a problem hiding this comment.
Minor: I think Unused should be Discard or Ignore, since all the others are verbs. (Alternatively, change them all to adjectives: Dereferenced, Returned, etc.)
There was a problem hiding this comment.
I went for Ignore, to me Discard includes the idea of an (active) action.
typing/rec_check.ml
Outdated
|
|
||
| Return is neutral for composition: m[Return] = m = Return[m]. | ||
| *) | ||
| let compos m' m = match m', m with |
There was a problem hiding this comment.
Minor: compos? Why not compose?
Aside: 'not' patterns would be useful here.
Besides noting that Return is neutral for composition, you might also note that compos is associative (I checked that it is), and that Unused is an annihilator.
typing/rec_check.ml
Outdated
| returned in the environment e. *) | ||
|
|
||
| val dependent : t -> Ident.t list -> Ident.t list | ||
| (** unguarded e l: the list of all identifiers in e that are used in e. *) |
There was a problem hiding this comment.
Two things to fix in this comment: it should say "dependent", not "unguarded", and it should say "in l", not "in e"
typing/rec_check.ml
Outdated
|
|
||
| (* Expression judgment: | ||
| G |- e : m | ||
| where (m) is an output of the code and (G) is an output; |
There was a problem hiding this comment.
This should say "where (m) is an input of the code"
typing/rec_check.ml
Outdated
| (inspect (expression env e2))) | ||
| (* | ||
| G1 |- e1: m[Dereference] | ||
| G2 |- e2: m[Dereference] (*TODO: why is this a dereference?*) |
There was a problem hiding this comment.
It's a dereference because writing a value to a mutable field involves accessing the value. Here's a program that fails with a segmentation fault if Dereference is weakened to Return here:
type t = { mutable f: float }
let g = { f = 0.0 }
let rec x = (g.f <- y; ()) and y = 2.0(You could add the program to the test suite.)
typing/rec_check.ml
Outdated
| expression e << Dereference | ||
| | Texp_setinstvar (pth,_,_,e) -> | ||
| (* | ||
| G |- e: m[Dereference] (*TODO: why is this a dereference?*) |
There was a problem hiding this comment.
As with Texp_setfield, this really should be a Dereference, so we can drop the TODO.
Here's a program for the test suite that fails with a segmentation fault if Dereference is weakened to Return:
class c = object
val mutable f = 0.0
method m =
let rec x = (f <- y; ()) and y = 2.0 in f
end
let _ = print_float (new c)#m| begin match Typeopt.classify_lazy_argument e with | ||
| (* | ||
| G |- e: m[Delay] | ||
| ---------------- (modulo some subtle compiler optimizations) |
There was a problem hiding this comment.
I'm expecting this to break when somebody implements (e.g.) the lazy value optimization for unboxed constructors. (But perhaps having the common code in Typeopt will be enough to prevent that.)
There was a problem hiding this comment.
Yes, I'm expecting such future changes to return one of the known-to-be-shortcut variants (for unboxed constructors, typically the classifier of the underlying argument), or to add a new variants to be handled properly here.
typing/rec_check.ml
Outdated
| | Tpat_alias (pat, _, _) -> is_destructuring_pattern pat | ||
| | Tpat_var (_id, _) -> false | ||
| | Tpat_alias (pat, _, _) -> | ||
| is_destructuring_pattern pat |
There was a problem hiding this comment.
This looks like unnecessary churn.
06fef59 to
4fce5b5
Compare
|
@yallop: Thanks for the review! I did a pass over the patchset and took your comments into account. There is one thing that we discussed with @Cemoixerestre but didn't have to work on: I believe that we can state (and prove easily by induction) meta-theorems of the following form: when |
4fce5b5 to
b49a930
Compare
|
@yallop: if you believe that the PR is now in a mergeable state, it could be nice to "approve" it -- I think I could ask a maintainer to merge it then. |
|
@gasche: there are some comments that don't have replies: here, here, here, and here. Could you please either leave a comment or mark the conversations as "resolved"? (I'm assuming that you've simply missed them, which is unfortunately very easy to do with GitHub's interface, because one of the comments is about a typo, which should be uncontroversial.) |
|
Thanks! I missed them indeed. I will go over them now. |
b49a930 to
ca67995
Compare
|
I rebased the PR with a more substantial high-level comment at the top of the file, which I would like to be at least as clear as the summary of this PR, without the historical aspect (old and new system) which is not relevant in the versioned codebase. |
7ddbc8f to
cd2489c
Compare
Thanks. I think the new comment is a very useful addition. |
|
(I'll approve once the bug that's causing Travis to fail is fixed.) |
cd2489c to
84bebad
Compare
|
Duh... this should be fixed now. |
84bebad to
2c7a0b8
Compare
|
(just fixed the |
damiendoligez
left a comment
There was a problem hiding this comment.
Approving based on @yallop's review.
I have a few superficial remarks.
| (inspect (option expression env eo))) | ||
| (* | ||
| G |- e: m[Dereference] | ||
| ---------------------- (plus weird 'eo' option) |
There was a problem hiding this comment.
"weird"? That word doesn't make you look confident...
There was a problem hiding this comment.
It doesn't, because we aren't. The code must be safe because Dereference is the most conservative possible choice for eo, but I still have mostly no idea what this eo thing does (despite having looked at the compilation scheme). It's right that people hitting this case would be less-confident and would want (if they are changing it) to check things by themselves.
There was a problem hiding this comment.
but I still have mostly no idea what this
eothing does
I'm not sure how much "mostly" covers here, but the eo is for representing calls to a superclass method through an ancestor variable, like this:
class b = object
method m = 0
end
class d = object
inherit b as parent (* 'parent' is an ancestor *)
method m = parent#m (* Call to method 'm' in class 'b' *)
end2c7a0b8 to
a89714a
Compare
|
@damiendoligez Thanks! I took the comments into account. |
This work was conducted during an internship at INRIA Saclay, under the supervision of Gabriel Scherer in the Parsifal team. A summary is included below. For more details, see https://github.com/Cemoixerestre/Internship-2018-ocaml-recursive-value WIP: ---- This commit corresponds to the state of the work exactly at the end of Alban's internship, rebased by Gabriel Scherer on trunk. The testsuite passes, but there remain some language constructs that are not supported by the new check implementation. We believe that existing OCaml programs (which typically do not make advanced use of recursive value declarations) would compile correctly with this WIP version, but of course we plan to complete the work in the following commits. Summary: -------- Rec_check is based on the idea of assigning "access modes" to variables m ::= Dereference | Guarded | Unguarded | Delayed Before this commit, the implementation (contributed by Jeremy Yallop) was formulated as a type-checker manipulating "uses", which are mappings from free variables to modes u ::= (x : m)* and contexts Γ mapping variables to uses Γ ::= (x : u)* The check relied on a judgment of the form Γ ⊢ e : u where, in the typing algorithm, `Γ` and `e` are the inputs, and `u` is the output — this could be written `-Γ ⊢ e : +u`. After this commit, the implementation uses a simpler notion of context mapping variables to mods Γ ::= (x : m)* and the judgment has the simpler structure Γ ⊢ e : m but now `m` is an input to the judgment, indicating in which usage context `e` is checked, and `Γ` is the output: `+Γ ⊢ e : -m`.
(The bug is already present in 4.06 and 4.07.)
We could factorize the binder judgments in a related way, but it would require a different set of higher-order operator, and I thought that the lower number of occurrences did not justify the extra complexity.
This work was conducted by Alban Reynaud during an internship at INRIA Saclay, with the help and under the supervision of Gabriel Scherer in the Parsifal team. A summary is included below. For more details, see https://github.com/Cemoixerestre/Internship-2018-ocaml-recursive-value or the comments in the code itself, which has evolved a bit from Alban's internship report above. Summary: -------- Rec_check is based on the idea of assigning "access modes" to variables m ::= Dereference (* full access and inspection of the value *) | Unguarded (* the value result is directly returned *) | Guard (* the value is stored under a data constructor *) | Delayed (* the value is not needed at definition time *) | Unused (* the value is not used at all *) Before this patchset, the implementation (contributed by Jeremy Yallop) was formulated as a type-checker manipulating "uses", which are mappings from free variables to modes u ::= (x : m)* and contexts Γ mapping variables to uses Γ ::= (x : u)* The check relied on a judgment of the form Γ ⊢ e : u where, in the typing algorithm, `Γ` and `e` are the inputs, and `u` is the output — this could be written `-Γ ⊢ e : +u`. After this patchset, the implementation uses a simpler notion of context mapping variables to mods Γ ::= (x : m)* and the judgment has the simpler structure Γ ⊢ e : m but now `m` is an input to the judgment, indicating in which usage context `e` is checked, and `Γ` is the output: `+Γ ⊢ e : -m`. Other fixes: ------------ This patchset also fixes a soundness bug around `{ foo with ... }` which is already present in 4.06 and 4.07. Another soundness bug found during this work was submitted and fixed independently as GPR ocaml#1915. This work also led to the refactoring of GPR ocaml#1922, and the refactoring of the testsuite in GPR ocaml#1937.
Suggested-by: Jeremy Yallop
We provide an ocamldoc-style module comment to explain the general lines of the recursive check. This complements existing comments about the term- and binding-judgments that are more implementation-oriented. It was natural to reorder the code so that the "static or dynamic size" code is apart from the "usage of recursive variables" code. Finally, I also added copyright headers to reflect the now shared authorship of the code. (Alban has signed a CLA.)
This GPR is on top of #1937.
This work was conducted by Alban Reynaud @Cemoixerestre during an internship at INRIA
Saclay, with the help and under the supervision of Gabriel Scherer in
the Parsifal team.
A summary is included below. For more details, see
https://github.com/Cemoixerestre/Internship-2018-ocaml-recursive-value
or the comments in the code itself, which has evolved a bit from
Alban's internship report above.
Summary:
Rec_check is based on the idea of assigning "access modes" to variables
Before this patchset, the implementation (contributed by Jeremy Yallop)
was formulated as a type-checker manipulating "uses", which are mappings
from free variables to modes
and contexts Γ mapping variables to uses
The check relied on a judgment of the form
where, in the typing algorithm,
Γandeare the inputs, anduisthe output — this could be written
-Γ ⊢ e : +u.After this patchset, the implementation uses a simpler notion of context
mapping variables to mods
and the judgment has the simpler structure
but now
mis an input to the judgment, indicating in which usagecontext
eis checked, andΓis the output:+Γ ⊢ e : -m.Other fixes:
This patchset also fixes a soundness bug around
{ foo with ... }which is already present in 4.06 and 4.07.
Another soundness bug found during this work was submitted and fixed
independently as GPR #1915.
This work also led to the refactoring of GPR #1922, and the refactoring
of the testsuite in GPR #1937.