Skip to content

New implementations of the static check for recursive definitions#1942

Merged
gasche merged 16 commits intoocaml:trunkfrom
gasche:alban-rec-check
Sep 14, 2018
Merged

New implementations of the static check for recursive definitions#1942
gasche merged 16 commits intoocaml:trunkfrom
gasche:alban-rec-check

Conversation

@gasche
Copy link
Copy Markdown
Member

@gasche gasche commented Jul 30, 2018

This GPR is on top of #1937.

This work was conducted by Alban Reynaud @Cemoixerestre during an internship at INRIA
Saclay, with the help and under the supervision of Gabriel Scherer in
the Parsifal team.

A summary is included below. For more details, see

https://github.com/Cemoixerestre/Internship-2018-ocaml-recursive-value

or the comments in the code itself, which has evolved a bit from
Alban's internship report above.

Summary:

Rec_check is based on the idea of assigning "access modes" to variables

m ::= Dereference (* full access and inspection of the value *)
    | Unguarded (* the value result is directly returned *)
    | Guard (* the value is stored under a data constructor *)
    | Delayed (* the value is not needed at definition time *)
    | Unused (* the value is not used at all *)

Before this patchset, the implementation (contributed by Jeremy Yallop)
was formulated as a type-checker manipulating "uses", which are mappings
from free variables to modes

u ::= (x : m)*

and contexts Γ mapping variables to uses

Γ ::= (x : u)*

The check relied on a judgment of the form

Γ ⊢ e : u

where, in the typing algorithm, Γ and e are the inputs, and u is
the output — this could be written -Γ ⊢ e : +u.

After this patchset, the implementation uses a simpler notion of context
mapping variables to mods

Γ ::= (x : m)*

and the judgment has the simpler structure

 Γ ⊢ e : m

but now m is an input to the judgment, indicating in which usage
context e is checked, and Γ is the output: +Γ ⊢ e : -m.

Other fixes:

This patchset also fixes a soundness bug around { foo with ... }
which is already present in 4.06 and 4.07.

Another soundness bug found during this work was submitted and fixed
independently as GPR #1915.

This work also led to the refactoring of GPR #1922, and the refactoring
of the testsuite in GPR #1937.

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Jul 30, 2018

(cc @yallop)

@alainfrisch
Copy link
Copy Markdown
Contributor

Do you think this might bring us closer to producing, as a side-effect of the check, a data-structure which would drive the compilation scheme and make the check "correct by construction"?

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Jul 30, 2018

I would say that this would only be marginally easier than with the previous implementation; the big conceptual jump was @yallop's idea of using a type-system-like check, this one is easier to reason about than the one it replaces, but in both cases the idea would be to look at the typed derivations as evidence.

One thing that is rather easy to do in both cases is to get a sort of dependency order on the mutual declarations. This would allow accepting more things when a topological order is just not one big cycle. But I am not sure that it helps us make the current compilation scheme "more correct".

@yallop, @Cemoixerestre and myself are thinking of writing a short paper / extended abstract on the new system, which could give me better intuitions on these things.

Thinking about compilation was on the list of things to do "if there is time" for Alban's internship, but this is a six-weeks end-of-bachelor internship, and there was no time.

@gasche gasche force-pushed the alban-rec-check branch from aa6437c to fadff5f Compare July 30, 2018 13:20
(* If x is used with the mode m in e[x], and e[x] is used with mode
m' in e'[e[x]], then x is used with mode m'[m] in e'[e[x]].

Return is neutral for composition: m[Neutral] = m = Neutral[m].
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neutral ~> Return here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I'll fix this.

(inspect (expression env e2)))
(*
G1 |- e1: m[Dereference]
G2 |- e2: m[Dereference] (*TODO: why is this a dereference?*)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a couple TODO remaining, is it on purpose?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I wanted @yallop to have a chance to comment on them, because he wrote the original implementation. I believe that using Guard here would be correct, but I wanted his opinion before making the check more liberal than the one we replace. (Other opinions are also welcome, of course.)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a dereference because writing a value to a mutable field involves accessing the value. Here's a program that fails with a segmentation fault if Dereference is weakened to Return here:

    type t = { mutable f: float }
    let g = { f = 0.0 }
    let rec x = (g.f <- y; ()) and y = 2.0

(You could add the program to the test suite.)

@gasche gasche force-pushed the alban-rec-check branch 2 times, most recently from 0a955c7 to ed39a58 Compare July 30, 2018 15:51
(* We have carefully placed `u` before `t` here,
so that the copy { t with .. }, if accepted,
is evaluated before 't' is initialized -- making
the assertion below fail, typically abording
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

abording ~> aborting ?

match Env.unguarded ty idlist, Env.dependent ty idlist,
classify_expression expr with
| _ :: _, _, _ (* The expression inspects rec-bound variables *)
| _, _ :: _, Dynamic -> (* The expression depends on rec-bound variables
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first underscore can be turned into a [] which saves a few brain cycles to the reader.

Copy link
Copy Markdown
Member

@Armael Armael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a first superficial review, I mostly read the comments but not the actual implementation yet.


(* Returns the most conservative mode of the two arguments.

Deref < Return < Guard < Delay < Unused
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deref -> Dereference

| Delay, _ -> Delay
| m', Return -> m'
| Return, m
| Guard, m -> m
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This last case confused me for a moment, because I read it as Guard[m] = m, which is not the case.
Because of the earlier m', Return -> m' innocuous-looking case we have Guard[Return] = Guard, and Guard[m] = m otherwise.
I wonder if it would be worth adding an explicit (redundant) case for Guard[Return], to showcase more explicitly the role of the Guard mode.

(** Combine the access information of two expressions *)
val unguarded : t -> Ident.t list -> Ident.t list
(** unguarded e l: the list of all identifiers in e that are unguarded of
dereferenced in the environment e. *)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

identifiers in e -> identifiers in l
Also, ungarded of dereferenced -> dereferenced or returned ?

(** Combine the access information of two expressions *)
val dependent : t -> Ident.t list -> Ident.t list
(** unguarded e l: the list of all identifiers in e that are used in e. *)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unguarded e l -> dependent e l, identifiers in e -> identifiers in l

(** No variables are accessed in an expression; it might be a
constant or a global identifier *)
val join : t -> t -> t
val join_list : t list -> t
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would maybe nice to have a comment indicating that environments sharing bindings is supported, and that in that case the right thing happens (the most conservative mode is picked).


Deref < Return < Guard < Delay < Unused
*)
let prec m m' =
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gasche says a better name for this could be join

compute the environment of a subterm from its mode,
so the corresponding function has type [... -> mode -> Env.t].

We write ['a -> term_judg] in this case.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't 'a be ... instead, to match the convention used in the line above?

correspond to binding constructs that have both an
exterior environment (the environment of the whole term)
and an interior environment (the environment after the binding
construct has introduced new names in scope).
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which one of Gamma and Gamma' is the exterior (resp interior) environment?


G |- e : m + m'
-------------------------------
G |- (let x = e : m) -| x:m', G
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an issue with this rule I think (name collision on G; and the final output of the rule is likely not just one of the two Gs in the scope).
Would the correct rule be something like as follows?

       G |- e : m + m'
       -----------------------------------
       x:m', G, G' |- (let x = e : m) -| x:m', G'

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(also, I assume m + m' is prec m m'? whatever it is it should be made explicit)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(done)

fun f inner_mode -> fun outer_mode -> f (compos outer_mode inner_mode)

(* A binding judgment [binder] expects a mode and an outer environment,
and returns an inner environment. [binder >> judg] computes
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in contradiction with the comment above? (which says that bind_judg takes an interior environment and returns an exterior environment)

@Armael
Copy link
Copy Markdown
Member

Armael commented Aug 9, 2018

In the description of the PR, you give a list of 4 modes (Dereference, Unguarded, Guard, Delay), but this does not seem to match the modes before the PR (3 modes, Dereferenced, Guarded, Unguarded) or in the PR (5 modes, Dereference, Return, Guard, Delay, Unused). Could you clarify?

Copy link
Copy Markdown
Member

@Armael Armael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a couple comments while reading the rules


G |- e : m + m'
-------------------------------
G |- (let x = e : m) -| x:m', G
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(also, I assume m + m' is prec m m'? whatever it is it should be made explicit)

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Aug 18, 2018

I just went over @Armael's comments, improved the PR as desired and rebased. Thanks @Armael, and feel free to keep reading :-)


G |- e : m + m'
-----------------------------------
G+G' |- (let x = e : m) -| x:m', G'
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The closing parenthesis is off there.

Copy link
Copy Markdown
Member

@yallop yallop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this an improvement -- thanks! The setup feels a bit more complex, but the rules are simpler, which seems like a reasonable tradeoff.

the evaluation of the whole program. This is the mode of
a variable in an expression in which it does not occur. *)

(* Returns the most conservative mode of the two arguments.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: more conservative, not most.

I think "most conservative" could be clarified here: it means something like "the more demanding context".

| Unused
(** [Unused] is for subexpressions that are not used at all during
the evaluation of the whole program. This is the mode of
a variable in an expression in which it does not occur. *)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: I think Unused should be Discard or Ignore, since all the others are verbs. (Alternatively, change them all to adjectives: Dereferenced, Returned, etc.)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went for Ignore, to me Discard includes the idea of an (active) action.


Return is neutral for composition: m[Return] = m = Return[m].
*)
let compos m' m = match m', m with
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: compos? Why not compose?

Aside: 'not' patterns would be useful here.

Besides noting that Return is neutral for composition, you might also note that compos is associative (I checked that it is), and that Unused is an annihilator.

returned in the environment e. *)

val dependent : t -> Ident.t list -> Ident.t list
(** unguarded e l: the list of all identifiers in e that are used in e. *)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things to fix in this comment: it should say "dependent", not "unguarded", and it should say "in l", not "in e"


(* Expression judgment:
G |- e : m
where (m) is an output of the code and (G) is an output;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should say "where (m) is an input of the code"

(inspect (expression env e2)))
(*
G1 |- e1: m[Dereference]
G2 |- e2: m[Dereference] (*TODO: why is this a dereference?*)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a dereference because writing a value to a mutable field involves accessing the value. Here's a program that fails with a segmentation fault if Dereference is weakened to Return here:

    type t = { mutable f: float }
    let g = { f = 0.0 }
    let rec x = (g.f <- y; ()) and y = 2.0

(You could add the program to the test suite.)

expression e << Dereference
| Texp_setinstvar (pth,_,_,e) ->
(*
G |- e: m[Dereference] (*TODO: why is this a dereference?*)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As with Texp_setfield, this really should be a Dereference, so we can drop the TODO.

Here's a program for the test suite that fails with a segmentation fault if Dereference is weakened to Return:

    class c = object
      val mutable f = 0.0
      method m =
        let rec x = (f <- y; ()) and y = 2.0 in f
    end
    let _ = print_float (new c)#m

begin match Typeopt.classify_lazy_argument e with
(*
G |- e: m[Delay]
---------------- (modulo some subtle compiler optimizations)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm expecting this to break when somebody implements (e.g.) the lazy value optimization for unboxed constructors. (But perhaps having the common code in Typeopt will be enough to prevent that.)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm expecting such future changes to return one of the known-to-be-shortcut variants (for unboxed constructors, typically the classifier of the underlying argument), or to add a new variants to be handled properly here.

| Tpat_alias (pat, _, _) -> is_destructuring_pattern pat
| Tpat_var (_id, _) -> false
| Tpat_alias (pat, _, _) ->
is_destructuring_pattern pat
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like unnecessary churn.

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Sep 3, 2018

@yallop: Thanks for the review! I did a pass over the patchset and took your comments into account.

There is one thing that we discussed with @Cemoixerestre but didn't have to work on: I believe that we can state (and prove easily by induction) meta-theorems of the following form: when Gamma |- e : Delay, then Gamma maps all the variables that do occur in e to Delay, and the others to Ignore. Similarly, when Gamma |- e : Dereference, then Gamma maps the occurring variables to Dereference and the other to Ignore. I plan to convince myself that these two meta-theorems do hold, and then add them in comments. The Delay theorem makes it easy to convince oneself that recursive functions work just fine, and the Dereference one gives a sort of upper bound on accepted terms.

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Sep 10, 2018

@yallop: if you believe that the PR is now in a mergeable state, it could be nice to "approve" it -- I think I could ask a maintainer to merge it then.

@yallop
Copy link
Copy Markdown
Member

yallop commented Sep 10, 2018

@gasche: there are some comments that don't have replies: here, here, here, and here. Could you please either leave a comment or mark the conversations as "resolved"? (I'm assuming that you've simply missed them, which is unfortunately very easy to do with GitHub's interface, because one of the comments is about a typo, which should be uncontroversial.)

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Sep 10, 2018

Thanks! I missed them indeed. I will go over them now.

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Sep 10, 2018

I rebased the PR with a more substantial high-level comment at the top of the file, which I would like to be at least as clear as the summary of this PR, without the historical aspect (old and new system) which is not relevant in the versioned codebase.

@yallop
Copy link
Copy Markdown
Member

yallop commented Sep 10, 2018

I rebased the PR with a more substantial high-level comment at the top of the file,

Thanks. I think the new comment is a very useful addition.

@yallop
Copy link
Copy Markdown
Member

yallop commented Sep 10, 2018

(I'll approve once the bug that's causing Travis to fail is fixed.)

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Sep 10, 2018

Duh... this should be fixed now.

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Sep 10, 2018

(just fixed the check-typo check)

Copy link
Copy Markdown
Member

@damiendoligez damiendoligez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving based on @yallop's review.

I have a few superficial remarks.

(inspect (option expression env eo)))
(*
G |- e: m[Dereference]
---------------------- (plus weird 'eo' option)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"weird"? That word doesn't make you look confident...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't, because we aren't. The code must be safe because Dereference is the most conservative possible choice for eo, but I still have mostly no idea what this eo thing does (despite having looked at the compilation scheme). It's right that people hitting this case would be less-confident and would want (if they are changing it) to check things by themselves.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but I still have mostly no idea what this eo thing does

I'm not sure how much "mostly" covers here, but the eo is for representing calls to a superclass method through an ancestor variable, like this:

class b = object
  method m = 0
end
class d = object
  inherit b as parent   (* 'parent' is an ancestor *)
  method m = parent#m  (* Call to method 'm' in class 'b' *)
end

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Sep 13, 2018

@damiendoligez Thanks! I took the comments into account.

alban and others added 16 commits September 13, 2018 16:46
This work was conducted during an internship at INRIA Saclay, under
the supervision of Gabriel Scherer in the Parsifal team.

A summary is included below. For more details, see

  https://github.com/Cemoixerestre/Internship-2018-ocaml-recursive-value

WIP:
----

This commit corresponds to the state of the work exactly at the end of
Alban's internship, rebased by Gabriel Scherer on trunk.
The testsuite passes, but there remain some language constructs that
are not supported by the new check implementation.

We believe that existing OCaml programs (which typically do not make
advanced use of recursive value declarations) would compile correctly
with this WIP version, but of course we plan to complete the work in
the following commits.

Summary:
--------

Rec_check is based on the idea of assigning "access modes" to variables

    m ::= Dereference | Guarded | Unguarded | Delayed

Before this commit, the implementation (contributed by Jeremy Yallop)
was formulated as a type-checker manipulating "uses", which are mappings
from free variables to modes

    u ::= (x : m)*

and contexts Γ mapping variables to uses

    Γ ::= (x : u)*

The check relied on a judgment of the form

    Γ ⊢ e : u

where, in the typing algorithm, `Γ` and `e` are the inputs, and `u` is
the output — this could be written `-Γ ⊢ e : +u`.

After this commit, the implementation uses a simpler notion of context
mapping variables to mods

    Γ ::= (x : m)*

and the judgment has the simpler structure

     Γ ⊢ e : m

but now `m` is an input to the judgment, indicating in which usage
context `e` is checked, and `Γ` is the output: `+Γ ⊢ e : -m`.
(The bug is already present in 4.06 and 4.07.)
We could factorize the binder judgments in a related way, but it would
require a different set of higher-order operator, and I thought that
the lower number of occurrences did not justify the extra complexity.
This work was conducted by Alban Reynaud during an internship at INRIA
Saclay, with the help and under the supervision of Gabriel Scherer in
the Parsifal team.

A summary is included below. For more details, see

  https://github.com/Cemoixerestre/Internship-2018-ocaml-recursive-value

or the comments in the code itself, which has evolved a bit from
Alban's internship report above.

Summary:
--------

Rec_check is based on the idea of assigning "access modes" to variables

    m ::= Dereference (* full access and inspection of the value *)
        | Unguarded (* the value result is directly returned *)
        | Guard (* the value is stored under a data constructor *)
        | Delayed (* the value is not needed at definition time *)
        | Unused (* the value is not used at all *)

Before this patchset, the implementation (contributed by Jeremy Yallop)
was formulated as a type-checker manipulating "uses", which are mappings
from free variables to modes

    u ::= (x : m)*

and contexts Γ mapping variables to uses

    Γ ::= (x : u)*

The check relied on a judgment of the form

    Γ ⊢ e : u

where, in the typing algorithm, `Γ` and `e` are the inputs, and `u` is
the output — this could be written `-Γ ⊢ e : +u`.

After this patchset, the implementation uses a simpler notion of context
mapping variables to mods

    Γ ::= (x : m)*

and the judgment has the simpler structure

     Γ ⊢ e : m

but now `m` is an input to the judgment, indicating in which usage
context `e` is checked, and `Γ` is the output: `+Γ ⊢ e : -m`.

Other fixes:
------------

This patchset also fixes a soundness bug around `{ foo with ... }`
which is already present in 4.06 and 4.07.

Another soundness bug found during this work was submitted and fixed
independently as GPR ocaml#1915.

This work also led to the refactoring of GPR ocaml#1922, and the refactoring
of the testsuite in GPR ocaml#1937.
We provide an ocamldoc-style module comment to explain the general
lines of the recursive check. This complements existing comments about
the term- and binding-judgments that are more implementation-oriented.

It was natural to reorder the code so that the "static or dynamic
size" code is apart from the "usage of recursive variables" code.

Finally, I also added copyright headers to reflect the now shared
authorship of the code. (Alban has signed a CLA.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants