Skip to content

Dissecting let-recs#8956

Open
AltGr wants to merge 15 commits intoocaml:trunkfrom
AltGr:dissect-letrec
Open

Dissecting let-recs#8956
AltGr wants to merge 15 commits intoocaml:trunkfrom
AltGr:dissect-letrec

Conversation

@AltGr
Copy link
Copy Markdown
Member

@AltGr AltGr commented Sep 20, 2019

This pull-request (started by Pierre Chambart) adds a pass in Lambda
that reorganises let-recs to lift all recursive value definitions. The
resulting lambda contains Lletrec only for recursive function
definitions, and initialisation + assignment sequences for recursive
values. The rewrite is made so that it preserves evaluation order for
side-effects (including for depdendent nested let-recs).

This is expected to be cleaner than the previous handling, and to make
the resulting lambda easier to work with. It's also our hope that it
will more easily be tied with the checks in Rec_check.

The PR contains:

  • a new module lambda/dissect_letrec.ml that does the translation
  • a tiny change to simplif.ml to trigger the rewrite
  • a tiny change in translmod.ml for the class handling case

I am now reasonably convinced that it complies with what Rec_check
considers valid or not, but we probably should go further in
justifying that part; I would very grateful if @gasche could have an
eye at it, which is the main reason for submitting already.

@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 21, 2019

There is a very nice comment explaining the transform in dissect_let_rec.ml; pasting it below for reference:

(* Converts let-rec containing values into an initialization then
   assignment sequence.
   We assume that the typechecker correctly validated that the letrec
   is compilable (See typing/Rec_check.is_valid_recursive_expression).
   That is, for every expression to which a variable is bound in
   the let-rec verify:
   * It does not inspect any variable bound by the let-rec.
     This implies that the value of a variable bound by the let-rec
     does not need to be valid during the computation of the other
     bindings. Only the address is needed. This means that we can
     preallocate values for which we know the size.
   * If the value can't be preallocated (it can be because the
     size can't be statically known, or because the value is not
     allocated, like integers), then the value does not
     depend on any other rec-bound variables.
     This implies that any value that can't be preallocated can be
     computed before any other binding. Note that this does not mean
     that other variables can't be refered to by the expression,
     but if it happens, then the value is useless.
   We consider two cases for expressions that we consider of
   known size (the [Static] case in [is_valid_recursive_expression]):
   the makeblock primitive and function declarations. Makeblocks will
   be preallocated, while all the functions will be turned into a single
   letrec.
   The structure of the generated code will be:
   {[
     let c = const in
     (* [consts]: constants *)
     ...
     let v = caml_alloc_dummy n in
     (* [blocks]: With n the staticaly known size of blocks *)
     ...
     let p = expr in
     (* [pre]: Values that do not depend on any other from the let-rec in
        [Rec_check.Dereference] position *)
     ...
     let rec f x = ...
     and g x = ...
     and ...
     in
     (* [functions]: All the functions from the let-rec *)
     caml_update_dummy v v_contents;
     (* Initialisation ([effects]) *)
     ...
   ]}
   Special care is taken to handle nested [let rec]s:
   the recursive values of a letrec are often more than the ones
   bound by the letrec expression itself. For instance
   let rec f =
     let rec g x = h (x+1)
     and h x = i x
     in
     fun x -> g x
   and i x =
     if x > 33 then x
     else f x
   in this expression every function is recursively defined with
   any other. Hence this is equivalent to
   let rec f = fun x -> g x
   and i x =
     if x > 33 then x
     else f x
   and g x = h (x+1)
   and h x = i x
   However, there might be (a subset of) a local [let rec] that is indeed
   internally recursive, but that is used by the top-level [let rec] in
   [Dereference] positions. We carefully lift those within [pre], and handle
   them recursively in the same pass to preserve the order of evaluation.
   The analysis for which variables should remain in the inner [let rec], and
   which are indeed part of the outer [let rec] is equivalent to the
   [Rec_check.value_bindings] case.
*)

I have looked at nothing else in this file yet.

@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 21, 2019

I think that this PR does two different (related) things:

  1. it explicitates the backpatching semantics (initialize->evaluate->set) at the Lambda level
  2. it tries harder to create larger immediate-functions-only blocks (which use the clever closure representation strategy), in fact it two different ways:
    a. within a value-letrec block, isolating an immediate-functions-only sub-block
    b. in let rec f = (let rec ... in ...) and ... scenarios, merging some of the inner-definitions into the outer-definitions to try to create larger closure blocks

From my current understanding (not having looked at the code yet) my judgment would be as follows:

  • (1) sounds like an excellent idea to me (currently the duplication of logics between bytecode and native is painful)
  • I'm less sold on (2). I think it adds a lot of complexity to the code, and the benefits are not clear to me (I haven't seen a discussion of them). More precisely, my impression is that (2.a) is reasonable because it is a purely local transformation, but (2.b) is a harder sell because it is in charge of moving things around, while still preserving evaluation order.

There is also the issue that recursive closures may worsen the lifetime of some captured values. This is rarely an issue, and expert users know about it, but from that perspective merging recursive-closures around does carry a part of risk.

I feel embarrassed by this first assessment because I haven't looked at the details yet, and I realize that (2.b) was a lot of work to get right, but there we go...

@AltGr
Copy link
Copy Markdown
Member Author

AltGr commented Sep 23, 2019

More examples for anyone interested: AltGr@d311da3

@chambart
Copy link
Copy Markdown
Contributor

@gasche The main reason for (2) is to avoid fixing the representation of closures at the lambda level. The generated lambda doesn't ever need to patch recursively defined closures. To reduce the lifetime of captured values, it would be possible to split that let-rec into smaller ones through a topological sort. It was not deemed important enough to warrant for the complexity.

@lpw25
Copy link
Copy Markdown
Contributor

lpw25 commented Sep 26, 2019

Is there a particular reason that this code is triggered in simplif.ml rather than just called directly from translcore.ml? Doing it from translcore.ml would allow us to ban value letrecs at the lambda level entirely, perhaps adjusting the representation of Lletrec accordingly to only support functions. It also seems like that would make any future connection between the Rec_check code and the Dissect_let_rec code slightly easier.

@mshinwell
Copy link
Copy Markdown
Contributor

Are potential producers of Lambda code other than the frontend of the OCaml compiler going to be upset if we made such a change?

@lpw25
Copy link
Copy Markdown
Contributor

lpw25 commented Sep 26, 2019

I doubt anyone is relying on the ability of lambda code to support value recursion, but if they were they could just use Dissect_letrec themselves in place of their existing letrecs.

@chambart
Copy link
Copy Markdown
Contributor

I'm not sure that the transformation is doable in translcore. The rec-check allows nested let-recs that I think can't be compiled out of context. This means that we need to build them top to bottom, I think this is not really doable in translcore where we don't have the whole expression.

@lpw25
Copy link
Copy Markdown
Contributor

lpw25 commented Sep 30, 2019

I suspect that the nested ones will still work. I think that the inner let-recs need to be "dissectable" on their own to pass the rec-check. Then the outer let-recs just look at the results of "dissecting" the inner ones when "dissecting" themselves -- rather than looking at original inner let-recs as they do now.

I would expect that to basically involve adding cases for caml_alloc_dummy and caml_alloc_dummy_float to Dissect_letrec.prepare_letrec. That would probably be best done by making those C primitives into normal primitives -- but that would be an improvement anyway.

@AltGr
Copy link
Copy Markdown
Member Author

AltGr commented Oct 7, 2019

One issue that would make handling this bottom-up difficult, I think, is that different levels of let-recs can be interleaved. More specifically, an inner let-rec could mix mutually recursive members, and independent members that are recursive with an outer let-rec.

E.g. a good rewrite should rewrite the following

let rec f () = g ()
and g () = f ()

and

let rec f =
  let rec h () = g () in h
and g () = f ()

in the same way. Untangling top-down allows us to do that.

Now, if we need to go bottom-up, I am afraid it would be very difficult to extract the remaining inner let-recs of functions and place them back at the outer level (if needed only, i.e. if mutually recursive). If we fail to do that, the inner let-rec could still contain calls to functions from the outer let-rec, thus cannot be lifted in an alloc + update phase.

@lpw25
Copy link
Copy Markdown
Contributor

lpw25 commented Oct 9, 2019

I don't quite see why:

let rec f =
  let rec h () = g () in h
and g () = f ()

would be difficult bottom-up. Wouldn't running dissection on:

let rec h () = g () in h

leave it unchanged, and then the code would behave the same as it does now?

@AltGr
Copy link
Copy Markdown
Member Author

AltGr commented Oct 15, 2019

Sorry if that wasn't a perfect example. The issue is, I don't think it's true that the lambda still respects the rec-check once it has been rewritten, making it more difficult for the outer letrec to merge the expected functions with the inner letrec.

While it may seem OK at first, it is possible to make this fail on more convoluted examples:

type t = T0 | T1 of t | Tf of (unit -> t)

let rec a = T0
and b =
  let rec f () = T1 a
  and x = Tf f
  in x

The key here is that we are using a value recursive with the outer let-rec to define a function in an inner let-rec.

The proposed top-down rewrite gives the following lambda:

(let (a/84 = 0a x/87 = (caml_alloc_dummy 1))
  (letrec (f/86 (function param/88 (makeblock 0 a/84)))
    (seq (seq (caml_update_dummy x/87 (makeblock 1 f/86)) 0a)
      (makeblock 0 a/84 x/87))))

where the let-recs are flattened and handled in a single pass (where the aliasing of b and x is detected).

Now, a bottom-up rewrite would first transform the inner let-rec into:

(let (x/87 = (caml_alloc_dummy 1))
  (letrec (f/86 (function param/88 (makeblock 0 a/84)))
    (seq (caml_update_dummy x/87 (makeblock 1 f/86)) x/87)))

A let-rec of functions remains here, but entangled between the allocation and updates. When rewriting the outer let-rec, we will then pull the allocation of x in thepre, and the functions remaining in the let-rec will be merged with the outer let-rec of functions (since there is mutual recursion).

But where to put the effects ? Putting them in pre, before the functions, as is normally done, would be wrong, since the effect refer to it:

(let (a/84 = 0a x/87 = (caml_alloc_dummy 1))
  (seq (caml_update_dummy x/87 (makeblock 1 f/86))
                                            ^^^^ referred to before being defined
    (letrec (f/86 (function param/88 (makeblock 0 a/84)))
      (makeblock 0 a/84 x/87))))

But even having a special case that would push the caml_update_dummy afterwards would be wrong in general: we just need to add a non-recursive element in the inner let-rec, and dereference it before returning the value for b:

let rec a = T0
and b =
  let rec f () = T1 a
  and x = Tf f
  and y = ref "foo"
  in !y, x

Now the rewritten inner let-rec has

(seq (caml_update_dummy x/87 (makeblock 1 f/86))
     (caml_update_dummy y/88 (makemutable 0 "foo")))

When trying to define the outer let-rec, we would need to put the second caml_update_dummy before the function definitions, and the first one, after. I don't think this would be possible to handle correctly without resorting to a topological sort at this point; this would complicate things a lot, and probably affect the evaluation order.

While I certainly agree that it would be nice to be able to encode this simplification directly in the lambda type, as far as making the connection with the rec-check is concerned, I think the proposed approach makes things easier since both are top-down.

@lpw25
Copy link
Copy Markdown
Contributor

lpw25 commented Oct 17, 2019

Ok, IIUC you are basically saying that the following is not valid by the let-rec check:

type t = T0 | T1 of t | Tf of (unit -> t)
type s = { mutable t : (unit -> t) option }

let rec a = T0
  and b =
    let xr = { t = None } in
    let rec f () = T1 a in
    xr.t <- Some f;
    xr

and the algorithm implemented here could not handle such things in general. Since this is essentially the form of the output of let-rec dissection, that dissection cannot be applied bottom-up.

This makes sense. However, note that the above code obeys both:

It does not inspect any variable bound by the let-rec.

and:

If the value can't be preallocated (it can be because the
size can't be statically known, or because the value is not
allocated, like integers), then the value does not
depend on any other rec-bound variables.

So I think that means there is a missing condition in the comment which was copied out above. It's probably important to make sure we get all of those conditions right because they the interface between this code and the let-rec check.

@AltGr
Copy link
Copy Markdown
Member Author

AltGr commented Oct 17, 2019

Ah, I see your point indeed. "Dereference" (or here "inspect") is to be used in a broader sense than usual here, as defined by the rec-check: we understand <- as dereferencing both sides in this context, which leads, in the analysis, b to transitively dereference a in your example.

We should probably be more careful in the comment explaining the pre-condition.

@lthls
Copy link
Copy Markdown
Contributor

lthls commented Nov 28, 2019

We're relying on this (to some extent) in our flambda2 work, and we would like to see some independent review of the code.
@gasche do you think you could find some time (or some other person) to look at this code ?
It is quite complex, but the main reason why it is so complex is that the simpler approaches we tried turned out to be wrong on some corner cases (these corner cases are added as tests by this PR).
Some of this complexity could probably be removed by restricting the let-rec check further, if you feel like it.

@gasche
Copy link
Copy Markdown
Member

gasche commented Dec 1, 2019

I would like to find some time to review this, but I failed to do so since the PR was posted. The most efficient way to advance would be to meet in person to discuss it. Can you grab people interested in doing so, and send me an email to organize this?

Copy link
Copy Markdown
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I met @lthls, @AltGr and @chambart today and we started going through the code, this is my first batch of review comments.

...
let p = expr in
(* [pre]: Values that do not depend on any other from the let-rec in
[Rec_check.Dereference] position *)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All Rec_check-accepted definitions should satisfy the property that no recursive binding Dereference-depends on any other, so it is not clear to me what this condition enforces.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: the neighbor functions are not bound yet (they are the f, g below), so in fact the expr can only reference (non-function) value neighbors, and cannot reference their function neighbors.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AltGr and @chambart say that all (most of?) the user-visible side-effects of the letrec-bindings are moved to be in this group, and the v_contents mentioned below in the comment are in fact not doing observable effects (their effectful parts have been split/lifted into the [pre] block); so maybe the ([effects]) comment below is wrong/confusing.

(* [consts]: constants *)
...
let v = caml_alloc_dummy n in
(* [blocks]: With n the staticaly known size of blocks *)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those must never receive functions (the v_contents cannot be functions) as the starting premise of the work is to avoid having an instruction to patch function values. One way to think of it is that functions do not have a static size. (The current code does consider that functions values have a known static size, by counting free variables, but this is not robust to different closure-representation or optimization approaches.)

let rec g x = h (x+1)
and h x = i x
in
fun x -> g x
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we want to lift g and h out as toplevel bindings, so that f can be part of the letrec of functions (we don't want to backpatch functions anymore). The definition of f might dereference g or h, but a nice property holds: if f dereferences a local binding, then this local binding cannot in turn refer (mention at any mode) one of the toplevel neighbors of f, otherwise f would dereference that neighbor transitively and the definition would invalid. So:

  • either f dereferences the local let, which can be lifted in the pre (it does not refer to function neighbors)
  • or f does not dereference it and it is valid to include it in the letrec nest (if it is a function)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this is why we need to lift local-lets at the toplevel (the part of the transformation which intuitively I don't like so much), to expose that f is really a function.

However, there might be (a subset of) a local [let rec] that is indeed
internally recursive, but that is used by the top-level [let rec] in
[Dereference] positions. We carefully lift those within [pre], and handle
them recursively in the same pass to preserve the order of evaluation.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it possible to preserve the evaluation order? The generated code shape is

     let c = const in
     (* [consts]: constants *)
     ...
     let v = caml_alloc_dummy n in
     (* [blocks]: With n the staticaly known size of blocks *)
     ...
     let p = expr in
     (* [pre]: Values that do not depend on any other from the let-rec in
        [Rec_check.Dereference] position *)
     ...
     let rec f x = ...
     and g x = ...
     and ...
     in
     (* [functions]: All the functions from the let-rec *)
     caml_update_dummy v v_contents;
     (* Backpatching *)
     ...

The recursive-function block does not perform any observable effect. The effects must happen in either [pre] or [v_contents], the trick is to make all effects happen in [pre] to preserve the evaluation order. Intuitively, this should be possible (@chambart claims) because all effect operations impose mode Dereference on their arguments, so they cannot really refer to the recursive functions. (They can be inside expressions that do, but the effectful subexpressions can be lifeted in [pre] in an order-preserving way. They claim.)


type dissected =
| Dissected of Lambda.lambda
| Unchanged
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Unchanged information matters because we are doing rewriting from the root to the leaves. If the input is exactly a letrec of functions, we return Unchanged, and then the caller knows that it has to recursively rewrite the subdefinitions. If Dissected is returned, you call yourself on the transformation result. If we turned Unchanged into Dissected <input>, there would be an infinite transformation loop.

match current_let with
| Some cl -> build_block cl (List.length args) Boxed_float lam letrec
| None -> dead_code lam letrec
end
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two cases would be simpler written as:

| Lprim ((Pmakeblock _ | Pmakearray _) as prim, args, ) ->
  begin match current_lent with
  | None -> ...
  | Some cl -> build_block ... (match prim with ...) ...
  end

assert false
| Types.Record_float ->
build_block cl size Boxed_float arg letrec)
| None -> dead_code lam letrec
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this case could be first

else
let recursive_set = Ident.Set.add id recursive_set in
let letrec = prepare_letrec recursive_set current_let body letrec in
let let_def = { let_kind; value_kind; ident = id } in
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let_def => current_let

recursive variable *)
let pre ~tail : Lambda.lambda =
Llet (current_let.let_kind, current_let.value_kind,
current_let.ident, lam, letrec.pre ~tail)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: with this definition, the call that runs last performs its effects first. This explain the slightly-odd implementation of the Lsequence(lam1, lam2) for example, that looks essentially like

let letrec = dissect lam2 in
let letrec = dissect lam1 in
letrec

(state-passing-style, in reverse order)

If we had written instead

let pre ~tail =
  letrec.pre (Llet (..., ~tail))

then we could write the calls in the order of evaluation, instead of the reverse order.

(It might be nice to try this to see if it's a big change and/or if it really makes the code nicer.)


One solution would be to handle now the outer bindings, and lift the
inner ones into [letrec.pre], to be handled in a second pass. That
would change the evaluation order, though, so we instead descend
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear to me why this changes the evaluation order. Generally the rest of the code assumes that any code at the root of the current lam argument can be safely (evaluation-order-preservingly) moved into pre, and here it seems to not be an issue.

Maybe the problem comes from splitting these inner-bindings in two groups (one for pre and one to add to the current workset), which imposes an evaluation order between the two groups that is not the source order?

@gasche
Copy link
Copy Markdown
Member

gasche commented Jan 23, 2020

Between the first and second round of review, @lthls found a bug in the interaction between dissection (as in this PR) and the Return usage allowed by the recursive check (rec-check). We discussed what's a good solution for this.

We decided that the best approach is to change rec-check to forbid certain patterns, instead of making dissection more complex or rec-compilation less efficient in corner cases. We have a good idea of how to restrict the rec-check in this way, but this requires a bit of work, before which the current PR should not be merged (it is unsound with respect to the current check). This means that the current PR should be considered paused, as we work on the rec-check.

For the record the bug has the following reproduction case (currently works well, but fails if dissected with this PR):

let rec f =
  let g = if true then f else f in
  fun x -> g x

Here f is a static function, but the dissection would try to lift g in the [pre] set, which is wrong as it depends on f. For simpler forms of local definitions, recursive dissection and the alias mechanism of the PR would work, but this one (which coincides with the notion of "Dynamic"-size expression) makes it fail.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants