
Optimise reconstruction of existing blocks in switches (flambda)#8958

Open
AltGr wants to merge 9 commits into ocaml:trunk from AltGr:switch-no-rebuild

Conversation

@AltGr (Member) commented Sep 20, 2019

This pull request (work started by Mark Shinwell) adds a small optimisation that avoids re-allocating blocks when existing ones can be reused.

Typically, it would compile code such as

let f = function
  | Some x -> Some x
  | ...

as if you had written

let f = function
  | Some x as a -> a
  | ...

It is a bit involved, as more information needs to be passed through the Lambda and Flambda intermediate representations: the size and mutability of matched blocks must be known to perform the optimisation. That part (the first 3 patches) touches many files but is straightforward.

The second part of the patch is purely in flambda (Inline_and_simplify & helpers), and implements the optimisation:

  • matched blocks are recorded in the flambda Env (if immutable)
  • the approximation is extended to record when a value is the projection of a field of a block recorded in the Env
  • when building a block, we use the above to check whether it matches a recorded block and, if so, reuse that block instead of allocating
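The three steps above can be sketched roughly as follows (a hypothetical simplification, not the PR's actual code: the `approx` type, `try_reuse`, and the string-keyed environment are stand-ins for flambda's real data structures):

```ocaml
(* Hypothetical sketch of the check made when simplifying a block
   allocation: if every field is known to be the projection of field
   [i] of the same immutable recorded block, and that block has the
   same tag and size, the fresh allocation can be replaced by the
   recorded block itself. *)
type approx =
  | Projection of { block : string; field : int }
  | Unknown

let try_reuse ~tag ~size ~(field_approxs : approx list)
    ~(known_blocks : (string * (int * int)) list) : string option =
  match field_approxs with
  | Projection { block; field = 0 } :: _
    when List.length field_approxs = size ->
      let fields_match =
        List.for_all2
          (fun a i ->
             match a with
             | Projection p -> p.block = block && p.field = i
             | Unknown -> false)
          field_approxs
          (List.init size (fun i -> i))
      in
      (* The recorded block must also have a matching tag and size. *)
      if fields_match && List.assoc_opt block known_blocks = Some (tag, size)
      then Some block
      else None
  | _ -> None
```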

It turned out that the optimisation is triggered much more often than we expected (including in cases where it would not be type sound at the source level, so you couldn't do it manually using as), and it is probably worth including. It is planned to extend it to more cases besides Lswitch, e.g. function arguments.

Closes: #7535

@nojb (Contributor) commented Sep 20, 2019

While interesting, this kind of optimization makes me sad: for one, it makes it harder to reason about allocation. It also goes against the philosophy of OCaml, which is to make it easy to write efficient code, not to spend energy trying to make inefficient code efficient.

In any case, could you give examples of "untyped" cases of this optimization?

@nojb (Contributor) commented Sep 20, 2019

Sorry, I had missed that this is Flambda-only, which makes my comment less relevant...

@lthls (Contributor) commented Sep 20, 2019

(Disclaimer: I've participated in the elaboration of this PR)

In any case, could you give examples of "untyped" cases of this optimization?

(* In module option.ml *)
let to_result ~none = function None -> Error none | Some v -> Ok v

The Ok v allocation can reuse the Some v block, because at the point where this optimization is done we know they're equivalent.
Most of the unexpected occurrences of this optimization are similar cases where the constructor being matched on and one of the values being constructed happen to share the same tag and contain the same value(s), despite being of different types.

In practice, with this optimization you can define your own option-like types with constructor names that have meaning in your specific context, and still enjoy a cost-free conversion to option when communicating with libraries that return or expect it. It might also help a few people remove Obj.magics, although there is currently no way to assert that the optimization is triggered.
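The scenario described can be sketched like this (the type and names are hypothetical, not from the PR):

```ocaml
(* Hypothetical example: a domain-specific type whose constructors
   happen to share option's runtime representation -- [Missing] is a
   constant constructor like [None], and [Found of 'a] is a one-field
   block with tag 0 like [Some v]. With this optimization, under
   flambda, [to_option] can return its argument's block unchanged in
   the [Found] case instead of allocating a new [Some]. *)
type 'a lookup = Missing | Found of 'a

let to_option = function
  | Missing -> None
  | Found v -> Some v
```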

While interesting, this kind of optimization makes me sad: for one it makes it harder to reason about allocation. But also it goes against the philosophy of OCaml which is to make it easy to write efficient code but not to spend energy trying to make inefficient code efficient.

I find these arguments dubious. Forgetting flambda for a moment, reasoning about allocations in OCaml is complicated for many reasons, including promotion of references to mutable variables, unboxing, tuple-binding optimizations, and constant lifting, to name a few. Reusing an existing value instead of allocating, when it is safe to do so, seems like a very natural kind of optimization that fits well with the ones already there. And I had never heard of the philosophy of OCaml that you mention.

@bluddy (Contributor) commented Sep 20, 2019

It's a philosophy that evolved out of OCaml not having a good optimizer for too long :) Advanced compiler optimization and attaining good performance always carries the cost of less predictability. For examples, see every language (C, C++, Haskell, Java, Rust, F#, etc).

Very happy to see this. Keep it up!

@nojb (Contributor) commented Sep 20, 2019

Reusing an existing value instead of allocating, when it is safe to do so, seems like a very natural kind of optimization that fits well with the ones already there

Yes, I think I was a bit rash in my previous judgement... I agree that it does seem like a reasonable optimization.

Is there a technical reason to implement this at the Flambda-level or could it have been implemented as an optimization on Lambda?

@gasche (Member) commented Sep 21, 2019

Another useful case where the optimization is ill-typed at the source level is

type ab = [ `A | `B ]
type t = [ ab | `Other ]
type 'a with_metadata = { loc: Location.t; value: 'a }

let assert_ab (x: t with_metadata) : ab with_metadata =
  match x.value with
  | `Other -> assert false
  | #ab as v -> { x with value = v }

(Now I wonder if flambda would correctly optimize in this situation, how good is it at reasoning about record-update approximations?)

One can regain source-level sharing by writing the pattern-matching differently, but at the cost of extra redundancy in the user code (repeating the pattern fragment between the refined variant and the to-be-shared value):

let assert_ab = function
  | { value = `Other } -> assert false
  | { value = #ab } as v -> v

@lthls (Contributor) commented Sep 23, 2019

Is there a technical reason to implement this at the Flambda-level or could it have been implemented as an optimization on Lambda?

The main reasons:

  • Doing it on flambda means we can combine it with inlining. Sometimes the optimization can be triggered after inlining, and flambda's inlining heuristics can now take this optimization into account.
  • Flambda is well suited for writing new optimization passes. Information computed by other passes can be reused freely, terms are somewhat simplified, and there's a cost/benefit analysis that is easy to hook into and can be relied upon for heuristics.

I think a similar pass on Lambda is worth considering, but would not replace this proposal.

@chambart (Contributor) commented

By the way, this was not mentioned yet, but the first patch of this PR, which propagates more information about switches to Lambda, is also useful for the next version of flambda.

@lthls (Contributor) commented Oct 11, 2019

(Now I wonder if flambda would correctly optimize in this situation, how good is it at reasoning about record-update approximations?)

Record-update is seen as a regular Pmakeblock by the time it reaches flambda (except for very large records). The problem in your case is that flambda does not have the size of the input argument, so it cannot know that it can reuse the existing block.

Imagine the following example:

type t1 = { x1 : int; y1 : int; }
type t2 = { x2 : int; y2 : int; z2 : int; }

let f1 t1 =
  { t1 with y1 = t1.y1; }

let f2 t2 =
  { x1 = t2.x2; y1 = t2.y2; }

In this example, f1 and f2 have (almost) the same lambda code but only f1 can reuse its argument.

  • Do you think we should look at propagating the size of records too?
  • Would you consider a similar simplification on Lambda a prerequisite to get this merged?
  • Do you want benchmarks? Micro-benchmarks showing huge improvements are trivial to generate, but I don't know how much of a difference it makes on large codebases (we're not expecting a big impact).
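A micro-benchmark of the kind mentioned might look like this (a sketch under the assumption that `Gc.minor_words`, which reports words allocated on the minor heap, is a fair proxy for the allocation being optimised away):

```ocaml
(* Hypothetical micro-benchmark: without the optimization, the
   [Some x -> Some x] branch allocates a fresh block per iteration;
   with it (under flambda), the existing block is reused. Comparing
   [Gc.minor_words] before and after the loop shows the difference. *)
let[@inline never] id_opt = function
  | None -> None
  | Some x -> Some x

let bench n =
  let r = ref (Some 42) in
  let before = Gc.minor_words () in
  for _ = 1 to n do r := id_opt !r done;
  let after = Gc.minor_words () in
  (!r, after -. before)
```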

We've already reviewed the flambda part of the PR, but we would like an independent review for the lambda-related parts.

@fpottier (Contributor) commented

At a philosophical level, I think this optimisation is great. It is very much like CSE, which people agree is a good thing. (Actually, one could argue that it is CSE.) It frees people from the temptation of writing more obscure code so as to save an allocation here and there.

@nojb mentioned this pull request May 8, 2020
@nojb (Contributor) commented Jun 8, 2020

What is the status of this PR? There seemed to be support for it.

@lthls (Contributor) commented Jun 9, 2020

This needs a rebase and an approval. I can take care of the rebase if @AltGr doesn't have the time, but I would appreciate it if someone took the time to review at least the non-flambda parts.

@gasche (Member) left a comment


I had a first review pass on the first commit, which is "outside flambda". (Are there other such commits you care about?) Does that help? Do I correctly understand that you are still interested in some parts of this PR, including this one, for Flambda 2?


and lambda_switch_block_key =
{ sw_tag : int;
sw_size : int;

What does size mean here: does x :: xs have size 2 (because it has two arguments), or size 3 (because it has a header and two arguments)? Documentation would be nice, and maybe arity would be clearer.

let desc = { sw_tag = tag; sw_size = size; } in
(consts, (desc, act) :: nonconsts)
| Cstr_unboxed ->
(* The [sw_size] will never make it through to a [Lswitch]. *)

I believe the comment could/should say The [desc] will never ....

| Lifused (_, lam) ->
emit_tail_infos is_tail lam
and list_emit_tail_infos_fun f is_tail =
(* CR mshinwell: Why doesn't this generalise when eta expanded? *)

I'm not sure this comment is meant to stay in the code: either we answer it or we remove it.

list_emit_tail_infos_fun snd is_tail sw.sw_consts;
list_emit_tail_infos_fun snd is_tail sw.sw_blocks;
list_emit_tail_infos_fun0 snd is_tail sw.sw_consts;
list_emit_tail_infos_fun1 snd is_tail sw.sw_blocks;

urgh, this should be fixed -- maybe you need a polymorphic recursion annotation?

size;
}
in
tag, describe_constructors idx_const (idx_nonconst+1) rem

Minor: I would write more compactly

let tag = idx_nonconst in
let desc = Cstr_block { tag; size } in
desc, describe_constructors ...

| Block tag when tag = num_nonconst -> c
| Unboxed -> c
| Block _
| Constant _ -> find_constr tag num_const (num_nonconst + 1) rem

I liked the previous code better because it was consistent with the constant-constructor case. It's weird to have an equality check in one branch and a pattern-matching in the other.

size : int;
}
| Cstr_unboxed (* Constructor of an unboxed type *)
| Cstr_extension of Path.t * bool (* Extension constructor

I think that the name constructor_tag does not fit anymore if we store size/arity information as well. This leads to contortions in other modules that want to name "a constructor tag without the size/arity". What about constructor_desc here?

(cstr.cstr_consts, cstr.cstr_nonconsts, consts, nonconsts)
with
| 1, 1, [ (0, act1) ], [ (0, act2) ] ->
| 1, 1, [(0, act1)], [{ sw_tag = 0; sw_size = _; }, act2] ->

I think this would be more readable with { sw_tag = 0; _ }.

in
let const, nonconst = split_rec tag_lambda_list in
sort_int_lambda_list const,
sort_int_lambda_list nonconst

Ideally one would:

  1. Use a better name than _simple, for example split_cases_by_desc and split_cases_by_tag
  2. reduce code duplication by using List.partition in this function, the one above and the one below


exception Constr_not_found

type constructor_tag_to_find =

This could be just constructor_tag if the other one is renamed into constructor_desc. (And it could be included in Types as well, but no strong opinion there.)

@lthls (Contributor) commented Jul 1, 2022

Do I correctly understand that you are still interested in some parts of this PR, including this one, for Flambda2 ?

Flambda 2 is likely going to settle on a slightly different solution to this problem, which you can see here: oxcaml/oxcaml#703 (it is a generalisation of #10716, which has already been merged in Flambda 2).
However, this patch may still prove useful in some cases, and we're not going to merge Flambda 2 anytime soon anyway.
I'll see if Louis is still willing to work on this PR, if not I'll try to find somebody else or work on it myself.

Linked issue: Improve sharing of values (#7535)

8 participants