
Optimise reconstruction of existing blocks in switches (flambda)#8958

Open
AltGr wants to merge 9 commits into ocaml:trunk from AltGr:switch-no-rebuild

Conversation

@AltGr (Member) commented Sep 20, 2019

This pull request (work started by Mark Shinwell) adds a small optimisation that avoids re-allocating blocks when existing ones can be reused.

Typically, it would compile code such as

let f = function
  | Some x -> Some x
  | ...

as if you had written

let f = function
  | Some x as a -> a
  | ...

It is a bit involved, as more information needs to be passed through the Lambda and Flambda intermediate representations: the size and mutability of matched blocks must be known to perform the optimisation. That part (the first 3 patches) touches many files but is straightforward.

The second part of the patch is purely in flambda (Inline_and_simplify & helpers), and implements the optimisation:

  • matched blocks are recorded in the flambda Env (if immutable)
  • the approximation is extended to record when a value is the projection of a field of a block recorded in the Env
  • when building a block, we use the above to check whether it matches a recorded block and, if so, reuse that block instead of allocating
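The three steps above can be sketched roughly as follows (a hypothetical simplification, not the PR's actual code: the `approx` type, `try_reuse`, and the string-keyed environment are stand-ins for flambda's real data structures):

```ocaml
(* Hypothetical sketch of the check made when simplifying a block
   allocation: if every field is known to be the projection of field
   [i] of the same immutable recorded block, and that block has the
   same tag and size, the fresh allocation can be replaced by the
   recorded block itself. *)
type approx =
  | Projection of { block : string; field : int }
  | Unknown

let try_reuse ~tag ~size ~(field_approxs : approx list)
    ~(known_blocks : (string * (int * int)) list) : string option =
  match field_approxs with
  | Projection { block; field = 0 } :: _
    when List.length field_approxs = size ->
      let fields_match =
        List.for_all2
          (fun a i ->
             match a with
             | Projection p -> p.block = block && p.field = i
             | Unknown -> false)
          field_approxs
          (List.init size (fun i -> i))
      in
      (* The recorded block must also have a matching tag and size. *)
      if fields_match && List.assoc_opt block known_blocks = Some (tag, size)
      then Some block
      else None
  | _ -> None
```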

It turned out that the optimisation is triggered much more often than we expected (including in cases where it would not be type sound at the source level, so you couldn't do it manually using as), and it is probably worth including. It is planned to extend it to more cases besides Lswitch, e.g. function arguments.

Closes: #7535

@nojb (Contributor) commented Sep 20, 2019

While interesting, this kind of optimization makes me sad: for one, it makes it harder to reason about allocation. It also goes against the philosophy of OCaml, which is to make it easy to write efficient code, not to spend energy trying to make inefficient code efficient.

In any case, could you give examples of "untyped" cases of this optimization?

@nojb (Contributor) commented Sep 20, 2019

Sorry, I had missed that this is Flambda-only, which makes my comment less relevant...

@lthls (Contributor) commented Sep 20, 2019

(Disclaimer: I've participated in the elaboration of this PR)

In any case, could you give examples of "untyped" cases of this optimization?

(* In module option.ml *)
let to_result ~none = function None -> Error none | Some v -> Ok v

The Ok v allocation can reuse the Some v block, because at the point where this optimization is done we know they're equivalent.
Most of the unexpected occurrences of this optimization are similar cases where the constructor being matched on and one of the values being constructed happen to share the same tag and contain the same value(s), despite being of different types.

In practice, with this optimization you can define your own option-like types with constructor names that have meaning in your specific context, and still enjoy a cost-free conversion to option when communicating with libraries that return or expect it. It might also help a few people remove Obj.magics, although there is currently no way to assert that the optimization is triggered.
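The scenario described can be sketched like this (the type and names are hypothetical, not from the PR):

```ocaml
(* Hypothetical example: a domain-specific type whose constructors
   happen to share option's runtime representation -- [Missing] is a
   constant constructor like [None], and [Found of 'a] is a one-field
   block with tag 0 like [Some v]. With this optimization, under
   flambda, [to_option] can return its argument's block unchanged in
   the [Found] case instead of allocating a new [Some]. *)
type 'a lookup = Missing | Found of 'a

let to_option = function
  | Missing -> None
  | Found v -> Some v
```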

While interesting, this kind of optimization makes me sad: for one it makes it harder to reason about allocation. But also it goes against the philosophy of OCaml which is to make it easy to write efficient code but not to spend energy trying to make inefficient code efficient.

I find these arguments dubious. Forgetting flambda for a moment, reasoning about allocations in OCaml is complicated for many reasons, including promotion of references to mutable variables, unboxing, tuple-binding optimizations, and constant lifting, to name a few. Reusing an existing value instead of allocating, when it is safe to do so, seems like a very natural kind of optimization that fits well with the ones already there. And I had never heard of the philosophy of OCaml that you mention.

@bluddy (Contributor) commented Sep 20, 2019

It's a philosophy that evolved out of OCaml not having a good optimizer for too long :) Advanced compiler optimization and attaining good performance always carries the cost of less predictability. For examples, see every language (C, C++, Haskell, Java, Rust, F#, etc).

Very happy to see this. Keep it up!

@nojb (Contributor) commented Sep 20, 2019

Reusing an existing value instead of allocating, when it is safe to do so, seems like a very natural kind of optimization that fits well with the ones already there

Yes, I think I was a bit rash in my previous judgement... I agree that it does seem like a reasonable optimization.

Is there a technical reason to implement this at the Flambda-level or could it have been implemented as an optimization on Lambda?

@gasche (Member) commented Sep 21, 2019

Another useful case where the optimization is ill-typed at the source level is

type ab = [ `A | `B ]
type t = [ ab | `Other ]
type 'a with_metadata = { loc: Location.t; value: 'a }

let assert_ab (x: t with_metadata) : ab with_metadata =
  match x.value with
  | `Other -> assert false
  | #ab as v -> { x with value = v }

(Now I wonder if flambda would correctly optimize in this situation, how good is it at reasoning about record-update approximations?)

One can regain source-level sharing by writing the pattern-matching differently, but at the cost of extra redundancy in the user code (repeating the pattern fragment between the refined variant and the to-be-shared value):

let assert_ab = function
  | { value = `Other } -> assert false
  | { value = #ab } as v -> v

@lthls (Contributor) commented Sep 23, 2019

Is there a technical reason to implement this at the Flambda-level or could it have been implemented as an optimization on Lambda?

The main reasons:

  • Doing it on flambda means we can combine it with inlining. Sometimes the optimization can be triggered after inlining, and flambda's inlining heuristics can now take this optimization into account.
  • Flambda is well suited for writing new optimization passes. Information computed by other passes can be reused freely, terms are somewhat simplified, and there's a cost/benefit analysis that is easy to hook into and can be relied upon for heuristics.

I think a similar pass on Lambda is worth considering, but would not replace this proposal.

@chambart (Contributor) commented

By the way, this was not mentioned yet, but the first patch of this PR, which propagates more information about switches to Lambda, is also useful for the next version of flambda.

@lthls (Contributor) commented Oct 11, 2019

(Now I wonder if flambda would correctly optimize in this situation, how good is it at reasoning about record-update approximations?)

Record-update is seen as a regular Pmakeblock by the time it reaches flambda (except for very large records). The problem in your case is that flambda does not have the size of the input argument, so it cannot know that it can reuse the existing block.

Imagine the following example:

type t1 = { x1 : int; y1 : int; }
type t2 = { x2 : int; y2 : int; z2 : int; }

let f1 t1 =
  { t1 with y1 = t1.y1; }

let f2 t2 =
  { x1 = t2.x2; y1 = t2.y2; }

In this example, f1 and f2 have (almost) the same lambda code but only f1 can reuse its argument.

  • Do you think we should look at propagating the size of records too?
  • Would you consider a similar simplification on Lambda a prerequisite to get this merged?
  • Do you want benchmarks? Micro-benchmarks showing huge improvements are trivial to generate, but I don't know how much of a difference it makes on large codebases (we're not expecting a big impact).
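A micro-benchmark of the kind mentioned might look like this (a sketch under the assumption that `Gc.minor_words`, which reports words allocated on the minor heap, is a fair proxy for the allocation being optimised away):

```ocaml
(* Hypothetical micro-benchmark: without the optimization, the
   [Some x -> Some x] branch allocates a fresh block per iteration;
   with it (under flambda), the existing block is reused. Comparing
   [Gc.minor_words] before and after the loop shows the difference. *)
let[@inline never] id_opt = function
  | None -> None
  | Some x -> Some x

let bench n =
  let r = ref (Some 42) in
  let before = Gc.minor_words () in
  for _ = 1 to n do r := id_opt !r done;
  let after = Gc.minor_words () in
  (!r, after -. before)
```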

We've already reviewed the flambda part of the PR, but we would like an independent review for the lambda-related parts.

@fpottier (Contributor) commented

At a philosophical level, I think this optimisation is great. It is very much like CSE, which people agree is a good thing. (Actually, one could argue that it is CSE.) It frees people from the temptation of writing more obscure code so as to save an allocation here and there.

@nojb mentioned this pull request May 8, 2020
@nojb (Contributor) commented Jun 8, 2020

What is the status of this PR? There seemed to be support for it.

@lthls (Contributor) commented Jun 9, 2020

This needs a rebase and an approval. I can take care of the rebase if @AltGr doesn't have the time, but I would appreciate it if someone took the time to review at least the non-flambda parts.

@gasche (Member) left a comment


I had a first review pass on the first commit, which is "outside flambda". (Are there other such commits you care about?) Does that help? Do I correctly understand that you are still interested in some parts of this PR, including this one, for Flambda 2?


and lambda_switch_block_key =
{ sw_tag : int;
sw_size : int;

What does size mean here: does x :: xs have size 2 (because it has two arguments), or size 3 (because it has a header and two arguments)? Documentation would be nice, and maybe arity would be clearer.

let desc = { sw_tag = tag; sw_size = size; } in
(consts, (desc, act) :: nonconsts)
| Cstr_unboxed ->
(* The [sw_size] will never make it through to a [Lswitch]. *)

I believe the comment could/should say The [desc] will never ....

| Lifused (_, lam) ->
emit_tail_infos is_tail lam
and list_emit_tail_infos_fun f is_tail =
(* CR mshinwell: Why doesn't this generalise when eta expanded? *)

I'm not sure this comment is meant to stay in the code: either we answer it or we remove it.

list_emit_tail_infos_fun snd is_tail sw.sw_consts;
list_emit_tail_infos_fun snd is_tail sw.sw_blocks;
list_emit_tail_infos_fun0 snd is_tail sw.sw_consts;
list_emit_tail_infos_fun1 snd is_tail sw.sw_blocks;

urgh, this should be fixed -- maybe you need a polymorphic recursion annotation?

size;
}
in
tag, describe_constructors idx_const (idx_nonconst+1) rem

Minor: I would write more compactly

let tag = idx_nonconst in
let desc = Cstr_block { tag; size } in
desc, describe_constructors ...

| Block tag when tag = num_nonconst -> c
| Unboxed -> c
| Block _
| Constant _ -> find_constr tag num_const (num_nonconst + 1) rem

I liked the previous code better because it was consistent with the constant-constructor case. It's weird to have an equality check in one branch and a pattern-matching in the other.

size : int;
}
| Cstr_unboxed (* Constructor of an unboxed type *)
| Cstr_extension of Path.t * bool (* Extension constructor

I think that the name constructor_tag does not fit anymore if we store size/arity information as well. This leads to contortions in other modules that want to name "a constructor tag without the size/arity". What about constructor_desc here?

(cstr.cstr_consts, cstr.cstr_nonconsts, consts, nonconsts)
with
| 1, 1, [ (0, act1) ], [ (0, act2) ] ->
| 1, 1, [(0, act1)], [{ sw_tag = 0; sw_size = _; }, act2] ->

I think this would be more readable with { sw_tag = 0; _ }.

in
let const, nonconst = split_rec tag_lambda_list in
sort_int_lambda_list const,
sort_int_lambda_list nonconst

Ideally one would:

  1. Use a better name than _simple, for example split_cases_by_desc and split_cases_by_tag
  2. reduce code duplication by using List.partition in this function, the one above and the one below


exception Constr_not_found

type constructor_tag_to_find =

This could be just constructor_tag if the other one is renamed into constructor_desc. (And it could be included in Types as well, but no strong opinion there.)

@lthls (Contributor) commented Jul 1, 2022

Do I correctly understand that you are still interested in some parts of this PR, including this one, for Flambda2 ?

Flambda 2 is likely going to settle on a slightly different solution to this problem, which you can see here: oxcaml/oxcaml#703 (it is a generalisation of #10716, which has already been merged in Flambda 2).
However, this patch may still prove useful in some cases, and we're not going to merge Flambda 2 anytime soon anyway.
I'll see if Louis is still willing to work on this PR, if not I'll try to find somebody else or work on it myself.

Linked issue: Improve sharing of values (#7535)

8 participants