Optimise reconstruction of existing blocks in switches (flambda)#8958
Optimise reconstruction of existing blocks in switches (flambda)#8958AltGr wants to merge 9 commits intoocaml:trunkfrom
Conversation
This optimisation detects cases where an immutable matched block is rebuilt with the same structure and arguments, and reuses the existing variable.
|
While interesting, this kind of optimization makes me sad: for one it makes it harder to reason about allocation. But also it goes against the philosophy of OCaml which is to make it easy to write efficient code but not to spend energy trying to make inefficient code efficient. In any case, could you give examples of "untyped" cases of this optimization? |
|
Sorry, I had missed that this is Flambda-only, which makes my comment less relevant... |
|
(Disclaimer: I've participated in the elaboration of this PR)
(* In module option.ml *)
let to_result ~none = function None -> Error none | Some v -> Ok vThe In practice, if you have this optimization you can define your own option types with constructor names that have meaning in your specific contest, and still enjoy a costless conversion to
I find these arguments dubious. Forgetting flambda for a moment, reasoning about allocations in OCaml is complicated for many reasons, including promotion of references to mutable variables, unboxing, tuple binding optimizations, constants lifting, to cite a few. Reusing an existing value instead of allocating, when it is safe to do so, seems like a very natural kind of optimization that fits well with the ones already there. And I had never heard about the philosophy of OCaml that you mention. |
|
It's a philosophy that evolved out of OCaml not having a good optimizer for too long :) Advanced compiler optimization and attaining good performance always carries the cost of less predictability. For examples, see every language (C, C++, Haskell, Java, Rust, F#, etc). Very happy to see this. Keep it up! |
Yes, I think I was a bit rash in my previous judgement... I agree that it does seem like a reasonable optimization. Is there a technical reason to implement this at the Flambda-level or could it have been implemented as an optimization on Lambda? |
|
Another useful case where the optimization is ill-typed at the source level is type ab = [ `A | `B ]
type t = [ ab | `Other ]
type 'a with_metadata = { loc: Location.t; value: 'a }
let assert_ab (x: t with_metadata) : ab with_metadata =
match x.value with
| `Other -> assert false
| #ab as v -> { x with value = v }(Now I wonder if flambda would correctly optimize in this situation, how good is it at reasoning about record-update approximations?) One can regain source-level sharing by writing the pattern-matching differently, but at the cost of extra redundancy in the user code (repeating the pattern fragment between the refined variant and the to-be-shared value): let assert_ab = function
| { value = `Other } -> assert false
| { value = #ab } as v -> v |
The main reasons:
I think a similar pass on Lambda is worth considering, but would not replace this proposal. |
|
By the way, this was not mentioned yet, but the first patch of this PR which propagate more information about switch to lambda is also useful for the next version of flambda. |
Repeating a List.iter twice is simpler than a polymorphism annotation.
Record-update is seen as a regular Imagine the following example: type t1 = { x1 : int; y1 : int; }
type t2 = { x2 : int; y2 : int; z2 : int; }
let f1 t1 =
{ t1 with y1 = t1.y1; }
let f2 t2 =
{ x1 = t2.x2; y1 = t2.y2; }In this example, Do you think we should look at propagating the size of records too ? We've already reviewed the flambda part of the PR, but we would like an independent review for the lambda-related parts. |
|
At a philosophical level, I think this optimisation is great. It is very much like CSE, which people agree is a good thing. (Actually, one could argue that is is CSE.) It frees people from the temptation of writing more obscure code so as to save an allocation here and there. |
|
What is the status of this PR? There seemed to be support for it. |
|
This needs a rebase, and an approval. I can take care of the rebase if @AltGr doesn't have the time, but I would appreciate if someone took the time to review at least the non-flambda parts. |
gasche
left a comment
There was a problem hiding this comment.
I had a first review pass on the first commit, which is "outside flambda". (Are there other such you care about?) Does that help? Do I correctly understand that you are still interested in some parts of this PR, including this one, for Flambda2 ?
|
|
||
| and lambda_switch_block_key = | ||
| { sw_tag : int; | ||
| sw_size : int; |
There was a problem hiding this comment.
What does size mean here: does x :: xs have size 2 (because it has two arguments), or size 3 (because it has a header and two arguments)? Documentation would be nice, and maybe arity would be clearer.
| let desc = { sw_tag = tag; sw_size = size; } in | ||
| (consts, (desc, act) :: nonconsts) | ||
| | Cstr_unboxed -> | ||
| (* The [sw_size] will never make it through to a [Lswitch]. *) |
There was a problem hiding this comment.
I believe the comment could/should say The [desc] will never ....
lambda/simplif.ml
Outdated
| | Lifused (_, lam) -> | ||
| emit_tail_infos is_tail lam | ||
| and list_emit_tail_infos_fun f is_tail = | ||
| (* CR mshinwell: Why doesn't this generalise when eta expanded? *) |
There was a problem hiding this comment.
I'm not sure this comment is meant to stay in the code: either we answer it or we remove it.
lambda/simplif.ml
Outdated
| list_emit_tail_infos_fun snd is_tail sw.sw_consts; | ||
| list_emit_tail_infos_fun snd is_tail sw.sw_blocks; | ||
| list_emit_tail_infos_fun0 snd is_tail sw.sw_consts; | ||
| list_emit_tail_infos_fun1 snd is_tail sw.sw_blocks; |
There was a problem hiding this comment.
urgh, this should be fixed -- maybe you need a polymorphic recursion annotation?
| size; | ||
| } | ||
| in | ||
| tag, describe_constructors idx_const (idx_nonconst+1) rem |
There was a problem hiding this comment.
Minor: I would write more compactly
let tag = idx_nonconst in
let desc = Cstr_block { tag; size } in
desc, describe_constructors ...| | Block tag when tag = num_nonconst -> c | ||
| | Unboxed -> c | ||
| | Block _ | ||
| | Constant _ -> find_constr tag num_const (num_nonconst + 1) rem |
There was a problem hiding this comment.
I liked the previous code better because it was consistent with the constant-constructor case. It's weird to have an equality check in one branch and a pattern-matching in the other.
| size : int; | ||
| } | ||
| | Cstr_unboxed (* Constructor of an unboxed type *) | ||
| | Cstr_extension of Path.t * bool (* Extension constructor |
There was a problem hiding this comment.
I think that the name constructor_tag does not fit anymore if we store size/arity information as well. This leads to contorsions in other modules that want to name "a constructor tag without the size/arity". What about constructor_desc here?
lambda/matching.ml
Outdated
| (cstr.cstr_consts, cstr.cstr_nonconsts, consts, nonconsts) | ||
| with | ||
| | 1, 1, [ (0, act1) ], [ (0, act2) ] -> | ||
| | 1, 1, [(0, act1)], [{ sw_tag = 0; sw_size = _; }, act2] -> |
There was a problem hiding this comment.
I think this would be more readable with { sw_tag = 0; _ }.
| in | ||
| let const, nonconst = split_rec tag_lambda_list in | ||
| sort_int_lambda_list const, | ||
| sort_int_lambda_list nonconst |
There was a problem hiding this comment.
Ideally one would:
- Use a better name than
_simple, for examplesplit_cases_by_descandsplit_cases_by_tag - reduce code duplication by using
List.partitionin this function, the one above and the one below
|
|
||
| exception Constr_not_found | ||
|
|
||
| type constructor_tag_to_find = |
There was a problem hiding this comment.
This could be just constructor_tag if the other one is renamed into constructor_desc. (And it could be included in Types as well, but no strong opinion there.)
Flambda 2 is likely going to settle on a slightly different solution to this problem, which you can see there: oxcaml/oxcaml#703 (it is a generalisation of #10716, which has already been merged in Flambda 2). |
This pull-request (work started by Mark Shinwell) adds a small optimisation that can avoid re-allocating blocks when already existing ones can be reused.
Typically, it would compile code such as
as if you had written
It is a bit involved, as more information needs to be passed on through the lambda and the flambda: the size and mutability of matched blocks need to be known to perform the optimisation. That part (the first 3 patches) touches many files but is straight-forward.
The second part of the patch is purely in flambda (Inline_and_simplify & helpers), and implements the optimisation:
Env(if immutable)EnvIt turned out that the optimisation is triggered much more often than we expected (including in cases where it would not be type sound, and you couldn't manually do it using
as), and it is probably worth including. It is planned to extend it to more cases besidesLswitch, e.g. function arguments.Closes: #7535