Decide unboxing of let-bound expressions based on their Cmm translation#2165
Decide unboxing of let-bound expressions based on their Cmm translation#2165lthls merged 4 commits intoocaml:trunkfrom
Conversation
|
@kayceesrk : would you be as kind as to add this PR to the benchmarking on http://ocamllabs.io/multicore/? |
bac4151 to
87c36b6
Compare
|
This is no criticism of @alainfrisch, but I find it mildly inappropriate to use @kayceesrk's time for non-multicore-related benchmarks because the compiler team uses no easily-deployed compiler benchmark tool. Is there an easy way to use a clone of the ocamllabs.io/multicore system (or some other, but maintained by someone) that gives what-to-benchmark control to compiler developers, so that @alainfrisch could directly start the measurements he was requesting above? |
|
Mandatory stupid micro-benchmark: let foo () =
let r = ref 0. in
let f x =
r := !r +. x;
if !r >= 1000. then r:= !r -. 1000.
in
for i = 1 to 100000 do
match i with
| 0 -> f 0.
| 1 -> f 1.
| n ->
let a, b =
if n > 100 then float n, float (n + 1)
else float (n + 1), 42.
in
if n > 1000 then f (a *. b)
done;
!r
let () =
for i = 1 to 1000 do
ignore (foo ())
done;
With this PR, the main function |
|
@chambart @mshinwell do you think you could give a look to this PR? |
|
@gasche The flambda benchmarks could be used instead (its infrastructure was actually built upon and improved for the multicore benchmarks), and allowing compiler developers to add/remove benchmarks would be as easy as giving them commit/merge rights to OCamlPro/ocamlbench-repo. Alternatively, anybody can submit a pull request to add a branch to test (there are several PRs that have been merged doing just this, for inspiration). Although benchmarking the official compiler against proposed changes is already the purpose of the flambda benchmarks server, if someone wants to host another benchmarking server more clearly managed by the official OCaml team, we would be willing to help with setting up the infrastructure too. |
|
@lthls personally I would be very happy to use the flambda benchmarks. Where is the short documentation on how to add a new ocaml/ocaml PR there, with pointers to those pull requests you mention for inspiration? ("I would be happy to use" is more or less a figure of speech here: I don't personally submit many optimizations-related PRs, but others could be interested if the bar of entry wasn't too high. Clearly @alainfrisch doesn't think of your tool as something he could use easily, which suggests that something is slightly off right now. Thanks for replying here, which is a step to fix this!) |
|
Maybe we could add this documentation (on how to submit a PR to the flambda benchmarks) to our CONTRIBUTING.md, or at least add a pointer to some documentation somewhere. |
|
I'm working on a pull request to add some information to CONTRIBUTING.md, but it looks like there are a few issues at the moment with our scripts (the version of trunk being benchmarked points to a commit from July at the moment, which is not intended), so I'll wait until we've fixed these issues before publishing the PR. |
|
@chambart @mshinwell : do you think you might be able to review this PR? |
lthls
left a comment
There was a problem hiding this comment.
I noticed a few things that could be improved:
- There is a slight difference in the handling of
raise: it used to be classified asNo_returninis_unboxed_number, whereas now it falls into the defaultNo_unboxingcase ofis_unboxed_number_cmm. I don't know how much of a difference it would make in practice, but it should be easy to make a special case forCop(Craise)anyway.
Here is a micro-benchmark showing a difference:
let improbable () = Sys.opaque_identity false
let id x = x [@@inline]
let () =
let x0 = Gc.minor_words () in
let r = ref 0. in
for i = 0 to 99999 do
r := id (if improbable () then raise Exit else !r +. 1.)
done;
let x1 = Gc.minor_words () in
Printf.printf "Minor words: %f\n%!" (x1 -. x0)- There is a small issue with assuming the body of a
trywithblock to be in tail position, but this is a matter of properly documenting the behaviour, not a bug in the code (see inline comment).
I didn't notice any other problem with the patch, but I haven't reviewed it in depth.
Thanks. I've fixed that by extending the two new |
|
@alainfrisch Could we please hold off merging this until the debugging changes to |
|
Looking quickly at changes to Cmm/Cmmgen in #2294, there should be no (serious) conflict with this one. I'm not strongly opposed to waiting a bit more for this PR, but it gives an immediate improvement and it has been ready for four months, so I'm not sure that adding it to the list of things blocked by #2294 is the best approach, one should rather try to decouple issues and merge them when they are ready. |
|
@alainfrisch You're probably not looking at the revised version of the code for #2294 though, which I think is going to conflict. I'm hoping it won't take too long to get this new version of #2294 reviewed and merged, and I'm certainly happy to help fix the conflicts in this one after that. (Edit: so we can make sure this gets in for 4.09 too.) |
|
I've rewritten the history to have just two meaningful and standalone commits for the two parts. Do you really prefer two PRs? |
|
I think so -- that's our standard practice, no? These features I hope are orthogonal to each other... I'm ready to start reviewing pretty soon, by the way. |
The second commit is based on the first one, though. I've kept only the first part in the PR, and move thed second part in #8735 (which also contains commits from this PR). |
|
Gentle ping to @mshinwell or whoever else (@lthls ?) interested in reviewing this PR and its companion #8735. |
lthls
left a comment
There was a problem hiding this comment.
Globally nice changes.
In addition to a few specific remarks, I'll note that this work makes it easier to separate unboxing from translation. I don't think it's worth pushing it further and make unboxing a separate pass at the moment, but it's a good thing in my opinion.
(I had this review done a while ago, but I forgot I didn't submit it. Sorry for the delay.)
|
Thanks @lthls for your positive review! I've integrated your comments (last commit). If you are ok, can you merge? |
|
I'm going to merge this soon. On the kind of merge: I tend to prefer squash and merge, but I've seen that regular merges are quite common. Is there a global policy or are we free to choose based on personal taste ? |
|
Thanks! I'm not aware of a global policy but I also usually prefer squash-and-merge, and for this specific case, this seems to be right choice (there is only one "real" commit). |
|
It's important for me that there is something that looks like a "merge commit" in the result, notably that contains the number of the Github PR for later reference (merge commits from github always have that). I think that squash-and-merge is fine in that regard -- rebase-and-merge is the one that should be avoided. |
|
Squashed and merged, then. Thanks @alainfrisch for the PR. |
|
Great, thanks to you for the review! If you still have some spare cycles, #8735 is waiting for you 😄 It is a spin-off from this PR (see the benchmark above). |
Previously, for a Clambda let-binding
let x = e1 in e2, Cmmgen would look at the e1 Clambda expression to decide whether it is worth/possible to unbox the binding. This requires to "predict" at the Clambda level if an expression produces a boxed number, while this is really a property of the Cmm code. The analysis tried to replicate the behavior of the Clamdba->Cmm translation, but missed some cases such aszin:(It turns out that, recursively,
aandbwill be unboxed, so accessing them will require a re-boxing, but when analyzing thelet z, we still don't know that.)The PR proposes to translate the bound Clambda expression to Cmm and analyze the result to decide if the binding should be unboxed. This avoids a duplication of logic and fixes the code above.
Side-note: it is a bit unsatisfactory to reverse engineer boxing (but this was already the case before), and one might want to discuss keeping a symbolic representations of the boxing/unboxing operations in Cmm and lowering them later (as a simple Cmm->Cmm rewriting pass, or during the selection pass).