Skip to content

Lazily box float and int references#109

Closed
vbrankov wants to merge 1 commit intoocaml:trunkfrom
vbrankov:lazy-boxing
Closed

Lazily box float and int references#109
vbrankov wants to merge 1 commit intoocaml:trunkfrom
vbrankov:lazy-boxing

Conversation

@vbrankov
Copy link
Copy Markdown

An argument follows that the penalty for early boxing of floats and ints is bigger than lazy boxing. It is supported by a benchmark of a production application.

Cmmgen.transl_unbox_let will unbox a float reference if it only exists in:

  1. Left side of assignments: x := <value>
  2. Its value is read in an operation, because operations accept unboxed values: !x +. 5.

Unboxing a float reference means that the variable will contain the very value of float. This is okay since the variable will never end up in OCaml heap. This gives a significant speedup since assigning it a value will never result in memory allocation. For instance in s := !s +. a *. b, the aggregate cost of allocating and cleaning a float will be considerable compared to the cost of computing the right side.

Unboxing will not occur if the value is read but not in an operation. This is done in language constructs that accept only boxed references. Probably the most frequent situations are:

  1. Using the value as an argument to a call: let y = foo !x in .... Call to "float" native functions are excluded (exp, sin, cos), because they accept unboxed floats.
  2. Assigning the value to a variable: let y = !x in
  3. Returning the value:
let foo () =
  let s = ref 0. in
  ...
  !s

I'm advocating the approach of lazy boxing, which means that float references are always unboxed. For the language constructs which accept boxed floats, the value is boxed in place. This is advantageous to early boxing: more likely is that assignments will be much more frequent than such language constructs. In the unlikely situation where this approach is indeed slower because of frequent boxing, it is much easier to write a workaround than with the current compiler:

let s = ref 0. in
...
let s_value = !s in
while true do
  foo s_value (* instead of !s *)
  ...

I've seen a lot of slow code because the compiler chooses not to unbox a reference due to a single call or assignment. Here's an example illustrating it:

https://github.com/vbrankov/ocaml-demos/blob/master/lazy-boxing/test.ml

A benchmark of a production application shows speed improvement and less garbage:

Name Time/Run mWd/Run mjWd/Run Prom/Run Percentage
no patch 5.12s 1.71Gw 2.55Mw 2.32Mw 100.00%
with patch 5.01s 1.61Gw 2.45Mw 2.22Mw 100.00%

…nalty for early boxing is bigger. Late boxing most often occurs when the reference value is used as an argument to a function call, when it is assigned to a record field or when the value is returned. Early boxing most often occurs in assignment, where the speed of boxing and consequently cleaning the garbage can often dwarf the speed of computing the right side of assignment.
@Chris00
Copy link
Copy Markdown
Member

Chris00 commented Oct 25, 2014

Using the value as an argument to a call: let y = foo !x in .... Call to "float" native functions are excluded (exp, sin, cos), because they accept unboxed floats.

I always wondered if the compiler should not produce two versions of some functions: for, say, let f x = x +. sin(x**2.), shouldn't f be compiled to a version accepting and returning unboxed floats, the boxing occurring only when needed (thus in essence f is like a primitive function with the "float" qualifier).

Assigning the value to a variable: let y = !x in

Shouldn't this be recursive? If y can be unboxed, then one may want to do it for x too.

Returning the value:

This is more tricky. See http://caml.inria.fr/mantis/view.php?id=5133

@vbrankov
Copy link
Copy Markdown
Author

The bug 5133 should be fixed with this patch.

@alainfrisch
Copy link
Copy Markdown
Contributor

See also http://caml.inria.fr/mantis/view.php?id=5204 and the more_unboxing branch in OCaml SVN. The idea there was to box lazily, by keeping a reference to the boxed value for future use (before it is re-used, its content is compared to the current unboxed variable). This ensures that (i) there is never more boxing than with the current strategy (ii) there is no boxing if the boxed value is not actually required.

@alainfrisch
Copy link
Copy Markdown
Contributor

let s_value = !s in

Why are references (turned in local mutable variables) treated differently from non-mutable variables in your strategy? It would make sense to apply the same treatment to non-mutable float variables (bound to an expression which yields an unboxed float), no? In which case your work-around above (to force boxing) would not work anymore, but this still seem a better default. One could argue that a non-mutable variable would be boxed at most once in the current strategy, while it could be boxed several times if we delay the boxing, but the same is also the case for mutable variables, and a general way to address that would be the strategy I suggest (i.e. box really lazily, i.e. by caching the boxed value computed on demand).

@alainfrisch
Copy link
Copy Markdown
Contributor

This has been discussed yesterday during a caml-devel meeting, and there is an agreement in principle about switching to the proposed strategy (without my "caching" variant). I'll do some more tests on LexiFi's numerical code to see the impact on performance and if everything goes well, I'll merge the patch.

@alainfrisch
Copy link
Copy Markdown
Contributor

I cannot observe big impacts on LexiFi's tests, performance wise (perhaps partly because we optimized around the previous unboxing scheme; but also because most of the time is spent in actual computation, not boxing). There is a consistent gain in terms of total allocations, which is good anyway. And I trust @vbrankov that on some real applications, the gain in runtime could be noticeable (this can be trivially seen on micro-benchmarks and some real-world code is not so different from such tight loops). Moreover, the change makes the optimizer more predictable and avoids the need/temptation to add "... +. 0.", while simplifying the code of the optimizer itself.

I'll merge the patch.

It could be interesting to refine the new strategy to box only once, as before, when the identifier is not mutated, but the boxed value is used (potentially/certainly) multiple times. I doubt this would lead to significant improvements, though (although again, micro-benchmarks could show the difference).

@alainfrisch
Copy link
Copy Markdown
Contributor

Merged in trunk, commit 16215.

@gasche gasche closed this Jul 18, 2015
@vbrankov
Copy link
Copy Markdown
Author

Why are references (turned in local mutable variables) treated differently from non-mutable variables in your strategy? It would make sense to apply the same treatment to non-mutable float variables (bound to an expression which yields an unboxed float), no? In which case your work-around above (to force boxing) would not work anymore, but this still seem a better default. One could argue that a non-mutable variable would be boxed at most once in the current strategy, while it could be boxed several times if we delay the boxing, but the same is also the case for mutable variables, and a general way to address that would be the strategy I suggest (i.e. box really lazily, i.e. by caching the boxed value computed on demand).

In the current strategy, cmmgen.ml determines that a variable is float only if it is a result of a float operation. In the case of my workaround the variable is a result of dereferencing so the value is boxed. I completely agree with you that the compiler should use more clues to determine whether a variable is float, which would break the workaround. I already discussed it with @lpw25 and @def-lkb to make the compiler smarter in that matter but as far as I know it's still sitting on our to-do lists.

Apologies for a delayed response.

@alainfrisch
Copy link
Copy Markdown
Contributor

@vrbankov : can you have a look at http://caml.inria.fr/mantis/view.php?id=6940 and review the attached patch?

@alainfrisch
Copy link
Copy Markdown
Contributor

This new unboxing strategy, which decides whether to unbox an id or not purely based on the bound expression, actually enables to simplify the whole unboxing scheme, by doing it "on the fly" instead of requiring a post processing part. See http://caml.inria.fr/mantis/view.php?id=7022 and https://github.com/alainfrisch/ocaml/tree/unbox_earlier .

@vbrankov
Copy link
Copy Markdown
Author

I'm sorry Alain, I completely missed your email from Jul 28th. I'll review the code right away.

stedolan added a commit to stedolan/ocaml that referenced this pull request Jun 9, 2017
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Apr 15, 2020
lthls added a commit to lthls/ocaml that referenced this pull request Sep 23, 2020
lthls added a commit to lthls/ocaml that referenced this pull request Sep 23, 2020
lthls added a commit to lthls/ocaml that referenced this pull request Sep 24, 2020
chambart pushed a commit to chambart/ocaml-1 that referenced this pull request Aug 4, 2021
…x .gitignore) (ocaml#109)

* Add testsuite/tools/Makefile

* Allow testsuite/tools to use Flambda 2 libs
stedolan pushed a commit to stedolan/ocaml that referenced this pull request Oct 5, 2021
…x .gitignore) (ocaml#109)

* Add testsuite/tools/Makefile

* Allow testsuite/tools to use Flambda 2 libs
stedolan pushed a commit to stedolan/ocaml that referenced this pull request Mar 21, 2023
c703f5f Incorporate upstream comments into type-variable refactor (ocaml#121)
362ba23 Constrain curry modes to increase along applications (ocaml#108)
b1f0cf9 Simplify the extension handling (ocaml#114)
4fd53a1 Remove pat_mode from typedtree (ocaml#105)
cf6fcbc Handle attributes on lambdas with locally abstract types (ocaml#120)
5fa80fe Don't track attributes inside attributes for warning 53 (ocaml#115)
8a69777 Handle unclosed `[: ... :]` patterns (via `Generic_array` machinery) (ocaml#117)
b0737f4 Add promote-one Makefile target (ocaml#118)
c6ad684 Refactoring and fixes around module lookup (ocaml#107)
b0a6495 Add documentation for global constructor arguments (ocaml#69)
dd79aec Print `nlocal` in the `-d(raw)lambda` output (ocaml#112)
8035026 Fix `nlocal` in the generated Lambda for list comprehensions (ocaml#113)
afbcdf0 Immutable arrays (ocaml#47)
bfe1490 fix several issues when removing exp_mode (ocaml#110)
8f46060 Better error message for under-applied functions (ocaml#74)
27331d8 Consistently use Lmutvar or Lvar in comprehensions (ocaml#111)
01e965b Skip failing test for now
0131357 Fix test case to use comprehensions_experimental
22a7368 Temporarily disable list comprehensions tests due to locals bug
e08377d Make `comprehensions` into `comprehensions_experimental` for now (ocaml#109)
947cf89 List and array comprehensions (ocaml#46)
bd9e051 remove exp_mode from typedtree (ocaml#100)
a9268d2 Fix misplaced attribute warning when using external parser (and some cleanup) (ocaml#101)
2b33f24 Refactor toplevel local escape check (ocaml#104)
ed2aec6 Comment functions exported from TyVarEnv.
87838ba Move new variable creation into TyVarEnv.
a3f60ab Encapsulate functions that work with tyvars
43d83a6 Prevent possibility of forgetting to re-widen
2f3dd34 Encapsulate context when narrowing type env't
d78ff6d Make immediate64 things mode cross (ocaml#97)
aa25ab9 Fix version number (ocaml#94)
d01ffa0 Fix .depend file (ocaml#93)
942f2ab Bootstrap (ocaml#92)
05f7e38 Check Menhir version (ocaml#91)
1569b58 Move the CI jobs from 4.12 to 4.14. (ocaml#90)

git-subtree-dir: ocaml
git-subtree-split: c703f5f
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants