Conversation
…nalty for early boxing is bigger. Late boxing most often occurs when the reference value is used as an argument to a function call, when it is assigned to a record field, or when the value is returned. Early boxing most often occurs in assignment, where the cost of boxing, and of subsequently collecting the garbage, can often dwarf the cost of computing the right-hand side of the assignment.
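The assignment-heavy case can be sketched as follows (a minimal OCaml example; the function name is invented). Under early boxing, every store into the reference would allocate a boxed float, so allocation and collection would dominate the two floating-point operations per iteration:

```ocaml
(* Hypothetical illustration: with early boxing, each [s := ...]
   would store a boxed float and thus allocate on every iteration,
   dwarfing the cost of the multiply-add on the right-hand side. *)
let dot a b =
  let s = ref 0.0 in
  for i = 0 to Array.length a - 1 do
    s := !s +. a.(i) *. b.(i)
  done;
  !s
```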
I always wondered if the compiler should not produce two versions of some functions: for, say,
Shouldn't this be recursive? If
This is more tricky. See http://caml.inria.fr/mantis/view.php?id=5133

Bug 5133 should be fixed by this patch.

See also http://caml.inria.fr/mantis/view.php?id=5204 and the more_unboxing branch in OCaml SVN. The idea there was to box lazily, keeping a reference to the boxed value for future use (before it is re-used, its content is compared to the current unboxed variable). This ensures that (i) there is never more boxing than with the current strategy, and (ii) there is no boxing if the boxed value is not actually required.
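The "caching" idea described above can be sketched in user-level OCaml (all names here are hypothetical; the real mechanism would live inside the compiler, not in library code):

```ocaml
(* Hypothetical sketch of the caching variant: keep the last boxed
   copy next to the unboxed working value, and re-box only when the
   unboxed value has changed since the last boxing. *)
type cached = {
  mutable unboxed : float;        (* the working, unboxed value *)
  mutable boxed   : float option; (* last boxed copy, if any *)
}

let box c =
  match c.boxed with
  | Some b when b = c.unboxed -> b   (* contents unchanged: reuse *)
  | _ ->
    let b = c.unboxed in
    c.boxed <- Some b;               (* cache the fresh box *)
    b
```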
Why are references (turned into local mutable variables) treated differently from non-mutable variables in your strategy? It would make sense to apply the same treatment to non-mutable float variables (bound to an expression that yields an unboxed float), no? In that case your workaround above (to force boxing) would no longer work, but this still seems a better default. One could argue that a non-mutable variable is boxed at most once under the current strategy, while it could be boxed several times if we delay the boxing; but the same is true for mutable variables, and a general way to address that would be the strategy I suggest (i.e. box really lazily, by caching the boxed value computed on demand).

This was discussed yesterday during a caml-devel meeting, and there is agreement in principle on switching to the proposed strategy (without my "caching" variant). I'll do some more tests on LexiFi's numerical code to see the impact on performance, and if everything goes well, I'll merge the patch.

I cannot observe big performance impacts on LexiFi's tests (perhaps partly because we optimized around the previous unboxing scheme, but also because most of the time is spent in actual computation, not boxing). There is a consistent gain in total allocations, which is good anyway. And I trust @vbrankov that on some real applications the runtime gain could be noticeable (it can be seen trivially on micro-benchmarks, and some real-world code is not so different from such tight loops). Moreover, the change makes the optimizer more predictable, avoids the need/temptation to add "... +. 0.", and simplifies the code of the optimizer itself. I'll merge the patch. It could be interesting to refine the new strategy to box only once, as before, when the identifier is not mutated but the boxed value is (potentially or certainly) used multiple times. I doubt this would lead to significant improvements, though (again, micro-benchmarks could show the difference).

Merged in trunk, commit 16215. |
In the current strategy, cmmgen.ml determines that a variable is a float only if it is the result of a float operation. In the case of my workaround the variable is the result of a dereference, so the value is boxed. I completely agree that the compiler should use more clues to determine whether a variable is a float, which would break the workaround. I already discussed making the compiler smarter in that respect with @lpw25 and @def-lkb, but as far as I know it's still sitting on our to-do lists. Apologies for the delayed response.

@vbrankov: can you have a look at http://caml.inria.fr/mantis/view.php?id=6940 and review the attached patch?

This new unboxing strategy, which decides whether to unbox an identifier purely based on the bound expression, actually makes it possible to simplify the whole unboxing scheme by doing it "on the fly" instead of requiring a post-processing pass. See http://caml.inria.fr/mantis/view.php?id=7022 and https://github.com/alainfrisch/ocaml/tree/unbox_earlier .

I'm sorry Alain, I completely missed your email from Jul 28th. I'll review the code right away. |
An argument follows that the penalty for early boxing of floats and ints is bigger than that of lazy boxing. It is supported by a benchmark of a production application.
`Cmmgen.transl_unbox_let` will unbox a float reference if it only occurs in forms such as:

```ocaml
x := <value>
!x +. 5.
```

Unboxing a float reference means that the variable holds the float value itself. This is fine, since the variable will never end up in the OCaml heap, and it gives a significant speedup, since assigning a value to it never allocates. For instance in `s := !s +. a *. b`, the aggregate cost of allocating and collecting a float box is considerable compared to the cost of computing the right-hand side.

Unboxing will not occur if the value is read outside of a float operation, that is, in language constructs that accept only boxed values. Probably the most frequent situations are:

- `let y = foo !x in ...` (calls to "float" native functions such as `exp`, `sin`, and `cos` are excluded, because they accept unboxed floats)
- `let y = !x in ...`

I'm advocating lazy boxing, meaning that float references are always unboxed; for the language constructs that accept only boxed floats, the value is boxed in place. This is advantageous over early boxing, since assignments are likely to be much more frequent than such constructs. In the unlikely situation where this approach is in fact slower because of frequent boxing, it is much easier to write a workaround than with the current compiler:
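One plausible shape of such a workaround under lazy boxing (a hypothetical sketch; the function and its format are invented): if a reference's value is used repeatedly in contexts that need a boxed float, bind it once so it is boxed once instead of at every use.

```ocaml
(* Hypothetical workaround sketch, assuming lazy boxing: binding
   [!x] to [y] boxes the value once, and both later uses share
   that single box instead of re-boxing at each use site. *)
let describe (x : float ref) =
  let y = !x in                       (* boxed once here *)
  Printf.sprintf "%.1f %.1f" y y     (* both uses reuse the box *)
```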
I've seen a lot of slow code where the compiler chooses not to unbox a reference because of a single call or assignment. Here's an example illustrating it:
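A minimal sketch of this kind of slowdown (identifiers invented; not the original example): a single use of the reference in a context that requires a boxed float is enough, under the current strategy, to keep the reference boxed, so every assignment in the loop allocates.

```ocaml
(* Hypothetical illustration: the one boxed use of [!s] in the
   [Printf.printf] call defeats unboxing of [s], so under the
   current strategy each [s := ...] in the loop allocates. *)
let sum_and_report a =
  let s = ref 0.0 in
  for i = 0 to Array.length a - 1 do
    s := !s +. a.(i)
  done;
  Printf.printf "total = %f\n" !s;   (* boxed use of the value *)
  !s
```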
A benchmark of a production application shows a speed improvement and less garbage: