Conversation
…nalty for early boxing is bigger. Late boxing most often occurs when the reference value is used as an argument to a function call, when it is assigned to a record field, or when the value is returned. Early boxing most often occurs in assignment, where the cost of boxing, and of subsequently collecting the garbage, can often dwarf the cost of computing the right-hand side of the assignment.
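The assignment-heavy case can be sketched as follows (a minimal OCaml example; the function name is invented). Under early boxing, every store into the reference would allocate a boxed float, so allocation and collection would dominate the two floating-point operations per iteration:

```ocaml
(* Hypothetical illustration: with early boxing, each [s := ...]
   would store a boxed float and thus allocate on every iteration,
   dwarfing the cost of the multiply-add on the right-hand side. *)
let dot a b =
  let s = ref 0.0 in
  for i = 0 to Array.length a - 1 do
    s := !s +. a.(i) *. b.(i)
  done;
  !s
```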
I always wondered if the compiler should not produce two versions of some functions: for, say,
Shouldn't this be recursive? If
This is more tricky. See http://caml.inria.fr/mantis/view.php?id=5133

Bug 5133 should be fixed by this patch.

See also http://caml.inria.fr/mantis/view.php?id=5204 and the more_unboxing branch in OCaml SVN. The idea there was to box lazily, keeping a reference to the boxed value for future use (before it is re-used, its content is compared to the current unboxed variable). This ensures that (i) there is never more boxing than with the current strategy, and (ii) there is no boxing if the boxed value is not actually required.
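The "caching" idea described above can be sketched in user-level OCaml (all names here are hypothetical; the real mechanism would live inside the compiler, not in library code):

```ocaml
(* Hypothetical sketch of the caching variant: keep the last boxed
   copy next to the unboxed working value, and re-box only when the
   unboxed value has changed since the last boxing. *)
type cached = {
  mutable unboxed : float;        (* the working, unboxed value *)
  mutable boxed   : float option; (* last boxed copy, if any *)
}

let box c =
  match c.boxed with
  | Some b when b = c.unboxed -> b   (* contents unchanged: reuse *)
  | _ ->
    let b = c.unboxed in
    c.boxed <- Some b;               (* cache the fresh box *)
    b
```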
Why are references (turned into local mutable variables) treated differently from non-mutable variables in your strategy? It would make sense to apply the same treatment to non-mutable float variables (bound to an expression that yields an unboxed float), no? In that case your workaround above (to force boxing) would no longer work, but this still seems a better default. One could argue that a non-mutable variable is boxed at most once under the current strategy, while it could be boxed several times if we delay the boxing; but the same is true for mutable variables, and a general way to address that would be the strategy I suggest (i.e. box really lazily, by caching the boxed value computed on demand).

This was discussed yesterday during a caml-devel meeting, and there is agreement in principle on switching to the proposed strategy (without my "caching" variant). I'll do some more tests on LexiFi's numerical code to see the impact on performance, and if everything goes well, I'll merge the patch.

I cannot observe big performance impacts on LexiFi's tests (perhaps partly because we optimized around the previous unboxing scheme, but also because most of the time is spent in actual computation, not boxing). There is a consistent gain in total allocations, which is good anyway. And I trust @vbrankov that on some real applications the runtime gain could be noticeable (it can be seen trivially on micro-benchmarks, and some real-world code is not so different from such tight loops). Moreover, the change makes the optimizer more predictable, avoids the need/temptation to add "... +. 0.", and simplifies the code of the optimizer itself. I'll merge the patch. It could be interesting to refine the new strategy to box only once, as before, when the identifier is not mutated but the boxed value is (potentially or certainly) used multiple times. I doubt this would lead to significant improvements, though (again, micro-benchmarks could show the difference).

Merged in trunk, commit 16215. |
In the current strategy, cmmgen.ml determines that a variable is a float only if it is the result of a float operation. In the case of my workaround the variable is the result of a dereference, so the value is boxed. I completely agree that the compiler should use more clues to determine whether a variable is a float, which would break the workaround. I already discussed making the compiler smarter in that respect with @lpw25 and @def-lkb, but as far as I know it's still sitting on our to-do lists. Apologies for the delayed response.

@vbrankov: can you have a look at http://caml.inria.fr/mantis/view.php?id=6940 and review the attached patch?

This new unboxing strategy, which decides whether to unbox an identifier purely based on the bound expression, actually makes it possible to simplify the whole unboxing scheme by doing it "on the fly" instead of requiring a post-processing pass. See http://caml.inria.fr/mantis/view.php?id=7022 and https://github.com/alainfrisch/ocaml/tree/unbox_earlier .

I'm sorry Alain, I completely missed your email from Jul 28th. I'll review the code right away. |
An argument follows that the penalty for early boxing of floats and ints is bigger than that of lazy boxing. It is supported by a benchmark of a production application.
`Cmmgen.transl_unbox_let` will unbox a float reference if it only occurs in forms such as:

```ocaml
x := <value>
!x +. 5.
```

Unboxing a float reference means that the variable holds the float value itself. This is fine, since the variable will never end up in the OCaml heap, and it gives a significant speedup, since assigning a value to it never allocates. For instance in `s := !s +. a *. b`, the aggregate cost of allocating and collecting a float box is considerable compared to the cost of computing the right-hand side.

Unboxing will not occur if the value is read outside of a float operation, that is, in language constructs that accept only boxed values. Probably the most frequent situations are:

- `let y = foo !x in ...` (calls to "float" native functions such as `exp`, `sin`, and `cos` are excluded, because they accept unboxed floats)
- `let y = !x in ...`

I'm advocating lazy boxing, meaning that float references are always unboxed; for the language constructs that accept only boxed floats, the value is boxed in place. This is advantageous over early boxing, since assignments are likely to be much more frequent than such constructs. In the unlikely situation where this approach is in fact slower because of frequent boxing, it is much easier to write a workaround than with the current compiler:
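One plausible shape of such a workaround under lazy boxing (a hypothetical sketch; the function and its format are invented): if a reference's value is used repeatedly in contexts that need a boxed float, bind it once so it is boxed once instead of at every use.

```ocaml
(* Hypothetical workaround sketch, assuming lazy boxing: binding
   [!x] to [y] boxes the value once, and both later uses share
   that single box instead of re-boxing at each use site. *)
let describe (x : float ref) =
  let y = !x in                       (* boxed once here *)
  Printf.sprintf "%.1f %.1f" y y     (* both uses reuse the box *)
```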
I've seen a lot of slow code where the compiler chooses not to unbox a reference because of a single call or assignment. Here's an example illustrating it:
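A minimal sketch of this kind of slowdown (identifiers invented; not the original example): a single use of the reference in a context that requires a boxed float is enough, under the current strategy, to keep the reference boxed, so every assignment in the loop allocates.

```ocaml
(* Hypothetical illustration: the one boxed use of [!s] in the
   [Printf.printf] call defeats unboxing of [s], so under the
   current strategy each [s := ...] in the loop allocates. *)
let sum_and_report a =
  let s = ref 0.0 in
  for i = 0 to Array.length a - 1 do
    s := !s +. a.(i)
  done;
  Printf.printf "total = %f\n" !s;   (* boxed use of the value *)
  !s
```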
A benchmark of a production application shows a speed improvement and less garbage: