Original bug ID: 7441
Reporter: markghayden
Status: acknowledged (set by @xavierleroy on 2017-01-14T15:24:19Z)
Resolution: open
Priority: normal
Severity: feature
Platform: AMD
OS: MacOS
OS Version: 10.12.1
Target version: later
Category: middle end (typedtree to clambda)
Duplicate of: #7442
Has duplicate: #7440
Bug description
The Array module does not appear to be usable for generating optimal code, even for simple array summation:
let stdlib_sumf v =
  Array.fold_left (+.) 0.0 v
;;
This allocates two boxed floats (32 bytes on a 64-bit platform) per iteration.
Experiments were run with the 4.05 trunk, using -O3 and -unbox-closures. To sum an array with Array.fold_left without this allocation, it appears necessary to hand-write a version of Array.fold_left with type casts that specialize it to floating-point arrays, or to use some similar method.
The same applies to summing an array of integers. With Array.fold_left no allocation occurs, but the assembly code generated for the loop includes checks on the type of the array, as well as code (never executed) for allocating a floating-point value. As with floats, a specialized version of Array.fold_left removes the checks on the array type.
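As a rough illustration of the hand-specialization described above, here is a minimal sketch. The names fold_left_float, sum_float and sum_int are hypothetical and are not taken from the attached file. It uses type annotations as the specialization mechanism, which is an assumption about what the report means by type casts. The intent is that the compiler, knowing the element type statically, can pick the float-array or int-array access directly. Whether allocation actually disappears depends on the compiler version and flags and was not verified here.

```ocaml
(* Hypothetical helpers, not the code from the attached a.ml. *)

(* fold_left with the array type pinned to float array, so that no
   run-time check of the array's tag is needed to read an element. *)
let fold_left_float (f : float -> float -> float) (init : float)
    (a : float array) : float =
  let r = ref init in
  for i = 0 to Array.length a - 1 do
    (* i is always within bounds, as in the stdlib's own fold_left *)
    r := f !r (Array.unsafe_get a i)
  done;
  !r

(* Fully manual summation: the operator is written inline, so nothing
   relies on the closure f being inlined. *)
let sum_float (v : float array) : float =
  let r = ref 0.0 in
  for i = 0 to Array.length v - 1 do
    r := !r +. Array.unsafe_get v i
  done;
  !r

(* Integer counterpart: the int array annotation is meant to remove the
   generic-array checks from the loop body. *)
let sum_int (v : int array) : int =
  let r = ref 0 in
  for i = 0 to Array.length v - 1 do
    r := !r + Array.unsafe_get v i
  done;
  !r
```

For example, sum_float [|1.0; 2.0; 3.5|] and fold_left_float (+.) 0.0 [|1.0; 2.0; 3.5|] compute the same sum as stdlib_sumf. Only the generated code is expected to differ.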
Steps to reproduce
Use the attached file. The output below lists each test case and the number of bytes allocated while summing an array of 10,000 floats. All but the inline2 case allocate 32 bytes (2 floats) per iteration. For the integer case, review the resulting assembly code.
Output from running the program:
make -w -k -j4
make: Entering directory `/Users/mhayden/proj/ocaml/flambda'
/Users/mhayden/.opam/macos.dev/bin/ocamlopt -O3 -unbox-closures -c -S a.ml
/Users/mhayden/.opam/macos.dev/bin/ocamlopt -O3 -unbox-closures -o a a.cmx
./a
stdlib 320096
inline0 320096
inline1 320096
inline2 112
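Byte counts like those above could be gathered with a small harness along the following lines. This is a sketch under assumptions: the attached file may measure allocation differently, and allocated_during is a hypothetical helper name. It uses Gc.allocated_bytes from the standard library, which returns the total number of bytes allocated since the program started.

```ocaml
(* Hypothetical harness; the attached a.ml may do this differently. *)

(* Run f and return its result together with the number of bytes
   allocated while it ran. *)
let allocated_during (f : unit -> 'a) : 'a * float =
  let before = Gc.allocated_bytes () in
  let result = f () in
  let after = Gc.allocated_bytes () in
  (result, after -. before)

let stdlib_sumf v = Array.fold_left (+.) 0.0 v

let () =
  (* Build the array outside the measured region. *)
  let v = Array.make 10_000 1.0 in
  let total, bytes = allocated_during (fun () -> stdlib_sumf v) in
  Printf.printf "stdlib %.0f (sum = %.1f)\n" bytes total
```

The measurement itself allocates a little, since Gc.allocated_bytes returns a boxed float, so a small constant is to be expected even for a loop that allocates nothing.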
File attachments