-
Notifications
You must be signed in to change notification settings - Fork 1.2k
avoid loading global variables more than necessary #6511
Description
Original bug ID: 6511
Reporter: drichman
Status: acknowledged (set by @xavierleroy on 2014-09-01T13:21:40Z)
Resolution: open
Priority: normal
Severity: feature
Version: 4.02.0+beta1 / +rc1
Category: back end (clambda to assembly)
Monitored by: @gasche @diml @yakobowski
Bug description
I believe OCaml 4.02 has (compared to 4.01) improved its ability to detect re-use of a global variable and avoid loading it into a register several times.
However, there are still a few cases where it does not. I've noticed that adding
let some_global = some_global in
to the top of some of my functions produces different (better) assembly and speeds them up; my feature request is that OCaml spot this without my help.
I've put two examples below that I believe produce sub-optimal output with OCaml trunk; the first is a toy example and the latter is the one I care about.
Thanks,
Daniel
Steps to reproduce
Toy example: global refs
This change to the source:
--- global_ref.ml 2014-08-04 09:37:04.634534095 +0100 +++ global_ref2.ml 2014-08-04 09:39:15.978539072 +0100 @@ -2,6 +2,8 @@ let global = int_of_string "1234" let aref = ref 0 let () = + let global = global in + let aref = aref in aref := global; aref := global; aref := global
produces this change in assembly:
--- global_ref.s/ocaml_4.02.0+beta1.s 2014-08-04 09:40:03.963544100 +0100 +++ global_ref2.s/ocaml_4.02.0+beta1.s 2014-08-04 09:40:20.043546093 +0100 @@ -43,15 +43,11 @@ ; Before here, rbx is loaded with the address of camlTest movq $1024, -8(%rax) movq $1, (%rax) ; tagged zero movq %rax, 8(%rbx) ; store initial zero - movq 8(%rbx), %rax ; load addr of aref - movq (%rbx), %rdi ; load global - movq %rdi, (%rax) ; store global - movq 8(%rbx), %rax ; load both _again_ - movq (%rbx), %rdi - movq %rdi, (%rax) ; store - movq 8(%rbx), %rax ; load - movq (%rbx), %rbx ; - movq %rbx, (%rax) ; store + movq (%rbx), %rax ; load global + movq 8(%rbx), %rbx ; load addr of aref + movq %rax, (%rbx) ; store + movq %rax, (%rbx) ; store + movq %rax, (%rbx) ; store movq $1, %rax addq $8, %rsp .cfi_adjust_cfa_offset -8
Example: Core.Iobuf
--- iobuf.ml 2014-08-04 11:15:00.210635072 +0100 +++ iobuf2.ml 2014-08-04 11:15:04.746636077 +0100 @@ -8,6 +8,7 @@ open Bigarray type t = (char, int8_unsigned_elt, c_layout) Array1.t (* After inlining, etc. Iobuf.Unsafe.Poke.int63 boils down to this: *) external int64_of_int : int -> int64 = "%int64_of_int" external unsafe_set_64 : t -> int -> int64 -> unit = "%caml_bigstring_set64u" let unsafe_write_int64_int t ~pos x = unsafe_set_64 t pos (int64_of_int x) let global_buf = Array1.create char c_layout 1024 let () = + let global_buf = global_buf in unsafe_write_int64_int global_buf ~pos:0 123; unsafe_write_int64_int global_buf ~pos:8 456; unsafe_write_int64_int global_buf ~pos:16 789
Adding the "let global_buf = global_buf in" line produces this change in the assembly output:
B: get bigarray;
C: get pointer to buffer in C heap
--- iobuf.s/ocaml_4.02.0+beta1.s 2014-08-04 11:15:19.571638074 +0100 +++ iobuf2.s/ocaml_4.02.0+beta1.s 2014-08-04 11:15:14.667637075 +0100 @@ -70,12 +70,10 @@ call camlBigarray__create_1118@PLT .L102: movq camlTest@GOTPCREL(%rip), %rbx ; A movq %rax, 8(%rbx) ; storing the result of BA_create movq 8(%rbx), %rax ; B - movq 8(%rax), %rax ; C - movq $123, (%rax) ; store - movq 8(%rbx), %rax ; B - movq 8(%rax), %rax ; C - movq $456, 8(%rax) ; store - movq 8(%rbx), %rax ; B - movq 8(%rax), %rax ; C - movq $789, 16(%rax) ; store + movq 8(%rax), %rbx ; C + movq $123, (%rbx) ; store + movq 8(%rax), %rbx ; C + movq $456, 8(%rbx) ; store + movq 8(%rax), %rax ; C + movq $789, 16(%rax) ; store movq $1, %rax
I don't know how feasible it is to eliminate the duplicate "C" loads too.