Skip to content

avoid loading global variables more than necessary #6511

@vicuna

Description

@vicuna

Original bug ID: 6511
Reporter: drichman
Status: acknowledged (set by @xavierleroy on 2014-09-01T13:21:40Z)
Resolution: open
Priority: normal
Severity: feature
Version: 4.02.0+beta1 / +rc1
Category: back end (clambda to assembly)
Monitored by: @gasche @diml @yakobowski

Bug description

I believe OCaml 4.02 has (compared to 4.01) improved its ability to detect re-use of a global variable and avoid loading it into a register several times.

However, there are still a few cases where it does not. I've noticed that adding

let some_global = some_global in

to the top of some of my functions produces different (better) assembly and speeds them up; my feature request is that OCaml spot this without my help.

I've put two examples below that I believe produce sub-optimal output with OCaml trunk; the first is a toy example and the latter is the one I care about.

Thanks,
Daniel

Steps to reproduce

Toy example: global refs

This change to the source:

--- global_ref.ml	2014-08-04 09:37:04.634534095 +0100
+++ global_ref2.ml	2014-08-04 09:39:15.978539072 +0100
@@ -2,6 +2,8 @@
 let global = int_of_string "1234"
 let aref = ref 0
 
 let () =
+  let global = global in
+  let aref = aref in
   aref := global;
   aref := global;
   aref := global

produces this change in assembly:

--- global_ref.s/ocaml_4.02.0+beta1.s	2014-08-04 09:40:03.963544100 +0100
+++ global_ref2.s/ocaml_4.02.0+beta1.s	2014-08-04 09:40:20.043546093 +0100
@@ -43,15 +43,11 @@

  ; Before here, rbx is loaded with the address of camlTest
 	movq	$1024, -8(%rax)
 	movq	$1, (%rax)              ; tagged zero
 	movq	%rax, 8(%rbx)           ; store initial zero
-	movq	8(%rbx), %rax           ; load addr of aref
-	movq	(%rbx), %rdi            ; load global
-	movq	%rdi, (%rax)            ; store global
-	movq	8(%rbx), %rax           ; load both _again_
-	movq	(%rbx), %rdi
-	movq	%rdi, (%rax)            ; store
-	movq	8(%rbx), %rax           ; load
-	movq	(%rbx), %rbx            ; 
-	movq	%rbx, (%rax)            ; store
+	movq	(%rbx), %rax            ; load global
+	movq	8(%rbx), %rbx           ; load addr of aref
+	movq	%rax, (%rbx)            ; store
+	movq	%rax, (%rbx)            ; store
+	movq	%rax, (%rbx)            ; store
 	movq	$1, %rax
 	addq	$8, %rsp
 	.cfi_adjust_cfa_offset	-8

Example: Core.Iobuf

--- iobuf.ml	2014-08-04 11:15:00.210635072 +0100
+++ iobuf2.ml	2014-08-04 11:15:04.746636077 +0100
@@ -8,6 +8,7 @@
 open Bigarray

 type t = (char, int8_unsigned_elt, c_layout) Array1.t

 (* After inlining, etc. Iobuf.Unsafe.Poke.int63 boils down to this: *)
 external int64_of_int : int -> int64 = "%int64_of_int"
 external unsafe_set_64 : t -> int -> int64 -> unit = "%caml_bigstring_set64u"
 let unsafe_write_int64_int t ~pos x = unsafe_set_64 t pos (int64_of_int x)

 let global_buf = Array1.create char c_layout 1024
 
 let () =
+  let global_buf = global_buf in
   unsafe_write_int64_int global_buf ~pos:0  123;
   unsafe_write_int64_int global_buf ~pos:8  456;
   unsafe_write_int64_int global_buf ~pos:16 789

Adding the "let global_buf = global_buf in" line produces this change in the assembly output:

B: get bigarray;
C: get pointer to buffer in C heap

--- iobuf.s/ocaml_4.02.0+beta1.s	2014-08-04 11:15:19.571638074 +0100
+++ iobuf2.s/ocaml_4.02.0+beta1.s	2014-08-04 11:15:14.667637075 +0100
@@ -70,12 +70,10 @@
	call  camlBigarray__create_1118@PLT
.L102:
 	movq	camlTest@GOTPCREL(%rip), %rbx   ; A
 	movq	%rax, 8(%rbx)       ; storing the result of BA_create
 	movq	8(%rbx), %rax       ; B
-	movq	8(%rax), %rax       ; C
-	movq	$123, (%rax)        ; store
-	movq	8(%rbx), %rax       ; B
-	movq	8(%rax), %rax       ; C
-	movq	$456, 8(%rax)       ; store
-	movq	8(%rbx), %rax       ; B
-	movq	8(%rax), %rax       ; C
-	movq	$789, 16(%rax)      ; store
+	movq	8(%rax), %rbx       ; C
+	movq	$123, (%rbx)        ; store
+	movq	8(%rax), %rbx       ; C
+	movq	$456, 8(%rbx)       ; store
+	movq	8(%rax), %rax       ; C
+	movq	$789, 16(%rax)      ; store
 	movq	$1, %rax

I don't know how feasible it is to eliminate the duplicate "C" loads too.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions