
Bytecode compiler emits too many calls to caml_ensure_stack_capacity, causing a slowdown #11062

@OlivierNicole


The bytecode compiler emits calls to caml_ensure_stack_capacity to reallocate the stack if the free space falls below Config.stack_threshold. In #510, a safety margin was added because the exact stack space needed cannot always be computed.

The problem is that Config.stack_threshold has been reduced from 256 words to 16. Because used_safe includes the safety margin from #510, which by itself exceeds 16 words, the condition in

```ocaml
if used_safe > Config.stack_threshold then
```

is now always true, so we always emit the call to caml_ensure_stack_capacity.
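To make the effect concrete, here is a minimal, simplified model of that decision. It is not the compiler's code: the instruction type, the Kbody placeholder, the function names, and the 60-word safety margin are assumptions for illustration. The argument only relies on the margin being larger than 16 words. With a threshold of 256, a function that uses no stack gets no check. With a threshold of 16, even that function gets the call, because the margin alone already exceeds the threshold.

```ocaml
(* Hypothetical, simplified model of the stack-check decision.
   Only the instructions needed for the illustration are modelled. *)
type instruction =
  | Kconst of int               (* push an integer constant *)
  | Kccall of string * int      (* call a C primitive with n arguments *)
  | Kbody                       (* placeholder for the compiled function body *)

(* Prepend a stack-capacity check to [code] when the estimated stack
   usage, including the safety margin, exceeds [threshold]. *)
let with_stack_check ~threshold ~safety_margin ~max_stack_used code =
  let used_safe = max_stack_used + safety_margin in
  if used_safe > threshold then
    Kconst used_safe :: Kccall ("caml_ensure_stack_capacity", 1) :: code
  else code

(* Does the resulting code start with the check? *)
let emits_check ~threshold ~safety_margin ~max_stack_used =
  match with_stack_check ~threshold ~safety_margin ~max_stack_used [ Kbody ] with
  | Kconst _ :: Kccall ("caml_ensure_stack_capacity", _) :: _ -> true
  | _ -> false

let () =
  (* ASSUMPTION: a 60-word safety margin, chosen for illustration only. *)
  let safety_margin = 60 in
  List.iter
    (fun threshold ->
      (* A function whose body needs no stack at all. *)
      let emitted = emits_check ~threshold ~safety_margin ~max_stack_used:0 in
      Printf.printf "threshold %3d: check emitted for a tiny function? %b\n"
        threshold emitted)
    [ 256; 16 ]
```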

This was discovered by @jonludlam.

This explains part of the performance regressions that @shakthimaan observed in preliminary bytecode benchmarks, the largest being a 1.4x slowdown on the knucleotide benchmark.

Below is the perf profile of knucleotide on 4.13.1:

```
Samples: 2M of event 'cycles:u', Event count (approx.): 801289990777
  Overhead  Command         Shared Object       Symbol
+   79,02%  knucleotide.bc  ocamlrun            [.] caml_interprete
+    6,48%  knucleotide.bc  ocamlrun            [.] caml_array_get_addr
+    3,06%  knucleotide.bc  ocamlrun            [.] caml_modify
+    2,59%  knucleotide.bc  ocamlrun            [.] caml_string_equal
+    2,01%  knucleotide.bc  ocamlrun            [.] caml_string_length
+    1,97%  knucleotide.bc  ocamlrun            [.] caml_bytes_get
+    1,80%  knucleotide.bc  ocamlrun            [.] caml_string_get
+    0,80%  knucleotide.bc  libc-2.33.so        [.] __memmove_avx_unaligned_erms
+    0,79%  knucleotide.bc  ocamlrun            [.] caml_ml_bytes_length
     0,45%  knucleotide.bc  ocamlrun            [.] caml_blit_bytes
     0,36%  knucleotide.bc  ocamlrun            [.] caml_ml_string_length
     0,30%  knucleotide.bc  ocamlrun            [.] caml_bytes_equal
     0,13%  knucleotide.bc  ocamlrun            [.] memmove@plt
     0,08%  knucleotide.bc  ocamlrun            [.] caml_input_scan_line
     0,04%  knucleotide.bc  [unknown]           [k] 0xffffffffaa400cf0
     0,02%  knucleotide.bc  ocamlrun            [.] caml_alloc_string
     0,01%  knucleotide.bc  ocamlrun            [.] caml_ml_input_scan_line
     0,01%  knucleotide.bc  ocamlrun            [.] caml_ml_input
     0,01%  knucleotide.bc  ocamlrun            [.] caml_ml_input_char
     0,01%  knucleotide.bc  [unknown]           [k] 0xffffffffaa400ac0
     0,01%  knucleotide.bc  ocamlrun            [.] caml_string_of_bytes
     0,01%  knucleotide.bc  ocamlrun            [.] caml_empty_minor_heap
     0,00%  knucleotide.bc  ocamlrun            [.] mark_slice_darken.constprop.0
     0,00%  knucleotide.bc  ocamlrun            [.] caml_oldify_one
     0,00%  knucleotide.bc  ocamlrun            [.] caml_create_bytes
```

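And on f022f9b, where caml_ensure_stack_capacity alone accounts for 6.50% of cycles (it does not appear at all in the 4.13.1 profile):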

```
Samples: 3M of event 'cycles:u', Event count (approx.): 909199923827
  Overhead  Command  Shared Object       Symbol
+   73,92%  Domain0  ocamlrun            [.] caml_interprete
+    6,50%  Domain0  ocamlrun            [.] caml_ensure_stack_capacity
+    5,45%  Domain0  ocamlrun            [.] caml_array_get_addr
+    3,37%  Domain0  ocamlrun            [.] caml_modify
+    2,65%  Domain0  ocamlrun            [.] caml_bytes_get
+    1,94%  Domain0  ocamlrun            [.] caml_string_equal
+    1,66%  Domain0  ocamlrun            [.] caml_string_length
+    1,54%  Domain0  ocamlrun            [.] caml_string_get
+    0,81%  Domain0  ocamlrun            [.] caml_ml_bytes_length
+    0,63%  Domain0  libc-2.33.so        [.] __memmove_evex_unaligned_erms
     0,47%  Domain0  ocamlrun            [.] caml_blit_bytes
     0,35%  Domain0  ocamlrun            [.] caml_ml_string_length
     0,29%  Domain0  ocamlrun            [.] caml_bytes_equal
     0,11%  Domain0  ocamlrun            [.] memmove@plt
     0,06%  Domain0  ocamlrun            [.] caml_input_scan_line
     0,04%  Domain0  libpthread-2.33.so  [.] __pthread_mutex_trylock
     0,04%  Domain0  [unknown]           [k] 0xffffffffaa400cf0
     0,04%  Domain0  libpthread-2.33.so  [.] __pthread_mutex_unlock_usercnt
     0,02%  Domain0  ocamlrun            [.] caml_alloc_string
     0,01%  Domain0  ocamlrun            [.] caml_ml_input
     0,01%  Domain0  ocamlrun            [.] caml_ml_input_scan_line
     0,01%  Domain0  ocamlrun            [.] caml_ml_input_char
     0,01%  Domain0  [unknown]           [k] 0xffffffffaa400ac0
     0,01%  Domain0  ocamlrun            [.] caml_check_pending_actions
     0,01%  Domain0  ocamlrun            [.] caml_empty_minor_heap_promote
     0,00%  Domain0  ocamlrun            [.] channel_mutex_lock_default
```
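As a rough back-of-the-envelope estimate of how much of the difference the stack check accounts for, the snippet below takes the approximate cycle counts and the 6.50% share directly from the two reports. It assumes both runs perform the same work, so that their total cycle counts are comparable; perf's event counts are approximate, so the result is only indicative.

```ocaml
(* Approximate cycle counts ("Event count (approx.)") from the two reports. *)
let cycles_4_13_1 = 801_289_990_777.
let cycles_f022f9b = 909_199_923_827.

(* Share of f022f9b cycles attributed to caml_ensure_stack_capacity. *)
let ensure_share = 0.0650

let () =
  let ensure_cycles = cycles_f022f9b *. ensure_share in
  let without_ensure = cycles_f022f9b -. ensure_cycles in
  Printf.printf "cycles in caml_ensure_stack_capacity: %.3g\n" ensure_cycles;
  Printf.printf "f022f9b / 4.13.1, all cycles:          %.3f\n"
    (cycles_f022f9b /. cycles_4_13_1);
  Printf.printf "f022f9b / 4.13.1, without the check:   %.3f\n"
    (without_ensure /. cycles_4_13_1)
```

With these numbers, f022f9b uses roughly 13.5% more cycles than 4.13.1 in these two profiles. Subtracting the cycles attributed to caml_ensure_stack_capacity still leaves it about 6% slower, which is consistent with the extra stack checks being only part of the regression.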
