Bytecode compiler emits too many calls to caml_ensure_stack_capacity, causing a slowdown #11062
Closed
Description
The bytecode compiler emits calls to caml_ensure_stack_capacity to reallocate the stack if the free space falls below Config.stack_threshold. In #510, a safety margin was added because the exact stack space needed cannot be computed in some cases.
The problem is that Config.stack_threshold has been reduced from 256 words to 16. As a consequence, the check `if used_safe > Config.stack_threshold then` at line 1088 in f022f9b now succeeds for many more functions, so the compiler emits far more calls to caml_ensure_stack_capacity.
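To illustrate why lowering the threshold multiplies the number of checks, here is a minimal OCaml model of the per-function decision. It is a sketch under stated assumptions, not the compiler's code: `safety_margin`, its value of 6, `needs_stack_check`, `count_checks`, and the list of stack depths are all hypothetical; only the shape of the test (`used_safe > threshold`) comes from the line quoted above.

```ocaml
(* Illustrative model only: [safety_margin] and its value are
   assumptions, not the compiler's actual definitions. *)
let safety_margin = 6

(* A function gets an explicit stack check when its estimated
   stack use plus the safety margin exceeds the threshold. *)
let needs_stack_check ~threshold max_stack_used =
  let used_safe = max_stack_used + safety_margin in
  used_safe > threshold

(* Number of functions that would receive a call to
   caml_ensure_stack_capacity under [threshold]. *)
let count_checks ~threshold depths =
  List.length (List.filter (needs_stack_check ~threshold) depths)

(* Hypothetical per-function stack depths, in words. *)
let depths = [2; 3; 5; 8; 12; 20; 40]

let () =
  Printf.printf "threshold 256: %d of %d functions checked\n"
    (count_checks ~threshold:256 depths) (List.length depths);
  Printf.printf "threshold 16:  %d of %d functions checked\n"
    (count_checks ~threshold:16 depths) (List.length depths)
```

With these made-up depths, no function needs an explicit check at a threshold of 256 words, while three of the seven do at 16 words. The safety margin matters more at the lower threshold, because it pushes moderately deep functions over the limit.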
This was discovered by @jonludlam.
This explains part of the performance regressions observed by @shakthimaan in preliminary bytecode benchmarks, the largest being a 1.4x slowdown on the knucleotide benchmark.
Below is the perf profile on 4.13.1:
Samples: 2M of event 'cycles:u', Event count (approx.): 801289990777
Overhead Command Shared Object Symbol
+ 79,02% knucleotide.bc ocamlrun [.] caml_interprete
+ 6,48% knucleotide.bc ocamlrun [.] caml_array_get_addr
+ 3,06% knucleotide.bc ocamlrun [.] caml_modify
+ 2,59% knucleotide.bc ocamlrun [.] caml_string_equal
+ 2,01% knucleotide.bc ocamlrun [.] caml_string_length
+ 1,97% knucleotide.bc ocamlrun [.] caml_bytes_get
+ 1,80% knucleotide.bc ocamlrun [.] caml_string_get
+ 0,80% knucleotide.bc libc-2.33.so [.] __memmove_avx_unaligned_erms
+ 0,79% knucleotide.bc ocamlrun [.] caml_ml_bytes_length
0,45% knucleotide.bc ocamlrun [.] caml_blit_bytes
0,36% knucleotide.bc ocamlrun [.] caml_ml_string_length
0,30% knucleotide.bc ocamlrun [.] caml_bytes_equal
0,13% knucleotide.bc ocamlrun [.] memmove@plt
0,08% knucleotide.bc ocamlrun [.] caml_input_scan_line
0,04% knucleotide.bc [unknown] [k] 0xffffffffaa400cf0
0,02% knucleotide.bc ocamlrun [.] caml_alloc_string
0,01% knucleotide.bc ocamlrun [.] caml_ml_input_scan_line
0,01% knucleotide.bc ocamlrun [.] caml_ml_input
0,01% knucleotide.bc ocamlrun [.] caml_ml_input_char
0,01% knucleotide.bc [unknown] [k] 0xffffffffaa400ac0
0,01% knucleotide.bc ocamlrun [.] caml_string_of_bytes
0,01% knucleotide.bc ocamlrun [.] caml_empty_minor_heap
0,00% knucleotide.bc ocamlrun [.] mark_slice_darken.constprop.0
0,00% knucleotide.bc ocamlrun [.] caml_oldify_one
0,00% knucleotide.bc ocamlrun [.] caml_create_bytes
And on f022f9b:
Samples: 3M of event 'cycles:u', Event count (approx.): 909199923827
Overhead Command Shared Object Symbol
+ 73,92% Domain0 ocamlrun [.] caml_interprete
+ 6,50% Domain0 ocamlrun [.] caml_ensure_stack_capacity
+ 5,45% Domain0 ocamlrun [.] caml_array_get_addr
+ 3,37% Domain0 ocamlrun [.] caml_modify
+ 2,65% Domain0 ocamlrun [.] caml_bytes_get
+ 1,94% Domain0 ocamlrun [.] caml_string_equal
+ 1,66% Domain0 ocamlrun [.] caml_string_length
+ 1,54% Domain0 ocamlrun [.] caml_string_get
+ 0,81% Domain0 ocamlrun [.] caml_ml_bytes_length
+ 0,63% Domain0 libc-2.33.so [.] __memmove_evex_unaligned_erms
0,47% Domain0 ocamlrun [.] caml_blit_bytes
0,35% Domain0 ocamlrun [.] caml_ml_string_length
0,29% Domain0 ocamlrun [.] caml_bytes_equal
0,11% Domain0 ocamlrun [.] memmove@plt
0,06% Domain0 ocamlrun [.] caml_input_scan_line
0,04% Domain0 libpthread-2.33.so [.] __pthread_mutex_trylock
0,04% Domain0 [unknown] [k] 0xffffffffaa400cf0
0,04% Domain0 libpthread-2.33.so [.] __pthread_mutex_unlock_usercnt
0,02% Domain0 ocamlrun [.] caml_alloc_string
0,01% Domain0 ocamlrun [.] caml_ml_input
0,01% Domain0 ocamlrun [.] caml_ml_input_scan_line
0,01% Domain0 ocamlrun [.] caml_ml_input_char
0,01% Domain0 [unknown] [k] 0xffffffffaa400ac0
0,01% Domain0 ocamlrun [.] caml_check_pending_actions
0,01% Domain0 ocamlrun [.] caml_empty_minor_heap_promote
0,00% Domain0 ocamlrun [.] channel_mutex_lock_default
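As a rough cross-check of the two profiles, the OCaml snippet below compares the approximate cycle counts and estimates how much of the extra time on f022f9b falls in caml_ensure_stack_capacity. It assumes the two perf runs measured comparable workloads and that the reported event counts are directly comparable; since perf counts are sampled and approximate, and other runtime functions also changed between the two versions, treat the result only as an indication. The cycle ratio need not match the 1.4x benchmark figure quoted above. The variable names are illustrative.

```ocaml
(* Figures copied from the two perf reports above; perf event counts
   are approximate, so this is only a rough estimate. *)
let cycles_4_13_1 = 801_289_990_777.
let cycles_f022f9b = 909_199_923_827.
let ensure_share = 0.0650 (* caml_ensure_stack_capacity: 6.50% *)

(* Ratio of total user-space cycles between the two runs. *)
let ratio = cycles_f022f9b /. cycles_4_13_1

(* Cycles spent in caml_ensure_stack_capacity, as a fraction of the
   extra cycles observed on f022f9b. *)
let extra = cycles_f022f9b -. cycles_4_13_1
let ensure_fraction = ensure_share *. cycles_f022f9b /. extra

let () =
  Printf.printf "cycle ratio: %.2f\n" ratio;
  Printf.printf "share of extra cycles in caml_ensure_stack_capacity: %.0f%%\n"
    (100. *. ensure_fraction)
```

Under these assumptions, f022f9b uses roughly 13% more cycles than 4.13.1, and the time attributed to caml_ensure_stack_capacity corresponds to a little over half of that difference.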