With the native code compiler, compiling a collection of N mutually recursive toplevel functions requires nonlinear time in N.
This script (gen.sh) creates such a collection:
#!/bin/bash
N="$1"
echo "let rec main() = f0()"
for (( c=0; c<=$N; c++ ))
do
echo "and f$((c))() = f$((c+1))()"
done
echo "and f$((c))() = print_endline \"Hello\""
echo "let () = main()"
Running this script with various values of N produces the following kind of output:
$ ./gen.sh 1000 > foo.ml && time ocamlopt -o foo foo.ml && ./foo
real 0m0.459s
user 0m0.393s
sys 0m0.065s
Hello
$ ./gen.sh 2000 > foo.ml && time ocamlopt -o foo foo.ml && ./foo
real 0m1.280s
user 0m1.201s
sys 0m0.078s
Hello
$ ./gen.sh 5000 > foo.ml && time ocamlopt -o foo foo.ml && ./foo
real 0m7.913s
user 0m7.770s
sys 0m0.139s
Hello
$ ./gen.sh 10000 > foo.ml && time ocamlopt -o foo foo.ml && ./foo
real 0m35.636s
user 0m35.360s
sys 0m0.261s
Hello
$ ./gen.sh 20000 > foo.ml && time ocamlopt -o foo foo.ml && ./foo
real 2m54.296s
user 2m53.569s
sys 0m0.632s
Hello
$ ./gen.sh 40000 > foo.ml && time ocamlopt -o foo foo.ml && ./foo
real 11m22.849s
user 11m20.508s
sys 0m1.663s
Hello
Passing -linscan does not help; the observed times are the same.
The memory usage, in the last run, does not exceed 500MB.
The size of the generated .o file seems linear, which is good.
This issue is problematic because Menhir's code back-end generates this kind of code. A typical LR automaton for a real-world grammar can have thousands of states, and the generated code can contain thousands of mutually recursive functions.
I note that a similar example involving non-recursive functions does not exhibit this behavior: it seems to behave linearly, as desired.
With the native code compiler, compiling a collection of
Nmutually recursive toplevel functions requires nonlinear time inN.This script (
gen.sh) creates such a collection:Running this script with various values of
Nproduces the following kind of output:Passing
-linscandoes not help; the observed times are the same.The memory usage, in the last run, does not exceed 500MB.
The size of the generated
.ofile seems linear, which is good.This issue is problematic because Menhir's code back-end generates this kind of code. A typical LR automaton for a real-world grammar can have thousands of states, and the generated code can contain thousands of mutually recursive functions.
I note that a similar example involving non-recursive functions does not exhibit this behavior: it seems to behave linearly, as desired.