# Goblint performance regression 4.14 vs 5.3 (#13733)

## Description
As suggested, I'm opening this issue to report that we've hesitated to switch the Goblint static analyzer from OCaml 4.14 to 5.x due to performance regressions.
## Large-scale SV-COMP benchmark

Goblint is benchmarked on the SV-COMP benchmark suite, which consists of ~31443 C programs to analyze. Below, I've benchmarked Goblint, built from the same sources, on two different versions of OCaml, with the following `./goblint --version` outputs:
```
Goblint version: heads/master-0-gff0e24c46-dirty
Cil version: 2.0.5
Dune profile: release
OCaml version: 4.14.2
OCaml flambda: true
Build time: 2025-01-13T09:02:04
```

and

```
Goblint version: heads/master-0-gff0e24c46
Cil version: 2.0.5
Dune profile: release
OCaml version: 5.3.0
OCaml flambda: true
Build time: 2025-01-13T19:11:14
```
The benchmarks below are executed using BenchExec, with each run of Goblint given 1 CPU core, 60s of CPU time, and 1 GB of memory. The following scatter plots summarize 27221 runs that did not error, time out, run out of memory, etc.

The X-axis corresponds to Goblint on OCaml 4.14 and the Y-axis to OCaml 5.3. The blue line is the identity (without regressions, all dots should be close to it), the gray lines indicate 2× and 0.5×, and the green line is a linear regression.
There's clearly a lot of variance in these results, but the large benchmark suite suggests that the regression is somewhat systematic, not limited to a small number of instances.
### CPU time

*(scatter plot image not preserved)*

Linear regression line has formula: *(not preserved)*

### Maximum memory use

*(scatter plot image not preserved)*

Linear regression line has formula: *(not preserved)*
## Single instance `hardware-verification-bv/btor2c-lazyMod.vcegar_QF_BV_itc99_b13_p02`

Since the large-scale benchmarks above are quite tedious to reproduce and require lots of setup and resources, I'll mention one of the more extreme outliers, which hopefully can be executed reasonably easily by building Goblint (e.g. at goblint/analyzer@ff0e24c used above, with the optional apron dependency installed; the opam release 2.5.0 probably behaves quite similarly) and getting the benchmark from sv-benchmarks (it suffices to download the two required files directly rather than clone the large repository).
### OCaml 4.14

Running

```
./goblint --conf conf/svcomp.json --sets ana.specification /mnt/goblint-svcomp/benchexec/sv-benchmarks/c/properties/unreach-call.prp --sets exp.architecture 64bit /mnt/goblint-svcomp/benchexec/sv-benchmarks/c/hardware-verification-bv/btor2c-lazyMod.vcegar_QF_BV_itc99_b13_p02.c -v
```

eventually outputs something like
```
[..]
SV-COMP result: true
[Info][Witness] witness generation summary:
    location invariants: 0
    loop invariants: 29
    flow-insensitive invariants: 0
    total generation entries: 1
[Info] vars = 436 evals = 3821 narrow_reuses = 1
[Info] Timings:
           cputime  walltime  allocated  count
  Default   2.541s    2.546s  4842.53MB     1×
[..]
```
### OCaml 5.3

Running

```
./goblint --conf conf/svcomp.json --sets ana.specification /mnt/goblint-svcomp/benchexec/sv-benchmarks/c/properties/unreach-call.prp --sets exp.architecture 64bit /mnt/goblint-svcomp/benchexec/sv-benchmarks/c/hardware-verification-bv/btor2c-lazyMod.vcegar_QF_BV_itc99_b13_p02.c -v
```

eventually outputs something like
```
[..]
SV-COMP result: true
[Info][Witness] witness generation summary:
    location invariants: 0
    loop invariants: 29
    flow-insensitive invariants: 0
    total generation entries: 1
[Info] vars = 436 evals = 3821 narrow_reuses = 1
[Info] Timings:
           cputime  walltime  allocated  count
  Default  10.277s   10.283s 20526.93MB     1×
[..]
```
### Difference

For this benchmark, both the CPU time and the allocated memory increase ~4×. Note that this `allocated` field in Goblint's output is measured using `Gc.allocated_bytes` and differs from the maximum memory usage measured by BenchExec above.

I don't know whether the values of `Gc.allocated_bytes` from OCaml 4.14 and 5.3 are even comparable, but the run nevertheless manifests a significant difference in CPU time. If they are comparable, then I'm a bit surprised that the same code would suddenly allocate so much more.
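For context, measuring allocation this way looks roughly like the following minimal sketch (my own illustration, not Goblint's actual timing code; `measure_alloc` is a hypothetical helper). `Gc.allocated_bytes` returns the total bytes allocated in the major and minor heaps since program start, so differencing it around a computation gives the allocation attributable to that region:

```ocaml
(* Sketch: measure allocation of a code region via Gc.allocated_bytes,
   similar in spirit to Goblint's "allocated" timing column. *)
let measure_alloc f =
  let before = Gc.allocated_bytes () in
  let result = f () in
  let after = Gc.allocated_bytes () in
  (* Difference is bytes allocated by f (plus measurement noise). *)
  (result, (after -. before) /. (1024. *. 1024.))  (* in MB *)

let () =
  let _, mb = measure_alloc (fun () -> List.init 1_000_000 (fun i -> i)) in
  Printf.printf "allocated: %.2fMB\n" mb
```

Since the counter is cumulative and process-wide, a runtime that allocates more internally (or accounts allocation differently) would inflate this number even if user code is unchanged, which is one reason the 4.14 vs 5.3 values may not be directly comparable.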
I don't know if this single instance is representative of the systematic regression, but at least it's a quick-running case with significant change, which might make it easier to investigate.
## Conclusion

I've been aware of #11733 for a while, which concerns regressions for Frama-C. I haven't checked/tried any ideas from there, but the two cases might have common features: both Frama-C's Eva and Goblint are abstract interpreters for C programs, and both even use their own forks of the CIL frontend.

The performance regressions already existed around the OCaml 5.0 release (or whenever all of Goblint's dependencies became 5.0-compatible). Since then, some regressions seem to have improved (maybe related to the weak hashtables used for hashconsing), but clearly not all.
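For readers unfamiliar with the pattern: the hashconsing mentioned above is typically built on `Weak.Make`, whose performance characteristics are runtime-dependent. A minimal sketch of the technique (my own illustration with strings, not Goblint's or goblint-cil's actual code):

```ocaml
(* Sketch: hashconsing via a weak hashtable (Weak.Make).
   Structurally equal values are collapsed to one shared (physically
   equal) representative; the weak table lets the GC reclaim unused ones. *)
module Pool = Weak.Make (struct
  type t = string
  let equal = String.equal
  let hash = Hashtbl.hash
end)

let pool = Pool.create 1024

(* Return the canonical copy of s, inserting it if absent. *)
let hashcons s = Pool.merge pool s

let () =
  (* Build "abc" twice via different routes to avoid literal sharing. *)
  let a = hashcons (String.concat "" ["ab"; "c"]) in
  let b = hashcons (String.concat "" ["a"; "bc"]) in
  assert (a == b)  (* physically equal after hashconsing *)
```

Hot paths in an abstract interpreter hit such a pool constantly, so any change in weak-hashtable or GC behavior between runtimes would plausibly show up as a systematic slowdown like the one reported here.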