# Goblint performance regression 4.14 vs 5.3 (#13733)

## Description
As suggested, I'm opening this issue to report that we've hesitated to switch the Goblint static analyzer from OCaml 4.14 to 5.x due to performance regressions.
## Large-scale SV-COMP benchmark

Goblint is benchmarked on the SV-COMP benchmark suite, which consists of ~31443 C programs to analyze. Below, I've benchmarked Goblint, built from the same sources, on two different versions of OCaml, with the following `./goblint --version` outputs:
```
Goblint version: heads/master-0-gff0e24c46-dirty
Cil version: 2.0.5
Dune profile: release
OCaml version: 4.14.2
OCaml flambda: true
Build time: 2025-01-13T09:02:04
```

and

```
Goblint version: heads/master-0-gff0e24c46
Cil version: 2.0.5
Dune profile: release
OCaml version: 5.3.0
OCaml flambda: true
Build time: 2025-01-13T19:11:14
```
The benchmarks below are executed using BenchExec, with each run of Goblint given 1 CPU core, 60s of CPU time, and 1 GB of memory. The following scatter plots summarize 27221 runs that did not error, time out, run out of memory, etc.

The X-axis corresponds to Goblint on OCaml 4.14 and the Y-axis to OCaml 5.3. The blue line is the identity (without regressions, all dots should be close to it), the gray lines indicate 2× and 0.5×, and the green line is a linear regression.
There's clearly a lot of variance in these results, but the large benchmark suite suggests that the regression is somewhat systematic, not limited to a small number of instances.
### CPU time

*(scatter plot image not preserved)*

Linear regression line has formula: *(not preserved)*

### Maximum memory use

*(scatter plot image not preserved)*

Linear regression line has formula: *(not preserved)*
## Single instance `hardware-verification-bv/btor2c-lazyMod.vcegar_QF_BV_itc99_b13_p02`

Since the large-scale benchmarks above are quite tedious to reproduce and require lots of setup and resources, I'll mention one of the more extreme outliers, which hopefully can be executed reasonably easily by building Goblint (e.g. at goblint/analyzer@ff0e24c used above, with the optional apron dependency installed; the opam release 2.5.0 probably behaves quite similarly) and getting the benchmark from sv-benchmarks (it suffices to download the two required files directly rather than clone the large repository).
### OCaml 4.14

Running

```
./goblint --conf conf/svcomp.json --sets ana.specification /mnt/goblint-svcomp/benchexec/sv-benchmarks/c/properties/unreach-call.prp --sets exp.architecture 64bit /mnt/goblint-svcomp/benchexec/sv-benchmarks/c/hardware-verification-bv/btor2c-lazyMod.vcegar_QF_BV_itc99_b13_p02.c -v
```

eventually outputs something like
```
[..]
SV-COMP result: true
[Info][Witness] witness generation summary:
    location invariants: 0
    loop invariants: 29
    flow-insensitive invariants: 0
    total generation entries: 1
[Info] vars = 436 evals = 3821 narrow_reuses = 1
[Info] Timings:
           cputime  walltime  allocated  count
  Default   2.541s    2.546s  4842.53MB     1×
[..]
```
### OCaml 5.3

Running

```
./goblint --conf conf/svcomp.json --sets ana.specification /mnt/goblint-svcomp/benchexec/sv-benchmarks/c/properties/unreach-call.prp --sets exp.architecture 64bit /mnt/goblint-svcomp/benchexec/sv-benchmarks/c/hardware-verification-bv/btor2c-lazyMod.vcegar_QF_BV_itc99_b13_p02.c -v
```

eventually outputs something like
```
[..]
SV-COMP result: true
[Info][Witness] witness generation summary:
    location invariants: 0
    loop invariants: 29
    flow-insensitive invariants: 0
    total generation entries: 1
[Info] vars = 436 evals = 3821 narrow_reuses = 1
[Info] Timings:
           cputime  walltime  allocated  count
  Default  10.277s   10.283s 20526.93MB     1×
[..]
```
### Difference

For this benchmark, both the CPU time and the allocated memory increase ~4×. Note that this `allocated` field in Goblint's output is measured using `Gc.allocated_bytes` and differs from the maximum memory usage measured by BenchExec above.

I don't know whether the values of `Gc.allocated_bytes` from OCaml 4.14 and 5.3 are even comparable, but the run nevertheless manifests a significant difference in CPU time. If they are comparable, then I'm a bit surprised that the same code would suddenly allocate so much more.
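For context, measuring allocation this way looks roughly like the following minimal sketch (my own illustration, not Goblint's actual timing code; `measure_alloc` is a hypothetical helper). `Gc.allocated_bytes` returns the total bytes allocated in the major and minor heaps since program start, so differencing it around a computation gives the allocation attributable to that region:

```ocaml
(* Sketch: measure allocation of a code region via Gc.allocated_bytes,
   similar in spirit to Goblint's "allocated" timing column. *)
let measure_alloc f =
  let before = Gc.allocated_bytes () in
  let result = f () in
  let after = Gc.allocated_bytes () in
  (* Difference is bytes allocated by f (plus measurement noise). *)
  (result, (after -. before) /. (1024. *. 1024.))  (* in MB *)

let () =
  let _, mb = measure_alloc (fun () -> List.init 1_000_000 (fun i -> i)) in
  Printf.printf "allocated: %.2fMB\n" mb
```

Since the counter is cumulative and process-wide, a runtime that allocates more internally (or accounts allocation differently) would inflate this number even if user code is unchanged, which is one reason the 4.14 vs 5.3 values may not be directly comparable.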
I don't know if this single instance is representative of the systematic regression, but at least it's a quick-running case with significant change, which might make it easier to investigate.
## Conclusion

I've been aware of #11733 for a while, which concerns regressions for Frama-C. I haven't checked/tried any ideas from there, but the two cases might have common features: both Frama-C's Eva and Goblint are abstract interpreters for C programs, and both even use their own forks of the CIL frontend.

The performance regressions already existed around the OCaml 5.0 release (or whenever all of Goblint's dependencies became 5.0-compatible). Since then, some regressions seem to have improved (maybe related to the weak hashtables used for hashconsing), but clearly not all.
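For readers unfamiliar with the pattern: the hashconsing mentioned above is typically built on `Weak.Make`, whose performance characteristics are runtime-dependent. A minimal sketch of the technique (my own illustration with strings, not Goblint's or goblint-cil's actual code):

```ocaml
(* Sketch: hashconsing via a weak hashtable (Weak.Make).
   Structurally equal values are collapsed to one shared (physically
   equal) representative; the weak table lets the GC reclaim unused ones. *)
module Pool = Weak.Make (struct
  type t = string
  let equal = String.equal
  let hash = Hashtbl.hash
end)

let pool = Pool.create 1024

(* Return the canonical copy of s, inserting it if absent. *)
let hashcons s = Pool.merge pool s

let () =
  (* Build "abc" twice via different routes to avoid literal sharing. *)
  let a = hashcons (String.concat "" ["ab"; "c"]) in
  let b = hashcons (String.concat "" ["a"; "bc"]) in
  assert (a == b)  (* physically equal after hashconsing *)
```

Hot paths in an abstract interpreter hit such a pool constantly, so any change in weak-hashtable or GC behavior between runtimes would plausibly show up as a systematic slowdown like the one reported here.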