Skip to content

Optimize big int powers of 2#115

Merged
sim642 merged 1 commit intodevelopfrom
optimize-pow2
Sep 16, 2022
Merged

Optimize big int powers of 2#115
sim642 merged 1 commit intodevelopfrom
optimize-pow2

Conversation

@sim642
Copy link
Copy Markdown
Member

@sim642 sim642 commented Sep 16, 2022

While profiling I noticed Cil.truncateCilint taking unusually much time computing these powers. In most cases this probably makes no difference but in the case where I noticed this, it was ~10% of the time. The truncation function is indirectly involved in a number of CIL things, including constant folding.

Microbenchmarks confirm that using left shifts to compute these powers is 5 to 182 times faster, depending on the shift amount:

Details
*** Run benchmarks for path "pow2.16"

Throughputs for "pow", "lsl" each running for at least 1 CPU second:
pow:  1.06 WALL ( 1.06 usr +  0.00 sys =  1.06 CPU) @ 10647729.73/s (n=11324180)
lsl:  1.06 WALL ( 1.06 usr +  0.00 sys =  1.06 CPU) @ 1000373484.85/s (n=1057141679)
            Rate   pow   lsl
pow   10647730/s    --  -99%
lsl 1000373485/s 9295%    --
**********************************************************************
*** Run benchmarks for path "pow2.32"

Throughputs for "pow", "lsl" each running for at least 1 CPU second:
pow:  1.05 WALL ( 1.05 usr +  0.00 sys =  1.05 CPU) @ 10584764.32/s (n=11103259)
lsl:  1.08 WALL ( 1.08 usr +  0.00 sys =  1.08 CPU) @ 1944034412.17/s (n=2095989862)
            Rate    pow    lsl
pow   10584764/s     --   -99%
lsl 1944034412/s 18266%     --
**********************************************************************
*** Run benchmarks for path "pow2.64"

Throughputs for "pow", "lsl" each running for at least 1 CPU second:
pow:  1.06 WALL ( 1.06 usr +  0.00 sys =  1.06 CPU) @ 10889459.96/s (n=11519437)
lsl:  1.04 WALL ( 1.03 usr +  0.00 sys =  1.03 CPU) @ 74028830.75/s (n=76530413)
          Rate  pow  lsl
pow 10889460/s   -- -85%
lsl 74028831/s 580%   --
**********************************************************************
*** Run benchmarks for path "pow2.8"

Throughputs for "pow", "lsl" each running for at least 1 CPU second:
pow:  1.07 WALL ( 1.06 usr +  0.00 sys =  1.06 CPU) @ 10730495.05/s (n=11409671)
lsl:  1.15 WALL ( 1.15 usr +  0.00 sys =  1.15 CPU) @ 1461986354.64/s (n=1684910034)
            Rate    pow    lsl
pow   10730495/s     --   -99%
lsl 1461986355/s 13525%     --

@michael-schwarz
Copy link
Copy Markdown
Member

Interesting, wouldn't have expected this to ever matter relative to the runtime of all the rest of the code!

@sim642
Copy link
Copy Markdown
Member Author

sim642 commented Sep 16, 2022

I saw this on a small SV-COMP program with a large number of witness invariants to unassume.
Another unexpected place where it turned up was Goblint's CilType.Exp.compare (and equal) that get used a bunch in sets, maps and hashtables. The reason being that it still calls Expcompare.compareExp which does constant folding, every time any of those data structure operations happens with expressions...

The latter is also avoided in goblint/analyzer#827, so in the grand scheme of things, this might become insignificant.

@sim642 sim642 merged commit 943862f into develop Sep 16, 2022
@sim642 sim642 deleted the optimize-pow2 branch September 16, 2022 15:42
@sim642 sim642 added this to the 2.0.1 milestone Nov 21, 2022
sim642 added a commit to sim642/opam-repository that referenced this pull request Nov 23, 2022
CHANGES:

* Fix scope of enum definition in return type (goblint/cil#112, goblint/cil#113).
* Fix signed integer left shift constant folding overflow (goblint/cil#122, goblint/cil#123).
* Fix `fitsInInt` for booleans (goblint/cil#111).
* Mark more loop statement locations synthetic (goblint/cil#125).
* Optimize integer truncation (goblint/cil#115).
* Fix FrontC and Cabs2cil partial application (goblint/cil#116).
* Fix external usage of `freshLabel` (goblint/cil#121).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants