perf(overflow): Speed up overflow checks for small integers #303

Merged
zeroshade merged 2 commits into apache:main from cbandy:fast-overflow
Mar 10, 2025

Conversation

@cbandy
Contributor

@cbandy cbandy commented Mar 6, 2025

Rationale for this change

We can check the bounds of the arguments to avoid an int64 division. The Mul and Mul64 functions perform better on the systems I tested.

This is an older machine:

goos: linux
goarch: amd64
pkg: github.com/apache/arrow-go/v18/internal/utils
cpu: Intel(R) Core(TM) i5-7Y57 CPU @ 1.20GHz
                                  │ bench_amd64_old.txt │         bench_amd64_new.txt         │
                                  │       sec/op        │    sec/op     vs base               │
Add/int/8192_+_8192-4                      1.990n ± 14%   2.001n ± 18%        ~ (p=1.000 n=6)
Add/int/MaxInt16_+_1-4                     2.248n ±  1%   2.271n ±  4%   +1.07% (p=0.035 n=6)
Add/int/MaxInt16_+_5-4                     2.215n ±  2%   2.270n ±  3%   +2.48% (p=0.013 n=6)
Add/int/MaxInt16_+_MaxInt16-4              2.293n ±  4%   2.331n ±  9%        ~ (p=0.195 n=6)
Add/int/MaxInt32_+_1-4                     2.339n ± 16%   2.345n ± 67%        ~ (p=0.937 n=6)
Add/int/MaxInt32_+_5-4                     2.363n ±  4%   2.330n ±  2%        ~ (p=0.331 n=6)
Add/int/MaxInt32_+_MaxInt32-4              2.402n ± 23%   2.343n ±  3%        ~ (p=0.310 n=6)
Add/int/MaxInt_+_MaxInt-4                  2.334n ±  3%   2.364n ±  2%        ~ (p=0.180 n=6)
Mul/int/8192_×_8192-4                     14.335n ±  3%   3.578n ±  5%  -75.04% (p=0.002 n=6)
Mul/int/MaxInt16_×_1-4                    14.015n ± 90%   3.718n ± 20%  -73.47% (p=0.002 n=6)
Mul/int/MaxInt16_×_5-4                    14.480n ± 25%   3.647n ±  3%  -74.82% (p=0.002 n=6)
Mul/int/MaxInt16_×_MaxInt16-4             14.450n ± 30%   3.627n ± 35%  -74.90% (p=0.002 n=6)
Mul/int/MaxInt32_×_1-4                    14.725n ±  2%   3.597n ± 12%  -75.57% (p=0.002 n=6)
Mul/int/MaxInt32_×_5-4                    17.485n ± 31%   3.667n ±  5%  -79.03% (p=0.002 n=6)
Mul/int/MaxInt32_×_MaxInt32-4             14.385n ± 31%   5.704n ± 48%  -60.35% (p=0.002 n=6)
Mul/int/MaxInt_×_MaxInt-4                  14.34n ±  3%   15.68n ±  4%   +9.34% (p=0.002 n=6)
Mul64/int64/8192_×_8192-4                 14.495n ±  6%   3.730n ±  4%  -74.27% (p=0.002 n=6)
Mul64/int64/MaxInt16_×_1-4                14.300n ±  3%   3.612n ±  1%  -74.74% (p=0.002 n=6)
Mul64/int64/MaxInt16_×_5-4                14.020n ± 20%   3.657n ±  3%  -73.91% (p=0.002 n=6)
Mul64/int64/MaxInt16_×_MaxInt16-4         14.110n ±  2%   3.611n ± 12%  -74.41% (p=0.002 n=6)
Mul64/int64/MaxInt32_×_1-4                14.330n ±  2%   3.609n ±  4%  -74.82% (p=0.002 n=6)
Mul64/int64/MaxInt32_×_5-4                14.250n ± 14%   3.670n ±  2%  -74.25% (p=0.002 n=6)
Mul64/int64/MaxInt32_×_MaxInt32-4         14.070n ±  2%   3.571n ±  3%  -74.62% (p=0.002 n=6)
Mul64/int64/MaxInt_×_MaxInt-4              14.17n ±  2%   15.77n ±  3%  +11.25% (p=0.002 n=6)
geomean                                    7.807n         3.583n        -54.10%

The following is from inside a docker.io/i386/ubuntu container on that same machine, since I don't have any 32-bit hardware.

goos: linux
goarch: 386
pkg: github.com/apache/arrow-go/v18/internal/utils
cpu: Intel(R) Core(TM) i5-7Y57 CPU @ 1.20GHz
                                                        │ bench_386_old.txt │          bench_386_new.txt          │
                                                        │      sec/op       │    sec/op     vs base               │
Add/int/8192_+_8192-4                                          3.113n ± 22%   2.848n ± 17%        ~ (p=0.180 n=6)
Add/int/MaxInt16_+_1-4                                         3.344n ±  4%   3.294n ± 18%        ~ (p=0.818 n=6)
Add/int/MaxInt16_+_5-4                                         3.420n ±  7%   3.411n ±  5%        ~ (p=0.937 n=6)
Add/int/MaxInt16_+_MaxInt16-4                                  3.360n ±  3%   3.402n ± 30%        ~ (p=0.240 n=6)
Add/int/MaxInt_+_1-4                                           3.329n ±  2%   3.433n ±  3%   +3.09% (p=0.039 n=6)
Add/int/MaxInt_+_5-4                                           3.452n ±  5%   3.436n ±  2%        ~ (p=0.937 n=6)
Add/int/MaxInt_+_MaxInt-4                                      3.353n ±  6%   3.454n ±  3%        ~ (p=0.132 n=6)
Add/int/MaxInt_+_MaxInt#01-4                                   3.346n ± 13%   3.488n ±  3%        ~ (p=0.132 n=6)
Mul/int/8192_×_8192-4                                         11.250n ±  2%   6.238n ±  6%  -44.55% (p=0.002 n=6)
Mul/int/MaxInt16_×_1-4                                        11.565n ±  4%   6.261n ±  1%  -45.86% (p=0.002 n=6)
Mul/int/MaxInt16_×_5-4                                        11.300n ±  3%   6.505n ±  3%  -42.44% (p=0.002 n=6)
Mul/int/MaxInt16_×_MaxInt16-4                                 11.370n ±  4%   6.394n ± 40%  -43.76% (p=0.002 n=6)
Mul/int/MaxInt_×_1-4                                          11.245n ±  3%   5.798n ±  2%  -48.44% (p=0.002 n=6)
Mul/int/MaxInt_×_5-4                                          11.210n ± 18%   9.941n ±  2%  -11.32% (p=0.002 n=6)
Mul/int/MaxInt_×_MaxInt-4                                     10.975n ±  4%   9.895n ±  2%   -9.84% (p=0.002 n=6)
Mul/int/MaxInt_×_MaxInt#01-4                                   11.40n ± 18%   10.17n ±  2%  -10.83% (p=0.002 n=6)
Mul64/int64/8192_×_8192-4                                      21.74n ± 70%   23.75n ±  2%        ~ (p=0.065 n=6)
Mul64/int64/MaxInt16_×_1-4                                     22.37n ± 28%   23.87n ± 12%        ~ (p=0.065 n=6)
Mul64/int64/MaxInt16_×_5-4                                     22.09n ±  2%   23.96n ± 23%   +8.44% (p=0.002 n=6)
Mul64/int64/MaxInt16_×_MaxInt16-4                              21.06n ±  3%   24.27n ±  3%  +15.24% (p=0.002 n=6)
Mul64/int64/MaxInt_×_1-4                                       21.42n ±  8%   23.91n ± 64%  +11.62% (p=0.002 n=6)
Mul64/int64/MaxInt_×_5-4                                       29.87n ±  4%   24.16n ± 22%  -19.09% (p=0.009 n=6)
Mul64/int64/MaxInt_×_MaxInt-4                                  28.10n ±  2%   24.40n ±  2%  -13.17% (p=0.002 n=6)
Mul64/int64/9223372036854775807_×_9223372036854775807-4        23.22n ±  6%   31.90n ± 35%  +37.38% (p=0.002 n=6)
geomean                                                        9.609n         8.523n        -11.30%

What changes are included in this PR?

A new generic Add function in internal/utils returns the sum of any two integers and whether or not it overflowed.

New Mul and Mul64 functions in internal/utils return the product of two int or int64, respectively, and whether or not it overflowed. These functions perform better on the systems I tested.

Are these changes tested?

Yes.

Are there any user-facing changes?

No effect on the API. This drops one dependency.

@cbandy cbandy requested a review from zeroshade as a code owner March 6, 2025 01:29
@cbandy
Contributor Author

cbandy commented Mar 6, 2025

Sorry, I missed some hunks in my first commit. All fixed.

@cbandy
Contributor Author

cbandy commented Mar 7, 2025

It occurred to me today that this is better as two commits. The benchmarks are now committed, and one can try both implementations by switching between commits.

  1. The first commit adds the new functions, their tests, and benchmarks for the current implementation.
  2. The second commit uses the new functions throughout.

cbandy added 2 commits March 6, 2025 20:46
We can check the bounds of the arguments to avoid an int64 division.

This adds benchmarks for the existing implementation, new functions,
and tests, but does not change any behavior.

Signed-off-by: Chris Bandy <bandy.chris@gmail.com>
These have proved faster in the aggregate and allow us to drop one
dependency.

Signed-off-by: Chris Bandy <bandy.chris@gmail.com>
@zeroshade
Member

Thanks for this @cbandy! I'll take a closer look at it this afternoon!

Member

@zeroshade zeroshade left a comment

Looks good to me, @cbandy. I'm seeing similar performance benefits when I test it myself locally, though I am curious about the small number of cases where it is slower. I also want this in so I can start a release, and the few regressions don't seem significant enough to be an issue, so I'm going to merge this.

It would be awesome if you were able to figure out if those cases that were slower were just noise or indicative of something that might be an issue.

Thanks again for this!

@zeroshade zeroshade merged commit 2c14a75 into apache:main Mar 10, 2025
23 checks passed