
Add basic pure benchmarks#110

Merged
curiousleo merged 3 commits into interface-to-performance from steal-benchmarks
Apr 23, 2020
Conversation

@curiousleo
Collaborator

@curiousleo curiousleo commented Apr 22, 2020

Mostly stolen from https://github.com/lehins/haskell-benchmarks/tree/new-random/new-random-benchmarks.

Current results

All measurements here are for 1048576 iterations. I changed the number of iterations to 1000000 in a later commit for convenience, but the difference is within the margin of error in most cases, so I didn't update the times listed here in the description.

$ stack bench random:bench --ba '--small'
[...]
Benchmark bench: RUNNING...
baseline/nextWord32                      mean 277.8 μs  ( +- 12.21 μs  )
baseline/nextWord64                      mean 277.3 μs  ( +- 7.820 μs  )
baseline/nextInt                         mean 276.6 μs  ( +- 10.38 μs  )
baseline/split                           mean 7.097 ms  ( +- 60.36 μs  )
pure/random/Float                        mean 4.093 ms  ( +- 82.97 μs  )
pure/random/Double                       mean 5.275 ms  ( +- 112.8 μs  )
pure/random/Integer                      mean 4.287 ms  ( +- 155.3 μs  )
pure/uniform/Word8                       mean 293.3 μs  ( +- 9.312 μs  )
pure/uniform/Word16                      mean 293.6 μs  ( +- 25.63 μs  )
pure/uniform/Word32                      mean 297.9 μs  ( +- 32.12 μs  )
pure/uniform/Word64                      mean 333.0 μs  ( +- 51.32 μs  )
pure/uniform/Word                        mean 310.5 μs  ( +- 32.72 μs  )
pure/uniform/Int8                        mean 289.4 μs  ( +- 7.407 μs  )
pure/uniform/Int16                       mean 287.8 μs  ( +- 8.539 μs  )
pure/uniform/Int32                       mean 288.0 μs  ( +- 12.18 μs  )
pure/uniform/Int64                       mean 286.9 μs  ( +- 8.415 μs  )
pure/uniform/Int                         mean 284.6 μs  ( +- 6.549 μs  )
pure/uniform/Char                        mean 11.56 ms  ( +- 111.0 μs  )
pure/uniform/Bool                        mean 281.3 μs  ( +- 4.134 μs  )
pure/uniform/CBool                       mean 286.4 μs  ( +- 10.23 μs  )
pure/uniform/CChar                       mean 279.2 μs  ( +- 3.491 μs  )
pure/uniform/CSChar                      mean 278.6 μs  ( +- 3.916 μs  )
pure/uniform/CUChar                      mean 278.3 μs  ( +- 3.818 μs  )
pure/uniform/CShort                      mean 279.8 μs  ( +- 5.486 μs  )
pure/uniform/CUShort                     mean 294.5 μs  ( +- 33.58 μs  )
pure/uniform/CInt                        mean 279.2 μs  ( +- 4.146 μs  )
pure/uniform/CUInt                       mean 288.5 μs  ( +- 10.21 μs  )
pure/uniform/CLong                       mean 286.3 μs  ( +- 8.502 μs  )
pure/uniform/CULong                      mean 301.9 μs  ( +- 34.07 μs  )
pure/uniform/CPtrdiff                    mean 284.5 μs  ( +- 8.010 μs  )
pure/uniform/CSize                       mean 279.8 μs  ( +- 4.210 μs  )
pure/uniform/CWchar                      mean 287.0 μs  ( +- 10.65 μs  )
pure/uniform/CSigAtomic                  mean 285.8 μs  ( +- 7.762 μs  )
pure/uniform/CLLong                      mean 294.4 μs  ( +- 12.35 μs  )
pure/uniform/CULLong                     mean 285.9 μs  ( +- 7.633 μs  )
pure/uniform/CIntPtr                     mean 292.7 μs  ( +- 12.71 μs  )
pure/uniform/CUIntPtr                    mean 292.6 μs  ( +- 9.984 μs  )
pure/uniform/CIntMax                     mean 284.5 μs  ( +- 11.39 μs  )
pure/uniform/CUIntMax                    mean 285.1 μs  ( +- 8.449 μs  )
Benchmark bench: FINISH

Optimisation

Note that Char is an outlier: it uses rejection sampling under the hood, because not every Word32 is a valid Char. However, changing the method used from unsignedBitmaskWithRejectionM to unbiasedWordMult32 (as done in the second commit of this PR) approximately halves the time required to generate a Char:

Before

$ stack bench random:bench --ba 'pure/uniform/Char'
[...]
benchmarked pure/uniform/Char
time                 11.56 ms   (11.49 ms .. 11.62 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 11.44 ms   (11.39 ms .. 11.47 ms)
std dev              111.0 μs   (74.00 μs .. 177.8 μs)

After

$ stack bench random:bench --ba 'pure/uniform/Char'
[...]
benchmarked pure/uniform/Char
time                 6.238 ms   (6.205 ms .. 6.275 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 6.457 ms   (6.424 ms .. 6.494 ms)
std dev              105.3 μs   (90.97 μs .. 123.8 μs)
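To make the difference between the two methods concrete, here is a rough sketch in Python (illustrative only; the real implementations are Haskell internals of the random library, and the function names below are just descriptive stand-ins for unsignedBitmaskWithRejectionM and unbiasedWordMult32):

```python
# Illustrative sketch, not the actual library code: two ways to draw a
# uniform value in [0, bound) from a uniform 32-bit word source.

CHAR_BOUND = 0x10FFFF + 1  # number of valid Char code points (0 .. 0x10FFFF)

def bitmask_with_rejection(next_word32, bound):
    """Mask down to the smallest covering power of two, retry on overshoot."""
    mask = (1 << (bound - 1).bit_length()) - 1
    while True:
        x = next_word32() & mask
        if x < bound:
            return x

def unbiased_word_mult32(next_word32, bound):
    """Multiply-and-shift (Lemire-style): reject only a tiny slice of inputs."""
    threshold = (2 ** 32) % bound  # low parts below this would introduce bias
    while True:
        m = next_word32() * bound  # up to 64-bit product
        if (m & 0xFFFFFFFF) >= threshold:
            return m >> 32
```

For the Char bound the mask is 2^21 - 1, so the bitmask method accepts only 0x110000 / 0x200000 ≈ 53% of draws and averages almost two Word32 draws per Char, while the multiply method rejects only 65536 / 2^32 ≈ 0.0015% of draws. That lines up with the roughly 2x speedup measured above.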

@idontgetoutmuch
Owner

I'll put these in the changelog rather than the old school benchmarks.

@curiousleo
Collaborator Author

@idontgetoutmuch wrote:

I'll put these in the changelog rather than the old school benchmarks.

Fantastic!

I've backported the benchmarks in their current state to master here: #111.

Those results can be compared directly to the ones posted here: same machine, same benchmarking code, etc. I had to benchmark with --quick because they took so long ... this means the accuracy is not as good, but we're dealing with orders of magnitude here anyway.

@curiousleo
Collaborator Author

curiousleo commented Apr 23, 2020

I didn't say it in the description originally, but I just want to make it clear that the times in the description are for 1048576 iterations in each case. I just stole that number from @lehins' benchmarks. I'm thinking of changing it to 1000000 just to make it a little easier to deduce the single iteration runtime.

Edit: changed to 1000000.

@curiousleo curiousleo merged commit 37d0d4a into interface-to-performance Apr 23, 2020
@curiousleo curiousleo deleted the steal-benchmarks branch April 23, 2020 07:11
@lehins
Collaborator

lehins commented Apr 27, 2020

@curiousleo 1048576 was no coincidence ;) I don't particularly care, it's just benchmarks, but just to give you the reason why 2^20 = 1048576 was chosen. Thanks to this number you can easily translate from:

pure/uniform/Word8                       mean 293.3 μs
pure/uniform/Word16                      mean 293.6 μs

to: it took 293.3 μs to generate 1 MiB of Word8 data, and 293.6 μs to generate 2 MiB of Word16 data. I don't think anyone actually thinks in terms of the time it took to generate a single value, since that number is so low in these benchmarks.
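The conversion described above can be sketched as follows (in Python for illustration; the function name is made up for this example):

```python
ITERATIONS = 2 ** 20  # 1048576 draws per benchmark run

def throughput_mib_per_s(mean_us, bytes_per_value):
    """Convert a benchmark mean (μs per 2^20 draws) into MiB/s of generated data."""
    mib = ITERATIONS * bytes_per_value / 2 ** 20  # equals bytes_per_value MiB
    return mib / (mean_us * 1e-6)

# pure/uniform/Word8  at 293.3 μs -> ~3410 MiB/s of Word8 data
# pure/uniform/Word16 at 293.6 μs -> ~6812 MiB/s of Word16 data
```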

@curiousleo
Collaborator Author

@lehins ah cool, thanks for the explanation.

I don't think anyone actually thinks in terms of a time it took to generate a single value, since that number is so low in these benchmarks.

It is now :)

Good point though. I've actually reduced the number of iterations further, from 1_000_000 to 100_000, in a follow-up PR so we can easily compare uniformR results as well, which take longer.

It's not easy to communicate what these numbers mean. For Word8 etc. it makes sense to talk about "generated data". But does it make sense to talk about "4MiB of Int32s in some range"?

Anyway, I'm just thinking aloud here. Thanks for your explanation.

@idontgetoutmuch
Owner

I was about to put the new benchmarks in the CHANGELOG but we have regressions

pure/uniformR/full/Word16  0.017675  0.000026  67,528%
pure/uniformR/full/Int16  0.019081  0.030798     -38%

There really should not be a regression though?

@curiousleo
Collaborator Author

I was about to put the new benchmarks in the CHANGELOG but we have regressions

pure/uniformR/full/Word16  0.017675  0.000026  67,528%
pure/uniformR/full/Int16  0.019081  0.030798     -38%

There really should not be a regression though?

#103 fixes Int16 but introduces other regressions.

curiousleo added a commit that referenced this pull request May 19, 2020

3 participants