
Add basic pure benchmarks#110

Merged
curiousleo merged 3 commits into interface-to-performance from steal-benchmarks
Apr 23, 2020
Conversation

@curiousleo
Collaborator

@curiousleo curiousleo commented Apr 22, 2020

Mostly stolen from https://github.com/lehins/haskell-benchmarks/tree/new-random/new-random-benchmarks.

Current results

All measurements here are for 1048576 iterations. I changed the number of iterations to 1000000 in a later commit for convenience, but the difference is within the margin of error in most cases, so I didn't update the times listed here in the description.

$ stack bench random:bench --ba '--small'
[...]
Benchmark bench: RUNNING...
baseline/nextWord32                      mean 277.8 μs  ( +- 12.21 μs  )
baseline/nextWord64                      mean 277.3 μs  ( +- 7.820 μs  )
baseline/nextInt                         mean 276.6 μs  ( +- 10.38 μs  )
baseline/split                           mean 7.097 ms  ( +- 60.36 μs  )
pure/random/Float                        mean 4.093 ms  ( +- 82.97 μs  )
pure/random/Double                       mean 5.275 ms  ( +- 112.8 μs  )
pure/random/Integer                      mean 4.287 ms  ( +- 155.3 μs  )
pure/uniform/Word8                       mean 293.3 μs  ( +- 9.312 μs  )
pure/uniform/Word16                      mean 293.6 μs  ( +- 25.63 μs  )
pure/uniform/Word32                      mean 297.9 μs  ( +- 32.12 μs  )
pure/uniform/Word64                      mean 333.0 μs  ( +- 51.32 μs  )
pure/uniform/Word                        mean 310.5 μs  ( +- 32.72 μs  )
pure/uniform/Int8                        mean 289.4 μs  ( +- 7.407 μs  )
pure/uniform/Int16                       mean 287.8 μs  ( +- 8.539 μs  )
pure/uniform/Int32                       mean 288.0 μs  ( +- 12.18 μs  )
pure/uniform/Int64                       mean 286.9 μs  ( +- 8.415 μs  )
pure/uniform/Int                         mean 284.6 μs  ( +- 6.549 μs  )
pure/uniform/Char                        mean 11.56 ms  ( +- 111.0 μs  )
pure/uniform/Bool                        mean 281.3 μs  ( +- 4.134 μs  )
pure/uniform/CBool                       mean 286.4 μs  ( +- 10.23 μs  )
pure/uniform/CChar                       mean 279.2 μs  ( +- 3.491 μs  )
pure/uniform/CSChar                      mean 278.6 μs  ( +- 3.916 μs  )
pure/uniform/CUChar                      mean 278.3 μs  ( +- 3.818 μs  )
pure/uniform/CShort                      mean 279.8 μs  ( +- 5.486 μs  )
pure/uniform/CUShort                     mean 294.5 μs  ( +- 33.58 μs  )
pure/uniform/CInt                        mean 279.2 μs  ( +- 4.146 μs  )
pure/uniform/CUInt                       mean 288.5 μs  ( +- 10.21 μs  )
pure/uniform/CLong                       mean 286.3 μs  ( +- 8.502 μs  )
pure/uniform/CULong                      mean 301.9 μs  ( +- 34.07 μs  )
pure/uniform/CPtrdiff                    mean 284.5 μs  ( +- 8.010 μs  )
pure/uniform/CSize                       mean 279.8 μs  ( +- 4.210 μs  )
pure/uniform/CWchar                      mean 287.0 μs  ( +- 10.65 μs  )
pure/uniform/CSigAtomic                  mean 285.8 μs  ( +- 7.762 μs  )
pure/uniform/CLLong                      mean 294.4 μs  ( +- 12.35 μs  )
pure/uniform/CULLong                     mean 285.9 μs  ( +- 7.633 μs  )
pure/uniform/CIntPtr                     mean 292.7 μs  ( +- 12.71 μs  )
pure/uniform/CUIntPtr                    mean 292.6 μs  ( +- 9.984 μs  )
pure/uniform/CIntMax                     mean 284.5 μs  ( +- 11.39 μs  )
pure/uniform/CUIntMax                    mean 285.1 μs  ( +- 8.449 μs  )
Benchmark bench: FINISH

Optimisation

Note that Char is an outlier: it uses rejection sampling under the hood, because not every Word32 is a valid Char. However, changing the method used from unsignedBitmaskWithRejectionM to unbiasedWordMult32 (as done in the second commit of this PR) approximately halves the time required to generate a Char:

Before

$ stack bench random:bench --ba 'pure/uniform/Char'
[...]
benchmarked pure/uniform/Char
time                 11.56 ms   (11.49 ms .. 11.62 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 11.44 ms   (11.39 ms .. 11.47 ms)
std dev              111.0 μs   (74.00 μs .. 177.8 μs)

After

$ stack bench random:bench --ba 'pure/uniform/Char'
[...]
benchmarked pure/uniform/Char
time                 6.238 ms   (6.205 ms .. 6.275 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 6.457 ms   (6.424 ms .. 6.494 ms)
std dev              105.3 μs   (90.97 μs .. 123.8 μs)
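To make the difference between the two methods concrete, here is a rough sketch in Python (illustrative only; the real implementations are Haskell internals of the random library, and the function names below are just descriptive stand-ins for unsignedBitmaskWithRejectionM and unbiasedWordMult32):

```python
# Illustrative sketch, not the actual library code: two ways to draw a
# uniform value in [0, bound) from a uniform 32-bit word source.

CHAR_BOUND = 0x10FFFF + 1  # number of valid Char code points (0 .. 0x10FFFF)

def bitmask_with_rejection(next_word32, bound):
    """Mask down to the smallest covering power of two, retry on overshoot."""
    mask = (1 << (bound - 1).bit_length()) - 1
    while True:
        x = next_word32() & mask
        if x < bound:
            return x

def unbiased_word_mult32(next_word32, bound):
    """Multiply-and-shift (Lemire-style): reject only a tiny slice of inputs."""
    threshold = (2 ** 32) % bound  # low parts below this would introduce bias
    while True:
        m = next_word32() * bound  # up to 64-bit product
        if (m & 0xFFFFFFFF) >= threshold:
            return m >> 32
```

For the Char bound the mask is 2^21 - 1, so the bitmask method accepts only 0x110000 / 0x200000 ≈ 53% of draws and averages almost two Word32 draws per Char, while the multiply method rejects only 65536 / 2^32 ≈ 0.0015% of draws. That lines up with the roughly 2x speedup measured above.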

@idontgetoutmuch
Owner

I'll put these in the changelog rather than the old school benchmarks.

@curiousleo
Collaborator Author

@idontgetoutmuch wrote:

I'll put these in the changelog rather than the old school benchmarks.

Fantastic!

I've backported the benchmarks in their current state to master here: #111.

Those results can be compared directly to the ones posted here: same machine, same benchmarking code, etc. I had to benchmark with --quick because they took so long ... this means the accuracy is not as good, but we're dealing with orders of magnitude here anyway.

@curiousleo
Collaborator Author

curiousleo commented Apr 23, 2020

I didn't say it in the description originally, but I just want to make it clear that the times in the description are for 1048576 iterations in each case. I just stole that number from @lehins' benchmarks. I'm thinking of changing it to 1000000 just to make it a little easier to deduce the single iteration runtime.

Edit: changed to 1000000.

@curiousleo curiousleo merged commit 37d0d4a into interface-to-performance Apr 23, 2020
@curiousleo curiousleo deleted the steal-benchmarks branch April 23, 2020 07:11
@lehins
Collaborator

lehins commented Apr 27, 2020

@curiousleo 1048576 was no coincidence ;) I don't particularly care, it's just benchmarks, but just to give you the reason why 2^20 = 1048576 was chosen. Thanks to this number you can easily translate from:

pure/uniform/Word8                       mean 293.3 μs
pure/uniform/Word16                      mean 293.6 μs

to: it took 293.3 μs to generate 1 MiB of Word8 data, and 293.6 μs to generate 2 MiB of Word16 data. I don't think anyone actually thinks in terms of the time it took to generate a single value, since that number is so low in these benchmarks.
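The conversion described above can be sketched as follows (in Python for illustration; the function name is made up for this example):

```python
ITERATIONS = 2 ** 20  # 1048576 draws per benchmark run

def throughput_mib_per_s(mean_us, bytes_per_value):
    """Convert a benchmark mean (μs per 2^20 draws) into MiB/s of generated data."""
    mib = ITERATIONS * bytes_per_value / 2 ** 20  # equals bytes_per_value MiB
    return mib / (mean_us * 1e-6)

# pure/uniform/Word8  at 293.3 μs -> ~3410 MiB/s of Word8 data
# pure/uniform/Word16 at 293.6 μs -> ~6812 MiB/s of Word16 data
```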

@curiousleo
Collaborator Author

@lehins ah cool, thanks for the explanation.

I don't think anyone actually thinks in terms of a time it took to generate a single value, since that number is so low in these benchmarks.

It is now :)

Good point though. I've actually reduced the number of iterations further, from 1_000_000 to 100_000, in a follow-up PR so we can easily compare uniformR results as well, which take longer.

It's not easy to communicate what these numbers mean. For Word8 etc. it makes sense to talk about "generated data". But does it make sense to talk about "4MiB of Int32s in some range"?

Anyway, I'm just thinking aloud here. Thanks for your explanation.

@idontgetoutmuch
Owner

I was about to put the new benchmarks in the CHANGELOG but we have regressions

pure/uniformR/full/Word16  0.017675  0.000026  67,528%
pure/uniformR/full/Int16  0.019081  0.030798     -38%

There really should not be a regression though?

@curiousleo
Collaborator Author

I was about to put the new benchmarks in the CHANGELOG but we have regressions

pure/uniformR/full/Word16  0.017675  0.000026  67,528%
pure/uniformR/full/Int16  0.019081  0.030798     -38%

There really should not be a regression though?

#103 fixes Int16 but introduces other regressions.

curiousleo added a commit that referenced this pull request May 19, 2020

3 participants