Add hashmaps/microbenchmarks/insertHeavy benchmark#338

Merged
utdemir merged 3 commits into master from ud/hashmap-insert-benchmark on Jul 24, 2021
Conversation

utdemir (Contributor) commented on Jul 23, 2021

This PR adds a benchmark for Data.HashMap.Mutable.Linear which inserts a bunch of keys in random order. Here are some details of the benchmark:

  • The keys are unique, so there are no updates.
  • The hashmap is initialized to be twice as big as the input size to avoid resizes.
  • The keys are newtype wrappers around Int for which the hash value is salted with a large number.
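The salted-key setup described above can be sketched as follows. This is a hypothetical reconstruction using only base; the key type name, the salt constant, and the mixing function in the actual PR may differ:

```haskell
import Data.Bits (xor)

-- Hypothetical stand-in for the benchmark's key type: a newtype around
-- Int whose hash mixes in a large salt, so hashing is not the identity.
newtype SaltedInt = SaltedInt Int
  deriving (Eq, Show)

-- An arbitrary large odd salt; the constant used in the PR may differ.
salt :: Int
salt = 0x5D3F7E9B2C1A4D61

-- FNV-style mixing of the salt into the key.
saltedHash :: SaltedInt -> Int
saltedHash (SaltedInt i) = (salt * 16777619) `xor` i

main :: IO ()
main = do
  print (saltedHash (SaltedInt 0))
  print (saltedHash (SaltedInt 1))
```

Because the salt is xor-ed in the same way for every key, distinct keys still hash to distinct values; the salt only scatters them across the table.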

Here's the result of the benchmark on GHC HEAD (378c0bba7d):

$ cabal bench --benchmark-options 'hashmaps/microbenchmarks/insertHeavy'
...
Benchmark mutable-data: RUNNING...
benchmarked hashmaps/microbenchmarks/insertHeavy/Data.HashMap.Mutable.Linear
time                 45.20 ms   (44.88 ms .. 45.62 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 44.56 ms   (44.11 ms .. 44.90 ms)
std dev              778.4 μs   (502.6 μs .. 1.047 ms)

benchmarked hashmaps/microbenchmarks/insertHeavy/Data.HashMap.Strict
time                 34.17 ms   (33.29 ms .. 34.97 ms)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 34.37 ms   (33.89 ms .. 35.07 ms)
std dev              1.218 ms   (857.9 μs .. 1.675 ms)
variance introduced by outliers: 11% (moderately inflated)

One thing I found while implementing this is that removing the salt from the hash function (thereby making it the identity for our Int keys) makes us a lot faster. That it gets faster makes sense, since there are then no hash collisions (the largest Int key is smaller than our capacity in this benchmark). But the fact that it gets almost twice as fast is curious, since I wouldn't expect that many collisions. I had a look at our probe sequence lengths and found that most keys indeed had no collisions; which means that even a few collisions slow down our implementation disproportionately.
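To make the probe-sequence observation concrete, here is a toy open-addressing model (plain linear probing, a simplification and not the library's actual implementation) that counts how many occupied slots each insert has to step over:

```haskell
import qualified Data.IntSet as IntSet
import Data.List (foldl')

-- Toy linear-probing model: given a table capacity and a list of hash
-- values (inserted in order), return each insert's probe length, i.e.
-- how many occupied slots it walks past before finding a free one.
probeLengths :: Int -> [Int] -> [Int]
probeLengths cap hashes = reverse . snd $ foldl' step (IntSet.empty, []) hashes
  where
    step (occ, ls) h =
      let l = length $ takeWhile (`IntSet.member` occ)
                         [ (h + i) `mod` cap | i <- [0 ..] ]
          slot = (h + l) `mod` cap
      in (IntSet.insert slot occ, l : ls)

main :: IO ()
main = do
  -- Distinct hashes: every probe length is 0.
  print (probeLengths 8 [0, 1, 2])   -- [0,0,0]
  -- Three keys colliding on slot 0: probe lengths grow 0, 1, 2.
  print (probeLengths 8 [0, 0, 0])   -- [0,1,2]
```

The model illustrates why clustering is costly: each collision not only lengthens its own probe sequence but occupies a slot that later keys may have to walk past as well.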


Also, this PR makes two minor improvements to the existing benchmark suite (see the individual commits). Another note about the implementation: it looks a bit different from the existing HashMap benchmarks, for two reasons:

  • The benchmarks I'm planning to write are much smaller than the existing ones (i.e. focusing on just one function), to reduce the surface area.
  • When writing benchmarks I find it useful to have the functions I'm comparing close together. The existing benchmarks were grouped the opposite way (by hashmap implementation). Another option would be to regroup the existing benchmarks the way I want, but I'm erring on the side of smaller diffs.

aspiwack (Member) left a comment


The influence of the salt is rather unexpected indeed.

For reference, hashWithSalt salt i (for i an integer) is implemented as (salt * 16777619) xor i.
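The formula above can be checked directly. The sketch below reimplements it locally (hashWithSaltInt is a stand-in written for illustration, not the hashable function itself) to show why dropping the salt collapses the hash to the identity on Int keys:

```haskell
import Data.Bits (xor)

-- Local reimplementation of the formula quoted above:
--   hashWithSalt salt i = (salt * 16777619) `xor` i
hashWithSaltInt :: Int -> Int -> Int
hashWithSaltInt s i = (s * 16777619) `xor` i

main :: IO ()
main = do
  -- With a zero salt the hash is the identity on Int keys...
  print (hashWithSaltInt 0 42)   -- 42
  -- ...while a non-zero salt scatters the keys.
  print (hashWithSaltInt 1 42)
```

With salt = 0 the multiplication vanishes and the xor with 0 leaves the key unchanged, which is exactly the unsalted, identity-hash case discussed in the PR description.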

@utdemir utdemir merged commit 5b755ac into master Jul 24, 2021
@utdemir utdemir deleted the ud/hashmap-insert-benchmark branch July 24, 2021 01:23
