Conversation

@codablock

See individual commits. Most of these are related to using unordered maps/sets instead of ordered ones and are the result of multiple profiling sessions. All these small optimizations add up to about 10% of CPU usage on the message handler thread.

This PR also contains a backport of bitcoin#13176, as I've also noticed unexpectedly high CPU usage in CNode::AddInventoryKnown.

laanwj and others added 8 commits April 11, 2019 09:00
… modulus with FastMod

9aac9f9 replace modulus with FastMod (Martin Ankerl)

Pull request description:

  Not sure if this optimization is necessary, but anyway I have some spare time so here it is. This replaces the slow modulo operation with a much faster 64-bit multiplication & shift. This works when the hash is uniformly distributed between 0 and 2^32-1. This speeds up the benchmark by a factor of about 1.3:

  ```
  RollingBloom, 5, 1500000, 3.73733, 4.97569e-07, 4.99002e-07, 4.98372e-07 # before
  RollingBloom, 5, 1500000, 2.86842, 3.81630e-07, 3.83730e-07, 3.82473e-07 # FastMod
  ```

  Be aware that this changes the internal data of the filter, so this should probably
  not be used for CBloomFilter because of interoperability problems.

Tree-SHA512: 04104f3fb09f56c9d14458a6aad919aeb0a5af944e8ee6a31f00e93c753e22004648c1cd65bf36752b6addec528d19fb665c27b955ce1666a85a928e17afa47a
In one of my profiling sessions with many InstantSend transactions
happening, calls into CSporkManager added up to about 1% of total CPU time.
This is easily avoidable by using unordered maps.
Due to the batched pruning, there is no need to maintain an ordered map
of values anymore. Only when the map grows beyond nPruneAfterSize is there
a need to create a temporary ordered vector of values to figure out what
can be removed.
… on demand

CNode::AskFor will now push entries into an initially unordered vector
instead of an ordered multimap. Only when we later want to use vecAskFor in
SendMessages, we sort the vector.

The vector will actually be mostly sorted in most cases, as insertion order
usually mimics the desired ordering. Only the last few entries might need
some shuffling around. Doing the sort on demand should be less wasteful
than trying to maintain correct order all the time.
@codablock codablock added this to the 14.0 milestone Apr 11, 2019
UdjinM6
UdjinM6 previously approved these changes Apr 11, 2019
@UdjinM6 UdjinM6 left a comment

utACK

I'm surprised that tiny sporks and args map had such a noticeable impact tbh, sounds very strange to me.

@codablock

mapArgs was not that bad, I just did it while I was in optimization mode :)

The sporks are a different thing. We now perform a lot of IsNewInstantSendEnabled calls, which summed up to 1% of total time spent. I assume the actual reason is the temporary mapValueCounts map, which we should try to avoid in an upcoming refactoring. Replacing the maps with unordered maps just looked like the fastest/cheapest way to optimize it a tiny little bit.

@UdjinM6 UdjinM6 dismissed their stale review April 11, 2019 07:53

travis fails

This ensures that future backports that depend on limitedmap's ordering
conflict, so that we are made aware of needed action.
@codablock

Added a few commits to fix Travis and also ensure that future backports don't go wrong.

@UdjinM6 UdjinM6 left a comment

utACK

@UdjinM6 UdjinM6 merged commit 241f76f into dashpay:develop Apr 11, 2019
@codablock codablock deleted the pr_optimizations branch October 10, 2019 15:14