Skip to content

Conversation

@dbaarda
Copy link
Member

@dbaarda dbaarda commented Apr 20, 2020

This optimizes the rabinkarp_update() by correctly using unsigned constants and manually unrolling the loop for best performance. See the following for analysis that lead to this patch;

https://github.com/dbaarda/librsync-tests/blob/master/RESULTS.rst

dbaarda added 8 commits April 15, 2020 10:34
This adds rabinkarp.c copying rollsum.c's unrolled loop idea.
Change from a generic uint32_pow() to rabinkarp_pow() optimized using a lookup
table for getting powers of RABINKARP_MULT.

Use a DOMULT4() macro to unroll rabinkarp_update() in a way that means 4
multiplies could be piplined/parallelized if the hardware can do it. Each
multiply is not dependent on the results of the previous multiply.
A lot of testing of many different variants of the loop unrolling seems to
indicate this is the best variant of the loop-unrolling.
@dbaarda dbaarda merged commit bc5c439 into librsync:master Apr 27, 2020
@dbaarda dbaarda deleted the opt/rabinkarp1 branch April 27, 2020 13:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant