Skip to content

Conversation

@dbaarda
Copy link
Member

@dbaarda dbaarda commented Apr 30, 2020

Use the upper bits of the hash as the bloom filter index instead of using the lower bits used for the hashtable index.

This means we don't use the same index for the bloom filter and hashtable which ensures we don't always start probing the hashtable after bloom-filter hits at known non-empty entries. This is similar to using a "different hash" for each "k" of a bloom filter. The hashtable probes have a chance of immediately hitting an empty entry, significantly reducing the number of weaksum compares.

Also add tmask and bshift attributes to hashtable_t so we don't need to recalculate them for every probe.

dbaarda added 3 commits April 30, 2020 10:05
Mix the hash's upper 16bits with the lower 16bits as a different hash for the
bloom filter. This means bloomfilter hits don't necessarily align with
occupied hashtable entries, possibly avoiding multiple hashtable probes to
detect a hashtable miss.
Add and use tmask attribute for getting the hashtable index from a hash.

Add and use bshift attribute for getting the bloomfilter bit index from a
hash. This means we use the upper bits of the hash instead of bit-mixing and
masking.
For a minimum of 1 and an empty file, the mask is empty, resulting in a bshift
with as many bits as unsigned hash. In C shifting by all the bits is
undefined.
@dbaarda dbaarda merged commit 9cf9ca3 into librsync:master Apr 30, 2020
@dbaarda dbaarda deleted the opt/hashtable3 branch May 14, 2020 16:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant