-
Notifications
You must be signed in to change notification settings - Fork 24.4k
Description
The known problems with the hash table:
- huge allocations makes malloc/free freeze the system (add resize-hashtables-enabled config to control tryResizeHashTables in serverCron() #8611)
- even more memory is used while incremental rehashing is ongoing
- (anything else?)
It has been discussed to replace the hash table with a rax, but I think a HAMT is a better idea.
Hash Array Mapped Trie (HAMT) is a kind of nested hash table or hash tree.
It's a hash table of (a fixed number, typically) 32 slots, where each slot is either empty, an entry or (on
collition) a subnode with the same structure. For each level, 5 bits of the
hashed key is used as an index into the 32 slots. To save memory, each 32-slot
node is stored in a compact way using a bitmap to flag the existence of each of
the 32 potential slots.
- Introduction: https://en.wikipedia.org/wiki/Hash_array_mapped_trie
- Detailed paper: https://infoscience.epfl.ch/record/64398/files/idealhashtrees.pdf
Benefits
- No rehashing needed.
- No huge allocations.
- It's automatically balanced (if our hash function is good, which it is).
- Speed comparable to classic dict.
- Memory usage less than classic dict (roughly 2 words less per entry).
- SCAN and random key are simple to implement.
- All operations are safe during iteration.
- Fairly simple design.
Why not a Radix Tree (RAX)?
- The rax needs much memory for large keys without a shared prefix.
- There are no known SCAN and fair random key selection algorithms.
The HAMT is similar to a RAX, but instead of the key itself, the hashed key is
used as the path in the tree. Since a hashed key is a number, it can also be used
as a cursor. A random number used as a cursor picks a random key.