Skip to content

[NEW] Replace dict's hash table with HAMT #8698

@zuiderkwast

Description

@zuiderkwast

The known problems with the hash table:

It has been discussed to replace the hash table with a rax, but I think a HAMT is a better idea.

Hash Array Mapped Trie (HAMT) is a kind of nested hash table or hash tree.

It's a hash table of (a fixed number, typically) 32 slots, where each slot is either empty, an entry or (on
collition) a subnode with the same structure. For each level, 5 bits of the
hashed key is used as an index into the 32 slots. To save memory, each 32-slot
node is stored in a compact way using a bitmap to flag the existence of each of
the 32 potential slots.

Benefits

  • No rehashing needed.
  • No huge allocations.
  • It's automatically balanced (if our hash function is good, which it is).
  • Speed comparable to classic dict.
  • Memory usage less than classic dict (roughly 2 words less per entry).
  • SCAN and random key are simple to implement.
  • All operations are safe during iteration.
  • Fairly simple design.

Why not a Radix Tree (RAX)?

  • The rax needs much memory for large keys without a shared prefix.
  • There are no known SCAN and fair random key selection algorithms.

The HAMT is similar to a RAX, but instead of the key itself, the hashed key is
used as the path in the tree. Since a hashed key is a number, it can also be used
as a cursor. A random number used as a cursor picks a random key.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions