Skip to content

Consider using succinct data structures for read-only trees #107

@matklad

Description

@matklad

I've recently been reading on succinct data structures literature, and I am wondering if they might be applicable to rowan. The tagline of SD is to represent data using approximately as few bits as entropy allows, but with keeping all the operations fast.

In particular, for trees, we generally use 2 * n * sizeof<usize> bytes (parent/children pointers). SD allows using roughly 2n bits, while still allowing for efficient parent/child queries. This works due to the following tricks:

  • bitvectors can be represented very efficiently (you can pack 64 bits into a single word)
  • working with bitmasks is fast (popcount + precomputed tables)
  • by spending a few extra bytes, it's possible to augment bitvector with fast prefix-sum datastructures (rank & select)
  • it's possible to encode a tree as a bitvec, using couple of bits per node, where parent/child relations are expressible via rank/select

It would be interesting to see if it makes sense to use something like this for rowan.

Some reasons to do this:

  • using fewer bits for storing tree topology means that larger trees will fit into CPU caches, potentially improving performance
  • trees are heavy-weight, reducing RAM seems like a good idea
  • algorithmic complexity of operations becomes independent of the topology of the tree. Ie, the tree can be arbitrary deep or arbitrary wide

Some reasons not to do this:

  • performance will probably be worse
  • works only for read-only trees
  • high complexity of the implementation

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions