I've recently been reading on succinct data structures literature, and I am wondering if they might be applicable to rowan. The tagline of SD is to represent data using approximately as few bits as entropy allows, but with keeping all the operations fast.
In particular, for trees, we generally use 2 * n * sizeof<usize> bytes (parent/children pointers). SD allows using roughly 2n bits, while still allowing for efficient parent/child queries. This works due to the following tricks:
- bitvectors can be represented very efficiently (you can pack 64 bits into a single word)
- working with bitmasks is fast (popcount + precomputed tables)
- by spending a few extra bytes, it's possible to augment bitvector with fast prefix-sum datastructures (rank & select)
- it's possible to encode a tree as a bitvec, using couple of bits per node, where parent/child relations are expressible via rank/select
It would be interesting to see if it makes sense to use something like this for rowan.
Some reasons to do this:
- using fewer bits for storing tree topology means that larger trees will fit into CPU caches, potentially improving performance
- trees are heavy-weight, reducing RAM seems like a good idea
- algorithmic complexity of operations becomes independent of the topology of the tree. Ie, the tree can be arbitrary deep or arbitrary wide
Some reasons not to do this:
- performance will probably be worse
- works only for read-only trees
- high complexity of the implementation
I've recently been reading on succinct data structures literature, and I am wondering if they might be applicable to rowan. The tagline of SD is to represent data using approximately as few bits as entropy allows, but with keeping all the operations fast.
In particular, for trees, we generally use
2 * n * sizeof<usize>bytes (parent/children pointers). SD allows using roughly2nbits, while still allowing for efficient parent/child queries. This works due to the following tricks:It would be interesting to see if it makes sense to use something like this for
rowan.Some reasons to do this:
Some reasons not to do this: