Skip to content

Add common::bitpacking#5487

Merged
xzfc merged 4 commits intodevfrom
links-comp-common-bitpacking
Nov 26, 2024
Merged

Add common::bitpacking#5487
xzfc merged 4 commits intodevfrom
links-comp-common-bitpacking

Conversation

@xzfc
Copy link
Member

@xzfc xzfc commented Nov 20, 2024

This PR adds a new module common::bitpacking with two structs: BitPacker and BitUnpacker. They would be used for the HNSW graph links compression in subsequent PRs.

Comparison with other implementations:

  • bitpacking: it's fast, SIMD-optimized, but application is limited: blocks should be of size of 32/128/256 integers, all stored integers within a block should have the same bitsize.
  • bitstream_io: the idea is somewhat similar to this PR, but this crate resulted to be slower for links compression/in-memory use-case.
  • tantivy_bitpacker: optimized for a random access reads within a packed array.

The most important function is BitUnpacker::read() as it would be used in a hot loop. The inner buffer size is 8 bytes (a single u64) and it's optimized to read by whole 8 bytes at once.

The intended use-case for this module would look like the following (example numbers are arbitrary):

  • Store header: 5 bit.
  • Store first 32 values, each of size 12 bits. (e.g. sorted/delta-encoded for the main graph).
  • Store last 20 values, each of size 15 bits. (e.g. additional links).
    Total size of the example: $\lceil \frac{5 + 32 \times 12 + 20 \times 15}{8} \rceil = 87 \ \text{bytes}$.

@xzfc xzfc added this to the hnsw-links-compression milestone Nov 20, 2024
Copy link
Member

@timvisee timvisee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not confident I understand all the nitty-gritty. However, I'm assuming that you've already extensively tested this. Based on a quick skim I don't have any remarks, and I like that you split it into a sub crate.

}

#[inline]
pub fn push(&mut self, value: u32, bits: u8) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please provide some examples of how to use this function in real-life scenario. For instance, where bits is coming from? I see in the provided tests, that bits are used to generate value. Does it means we can derive bits from value inside the push function?

Copy link
Member

@generall generall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ToDo: comment that the Bitpacker abstraction expects some external entity to keep track of value bit lengths

@xzfc xzfc force-pushed the links-comp-common-bitpacking branch from d23a0c1 to 69ef2fb Compare November 26, 2024 14:06
@xzfc xzfc force-pushed the links-comp-common-bitpacking branch from 69ef2fb to 87daebc Compare November 26, 2024 14:12
@xzfc xzfc merged commit bc98e90 into dev Nov 26, 2024
@xzfc xzfc deleted the links-comp-common-bitpacking branch November 26, 2024 14:46
timvisee pushed a commit that referenced this pull request Dec 9, 2024
Co-authored-by: generall <generall@thinkstation.fritz.box>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants