sd_vector_disk #433

Merged
karasikov merged 13 commits into master from mk/sd_vector_disk
Jan 22, 2023

@karasikov karasikov commented Jan 11, 2023

SD vector in which the array of least significant bits of the set bits' positions is stored on disk.
Takes only $\approx 2$ bits of RAM per set bit. The remaining $\lceil \log_2 n\rceil - \lceil \log_2 m\rceil$ bits per set bit are stored on disk, where $n$ is the length of the vector and $m$ is the number of set bits.
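
To illustrate the split described above, here is a minimal sketch of the classic Elias-Fano layout that SD vectors are based on. It is not the code from this PR: the names `num_bits`, `SplitPositions`, `split_positions` and `position` are hypothetical, both arrays are kept in plain in-memory containers, and select is done with a linear scan purely for readability. The `low` array is the part that would live on disk, and the `high` bit vector is the part that would stay in RAM.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Number of bits needed to represent x (0 for x == 0).
inline uint8_t num_bits(uint64_t x) {
    uint8_t w = 0;
    while (x) { ++w; x >>= 1; }
    return w;
}

// Hypothetical container for the two halves of the encoding.
struct SplitPositions {
    uint8_t low_width = 0;        // bits per entry of the low array
    std::vector<uint64_t> low;    // least significant bits (the on-disk part)
    std::vector<bool> high;       // unary-coded high parts (the in-RAM part)
};

// positions: sorted positions of the set bits in a bit vector of length n.
SplitPositions split_positions(const std::vector<uint64_t>& positions, uint64_t n) {
    const uint64_t m = positions.size();
    assert(m > 0 && n >= m);
    const uint8_t logn = num_bits(n);
    uint8_t logm = num_bits(m);
    if (logn == logm) --logm;     // keep at least one low bit per position

    SplitPositions out;
    out.low_width = logn - logm;
    const uint64_t mask = (uint64_t(1) << out.low_width) - 1;
    // One set bit per position plus one slot per possible high value.
    out.high.assign(m + (uint64_t(1) << logm), false);
    for (uint64_t i = 0; i < m; ++i) {
        out.low.push_back(positions[i] & mask);
        out.high[(positions[i] >> out.low_width) + i] = true;
    }
    return out;
}

// Recover the position of the i-th set bit (0-based).
// Linear-scan select, for illustration only.
uint64_t position(const SplitPositions& s, uint64_t i) {
    uint64_t seen = 0;
    for (uint64_t j = 0; j < s.high.size(); ++j)
        if (s.high[j] && seen++ == i)
            return ((j - i) << s.low_width) | s.low[i];
    assert(false && "i out of range");
    return 0;
}
```

For example, with set bits at positions 3, 17, 42 and 1000 in a vector of length 1024, each position keeps 8 low bits, and the high part fits in a 12-bit vector. A real implementation would replace the linear scan with a constant-time select structure over `high`.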

Run benchmarks:

./benchmarks --benchmark_filter=BM_bv_query_sequential_sd_vector_disk_acce*

Results (columns: wall time, CPU time, iterations, RAM usage):

m_low in int_vector_buffer:

BM_bv_query_random_sd_vector_disk_access_every_nth_bit_set<10000>       5.26 us         5.17 us       100000 RAM=1.35269G
BM_bv_query_random_sd_vector_disk_access_every_nth_bit_set<3000>        4.25 us         4.22 us       160777 RAM=1.45588G
BM_bv_query_random_sd_vector_disk_access_every_nth_bit_set<1000>        53.0 us         18.2 us        32213 RAM=1.64712G
BM_bv_query_random_sd_vector_disk_access_every_nth_bit_set<500>         17.8 us         10.4 us       101077 RAM=1.43773G
BM_bv_query_random_sd_vector_disk_access_every_nth_bit_set<200>          376 us         51.3 us        11791 RAM=2.58662G

m_low in int_vector_mapper:

BM_bv_query_random_sd_vector_disk_access_every_nth_bit_set<10000>      0.235 us        0.233 us      2828471 RAM=1.5224G
BM_bv_query_random_sd_vector_disk_access_every_nth_bit_set<3000>       0.450 us        0.445 us      1135598 RAM=1.91674G
BM_bv_query_random_sd_vector_disk_access_every_nth_bit_set<1000>        1.17 us         1.17 us       545515 RAM=2.73401G
BM_bv_query_random_sd_vector_disk_access_every_nth_bit_set<500>         68.7 us         10.7 us        66802 RAM=2.24338G
^C

m_low and m_high in int_vector_mapper:

BM_bv_query_random_sd_vector_disk_access_every_nth_bit_set<10000>      0.276 us        0.274 us      2332459 RAM=1.52114G
BM_bv_query_random_sd_vector_disk_access_every_nth_bit_set<3000>       0.413 us        0.412 us      1216758 RAM=1.92005G
BM_bv_query_random_sd_vector_disk_access_every_nth_bit_set<1000>        1.11 us         1.11 us       521963 RAM=2.69745G
BM_bv_query_random_sd_vector_disk_access_every_nth_bit_set<500>         28.5 us         6.71 us       256609 RAM=2.83147G
BM_bv_query_random_sd_vector_disk_access_every_nth_bit_set<200>          197 us         32.7 us        26774 RAM=2.71836G

all with mmap:

BM_bv_query_random_sd_vector_mmap_access_every_nth_bit_set<10000>      0.285 us        0.284 us      2025029 RAM=1.52162G
BM_bv_query_random_sd_vector_mmap_access_every_nth_bit_set<3000>       0.396 us        0.394 us      1432762 RAM=1.92811G
BM_bv_query_random_sd_vector_mmap_access_every_nth_bit_set<1000>        1.68 us         1.67 us       334188 RAM=2.52352G
BM_bv_query_random_sd_vector_mmap_access_every_nth_bit_set<500>         4.12 us         4.09 us       134115 RAM=2.61806G
BM_bv_query_random_sd_vector_mmap_access_every_nth_bit_set<200>          430 us         57.3 us         8541 RAM=954.814M

all in RAM:

BM_bv_query_random_sd_vector_access_every_nth_bit_set<10000>      0.231 us        0.227 us      2777150 RAM=1.51786G
BM_bv_query_random_sd_vector_access_every_nth_bit_set<3000>       0.268 us        0.267 us      2635572 RAM=1.91844G
BM_bv_query_random_sd_vector_access_every_nth_bit_set<1000>       0.342 us        0.339 us      2106042 RAM=2.89771G
BM_bv_query_random_sd_vector_access_every_nth_bit_set<500>        0.373 us        0.371 us      1886208 RAM=4.23003G
BM_bv_query_random_sd_vector_access_every_nth_bit_set<200>        0.581 us        0.566 us      1703989 RAM=6.93669G

The times fluctuate quite a bit, so it would be good to also benchmark this on the cluster.

@karasikov karasikov requested a review from hmusta January 12, 2023 17:12

karasikov commented Jan 22, 2023

I figured out how to easily load all vectors with memory mapping (will implement in another PR), so here I'm only gonna keep the constructor for sd_vector with streaming.
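
For readers unfamiliar with the idea, loading a vector via memory mapping can be sketched roughly as follows. This is not the approach from the follow-up PR, only a generic POSIX illustration: `MappedWords`, `write_words`, `map_words` and `unmap_words` are hypothetical helpers, and the file is assumed to be a raw array of 64-bit words with no header.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical read-only view over a file of raw 64-bit words.
struct MappedWords {
    const uint64_t* data = nullptr;
    size_t size = 0;    // number of 64-bit words
    size_t bytes = 0;   // length of the mapping in bytes
};

// Helper for the example: dump n words to a file.
bool write_words(const char* path, const uint64_t* words, size_t n) {
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    const size_t written = std::fwrite(words, sizeof(uint64_t), n, f);
    return std::fclose(f) == 0 && written == n;
}

// Map the file read-only; pages are loaded lazily by the OS on first access,
// so only the touched parts of the vector occupy RAM.
MappedWords map_words(const char* path) {
    MappedWords out;
    const int fd = open(path, O_RDONLY);
    if (fd < 0) return out;
    struct stat st;
    if (fstat(fd, &st) == 0 && st.st_size > 0) {
        const size_t len = static_cast<size_t>(st.st_size);
        void* p = mmap(nullptr, len, PROT_READ, MAP_SHARED, fd, 0);
        if (p != MAP_FAILED) {
            out.data = static_cast<const uint64_t*>(p);
            out.bytes = len;
            out.size = len / sizeof(uint64_t);
        }
    }
    close(fd);  // the mapping stays valid after the descriptor is closed
    return out;
}

// Release the mapping and reset the view.
void unmap_words(MappedWords& w) {
    if (w.data) munmap(const_cast<uint64_t*>(w.data), w.bytes);
    w = MappedWords{};
}
```

On failure `map_words` returns an empty view (`data == nullptr`), so callers should check it before indexing.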

@karasikov karasikov merged commit fe53f55 into master Jan 22, 2023
@karasikov karasikov deleted the mk/sd_vector_disk branch January 22, 2023 17:59