[FEATURE] Change array cache to HTAB cache for wal_block_numbers by var77 · Pull Request #26 · lanterndata/lantern

var77 · 2023-08-05T22:41:37Z

Issue

Description

Changed wal_retriever_block_numbers array to Postgres HTAB (hash table) for caching block numbers.
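Postgres's own HTAB API (`hash_create`/`hash_search`) only runs inside a backend, so the sketch below is a plain-C stand-in for the same idea: a hash table that maps a key to a `BlockNumber`. One plausible benefit over an array sized by `num_added_vectors` is that keys do not have to be a dense 0..n-1 range. The key type (`uint64_t`, a hypothetical vector id), the function names, and the fixed capacity are assumptions for illustration, not the PR's code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Mirrors Postgres's BlockNumber (uint32) and its invalid-block sentinel. */
typedef uint32_t BlockNumber;
#define INVALID_BLOCK_NUMBER ((BlockNumber) 0xFFFFFFFF)

/* One slot: a (hypothetical) vector id and the block it was written to. */
typedef struct
{
    uint64_t    key;
    BlockNumber value;
    int         used;
} CacheEntry;

typedef struct
{
    CacheEntry *entries;
    size_t      capacity; /* always a power of two */
    size_t      count;
} BlockCache;

static int cache_init(BlockCache *cache, size_t capacity)
{
    assert(capacity > 0 && (capacity & (capacity - 1)) == 0);
    cache->entries = calloc(capacity, sizeof(CacheEntry)); /* zeroed: all slots unused */
    cache->capacity = capacity;
    cache->count = 0;
    return cache->entries != NULL;
}

/* MurmurHash3-style mixing step so consecutive ids spread across slots. */
static size_t hash_key(uint64_t key, size_t mask)
{
    key ^= key >> 33;
    key *= 0xff51afd7ed558ccdULL;
    key ^= key >> 33;
    return (size_t) key & mask;
}

/* Linear probing: returns the slot holding key, or the empty slot where it belongs. */
static CacheEntry *cache_slot(const BlockCache *cache, uint64_t key)
{
    size_t mask = cache->capacity - 1;
    size_t i = hash_key(key, mask);

    while (cache->entries[i].used && cache->entries[i].key != key)
        i = (i + 1) & mask;
    return &cache->entries[i];
}

/* Inserts or updates key; returns 0 if the table is full. */
static int cache_insert(BlockCache *cache, uint64_t key, BlockNumber block)
{
    CacheEntry *slot = cache_slot(cache, key);

    if (!slot->used)
    {
        /* keep at least one empty slot so probing for an absent key terminates */
        if (cache->count + 1 >= cache->capacity)
            return 0;
        slot->used = 1;
        slot->key = key;
        cache->count++;
    }
    slot->value = block;
    return 1;
}

static BlockNumber cache_lookup(const BlockCache *cache, uint64_t key)
{
    const CacheEntry *slot = cache_slot(cache, key);

    return slot->used ? slot->value : INVALID_BLOCK_NUMBER;
}

static void cache_destroy(BlockCache *cache)
{
    free(cache->entries);
    cache->entries = NULL;
}
```

A real HTAB grows on demand; this sketch uses a fixed capacity only to stay short.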

davkhech · 2023-08-06T10:52:40Z

test/expected/hnsw_insert_array.out

- 518 | 85024.00
 340 | 87261.00
 331 | 87796.00
+ 518 | 85024.00


is this fine?

I think we overlooked an issue in a previous PR (#18): the regression tests for tables with real[]-typed columns did not actually create an index.
It seems Varik fixed it there, which caused some of the changes.

Without the index, we do exact ordering, which is always 100% accurate. With the index, we do approximate ordering, so anomalies like the one above are acceptable.
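For contrast with the index path, here is a minimal sketch of what exact ordering amounts to: score every row, sort by distance, keep the first k, much like the `Seq Scan` + `Sort` plan that drops out of the expected output once the index exists. It assumes `<->` computes squared Euclidean distance on `real[]`; `Neighbor` and `exact_knn` are hypothetical names. An HNSW index instead walks a graph and may return neighbors slightly out of order, which is how a row such as `518 | 85024.00` can end up after rows with larger distances.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct
{
    int    id;
    double dist;
} Neighbor;

/* Squared Euclidean distance (assumed here to match `<->`). */
static double squared_l2(const float *a, const float *b, size_t dims)
{
    double sum = 0.0;

    for (size_t i = 0; i < dims; i++)
    {
        double d = (double) a[i] - (double) b[i];
        sum += d * d;
    }
    return sum;
}

/* Orders by distance, breaking ties by id so the result is deterministic. */
static int by_distance(const void *lhs, const void *rhs)
{
    const Neighbor *x = lhs, *y = rhs;

    if (x->dist < y->dist)
        return -1;
    if (x->dist > y->dist)
        return 1;
    return (x->id > y->id) - (x->id < y->id);
}

/* Exact k-NN over n rows of `dims` floats; writes up to k results, returns count. */
static size_t exact_knn(const float *rows, size_t n, size_t dims,
                        const float *query, size_t k, Neighbor *out)
{
    Neighbor *all = malloc(n * sizeof(Neighbor));
    size_t    m = k < n ? k : n;

    if (all == NULL)
        return 0;
    for (size_t i = 0; i < n; i++)
    {
        all[i].id = (int) i;
        all[i].dist = squared_l2(rows + i * dims, query, dims);
    }
    qsort(all, n, sizeof(Neighbor), by_distance);
    for (size_t i = 0; i < m; i++)
        out[i] = all[i];
    free(all);
    return m;
}
```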

Yes, @Ngalstyan4 is right: this test was not working correctly. I have fixed it.

Ngalstyan4

Looks good! We should stress-test and benchmark this next.

Ngalstyan4 · 2023-08-06T10:40:12Z

src/hnsw/cache.c

+
+HTAB *cache_create()
+{
+    MemoryContext ctx = AllocSetContextCreate(TopMemoryContext, "BlockNumer cache", ALLOCSET_DEFAULT_SIZES);


Is there a reason to nest this memory context under the TopMemoryContext?
Why not use the current memory context (see this)?

Ngalstyan4 · 2023-08-06T10:50:38Z

test/expected/hnsw_insert_array.out

-CREATE INDEX ON sift_base1k USING hnsw (v);
+CREATE INDEX ON sift_base1k USING hnsw (v) WITH (dims=128);
 psql:test/sql/hnsw_insert_array.sql:7: INFO:  done init usearch index
-psql:test/sql/hnsw_insert_array.sql:7: ERROR:  Wrong number of dimensions: 128 instead of 3 expected


Unrelated: is there a default for dimensions?
As a quick fix, there should not be!
In the long run we can:

Have the index be sized automatically when using pgvector's vector type.

Decide the vector dimension on the first insert into the table and enforce that subsequent arrays have the same dimensionality.

Actually, in the same vein, is there a test case that tries to insert differently sized vectors into the same table to ensure that a proper error message is displayed?

Yes, there's a default dimension of 3. There are also tests that check the dimension-mismatch errors: https://github.com/lanterndata/lanterndb/blob/main/test/sql/hnsw_type_checks.sql
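The second long-term option (decide the dimension on the first insert, then reject mismatches) can be sketched in plain C. `IndexDims`, `check_dims`, and using 0 to mean "not decided yet" are hypothetical; only the error text mirrors the message in the expected output. A real index would also need to persist the decided dimension in its metadata, which this sketch ignores.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-index state; dims == 0 means no vector has been inserted yet. */
typedef struct
{
    int dims;
} IndexDims;

/* Returns 0 if `got` is acceptable; otherwise writes a message to err and returns -1. */
static int check_dims(IndexDims *idx, int got, char *err, size_t errlen)
{
    if (got <= 0)
    {
        snprintf(err, errlen, "Vector must have at least one dimension");
        return -1;
    }
    if (idx->dims == 0)
    {
        idx->dims = got; /* first insert fixes the dimension */
        return 0;
    }
    if (got != idx->dims)
    {
        snprintf(err, errlen, "Wrong number of dimensions: %d instead of %d expected",
                 got, idx->dims);
        return -1;
    }
    return 0;
}
```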

Ngalstyan4 · 2023-08-06T10:51:22Z

test/expected/hnsw_insert_array.out

-   ->  Sort
-         Sort Key: ((v <-> '{1,0,0,0,0,0,21,35,1,0,0,0,0,77,51,42,66,2,0,0,0,86,140,71,52,1,0,0,0,0,23,70,2,0,0,0,0,64,73,50,11,0,0,0,0,140,97,18,140,64,0,0,0,99,51,65,78,11,0,0,0,0,41,76,0,0,0,0,0,124,82,2,48,1,0,0,0,118,31,5,140,21,0,0,0,4,12,78,12,0,0,0,0,0,58,117,1,0,0,0,2,25,7,2,46,2,0,0,1,12,4,8,140,9,0,0,0,1,8,16,3,0,0,0,0,0,21,34}'::real[]))
-         ->  Seq Scan on sift_base1k
-(4 rows)


do you understand what these changes are?

Ngalstyan4 · 2023-08-06T10:57:40Z

test/expected/hnsw_insert_array.out

 SELECT v FROM sift_base1k WHERE id <= 444 AND v IS NOT NULL;
 INSERT 0 444
 SELECT count(*) from sift_base1k;
+psql:test/sql/hnsw_insert_array.sql:82: INFO:  cost estimate


Why is this using an index, and how? We explicitly mark our index as one that only supports ordering, so how does this work?


Ngalstyan4 · 2023-08-06T11:03:35Z

src/hnsw/external_index.c

@@ -176,7 +179,7 @@ void StoreExternalIndex(Relation        index,
    wal_retriever_block_numbers = palloc0(sizeof(BlockNumber) * num_added_vectors);


Should this be pfree'd at the end of the function?

Yes, it is pfree'd at the end of the function here.



Change array cache to HMAP cache for wal_block_numbers

67473f8

var77 requested a review from Ngalstyan4 August 5, 2023 22:46

var77 changed the title ~~[FEATURE] Change array cache to HMAP cache for wal_block_numbers~~ [FEATURE] Change array cache to HTAB cache for wal_block_numbers Aug 5, 2023

davkhech reviewed Aug 6, 2023


Ngalstyan4 reviewed Aug 6, 2023


var77 added 2 commits August 6, 2023 18:11

Fix memory context deletion of cache

431fcab

Merge with main

f8556b7

var77 marked this pull request as ready for review August 6, 2023 14:19

var77 requested review from Ngalstyan4 and davkhech August 6, 2023 14:27

Ngalstyan4 merged commit 4ac6689 into lanterndata:main Aug 6, 2023


3 participants