[FEATURE] Change array cache to HTAB cache for wal_block_numbers#26
Conversation
| 518 | 85024.00 | ||
| 340 | 87261.00 | ||
| 331 | 87796.00 | ||
| 518 | 85024.00 |
There was a problem hiding this comment.
I think we overlooked an issue in a previous PR (#18) where the regression tests for real[] column typed tables did not actually create an index.
It seems Varik fixed it there, which caused some of the changes.
Without the index we do exact ordering, which is always 100% accurate. With the index, we do approximate ordering so anomalies like the above are acceptible.
There was a problem hiding this comment.
Yes @Ngalstyan4 is right, this test was not working correctly, I have fixed it.
Ngalstyan4
left a comment
There was a problem hiding this comment.
Looks good! We should stress-test and benchmark this next
src/hnsw/cache.c
Outdated
|
|
||
| HTAB *cache_create() | ||
| { | ||
| MemoryContext ctx = AllocSetContextCreate(TopMemoryContext, "BlockNumer cache", ALLOCSET_DEFAULT_SIZES); |
There was a problem hiding this comment.
Is there a reason to next this memory context under the TopMemoryContext?
Why not use current memory context (see this)
| CREATE INDEX ON sift_base1k USING hnsw (v); | ||
| CREATE INDEX ON sift_base1k USING hnsw (v) WITH (dims=128); | ||
| psql:test/sql/hnsw_insert_array.sql:7: INFO: done init usearch index | ||
| psql:test/sql/hnsw_insert_array.sql:7: ERROR: Wrong number of dimensions: 128 instead of 3 expected |
There was a problem hiding this comment.
unrelated: is there a default for dimensions?
As a quick fix, there should not be!
In the long run we can:
- have the index be automatically sized when using pgvector's vector type
- Decide vector dimension on first insert into the table and enforce that following arrays have the same dimensionality
- Actually, in the same vain, is there a test case trying to insert differently sized vectors into the same tables to ensure that a proper error message is displayed?
There was a problem hiding this comment.
Yes there's a default dimension of 3, and also there are tests to check the size checking errors https://github.com/lanterndata/lanterndb/blob/main/test/sql/hnsw_type_checks.sql
| -> Sort | ||
| Sort Key: ((v <-> '{1,0,0,0,0,0,21,35,1,0,0,0,0,77,51,42,66,2,0,0,0,86,140,71,52,1,0,0,0,0,23,70,2,0,0,0,0,64,73,50,11,0,0,0,0,140,97,18,140,64,0,0,0,99,51,65,78,11,0,0,0,0,41,76,0,0,0,0,0,124,82,2,48,1,0,0,0,118,31,5,140,21,0,0,0,4,12,78,12,0,0,0,0,0,58,117,1,0,0,0,2,25,7,2,46,2,0,0,1,12,4,8,140,9,0,0,0,1,8,16,3,0,0,0,0,0,21,34}'::real[])) | ||
| -> Seq Scan on sift_base1k | ||
| (4 rows) |
There was a problem hiding this comment.
do you understand what these changes are?
| SELECT v FROM sift_base1k WHERE id <= 444 AND v IS NOT NULL; | ||
| INSERT 0 444 | ||
| SELECT count(*) from sift_base1k; | ||
| psql:test/sql/hnsw_insert_array.sql:82: INFO: cost estimate |
There was a problem hiding this comment.
Why and how is this using an index? We explicitly mark our index as one that only supports ordering. how does this work?
| @@ -176,7 +179,7 @@ void StoreExternalIndex(Relation index, | |||
| wal_retriever_block_numbers = palloc0(sizeof(BlockNumber) * num_added_vectors); | |||
There was a problem hiding this comment.
should this be pfreeed at the end of the function?
There was a problem hiding this comment.
Yes it is being pfree ed at the end of the function here
Issue
#21
Description
Changed
wal_retriever_block_numbersarray to PostgresHTAB(hash table) for caching block numbers.