Skip to content

[FEATURE] Change array cache to HTAB cache for wal_block_numbers#26

Merged
Ngalstyan4 merged 3 commits intolanterndata:mainfrom
var77:var77/index-cache-improvements
Aug 6, 2023
Merged

[FEATURE] Change array cache to HTAB cache for wal_block_numbers#26
Ngalstyan4 merged 3 commits intolanterndata:mainfrom
var77:var77/index-cache-improvements

Conversation

@var77
Copy link
Copy Markdown
Collaborator

@var77 var77 commented Aug 5, 2023

Issue

#21

Description

Changed wal_retriever_block_numbers array to Postgres HTAB (hash table) for caching block numbers.

@var77 var77 requested a review from Ngalstyan4 August 5, 2023 22:46
@var77 var77 changed the title [FEATURE] Change array cache to HMAP cache for wal_block_numbers [FEATURE] Change array cache to HTAB cache for wal_block_numbers Aug 5, 2023
518 | 85024.00
340 | 87261.00
331 | 87796.00
518 | 85024.00
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this fine?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we overlooked an issue in a previous PR (#18) where the regression tests for real[] column typed tables did not actually create an index.
It seems Varik fixed it there, which caused some of the changes.

Without the index we do exact ordering, which is always 100% accurate. With the index, we do approximate ordering so anomalies like the above are acceptible.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes @Ngalstyan4 is right, this test was not working correctly, I have fixed it.

Copy link
Copy Markdown
Contributor

@Ngalstyan4 Ngalstyan4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! We should stress-test and benchmark this next

src/hnsw/cache.c Outdated

HTAB *cache_create()
{
MemoryContext ctx = AllocSetContextCreate(TopMemoryContext, "BlockNumer cache", ALLOCSET_DEFAULT_SIZES);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to next this memory context under the TopMemoryContext?
Why not use current memory context (see this)

CREATE INDEX ON sift_base1k USING hnsw (v);
CREATE INDEX ON sift_base1k USING hnsw (v) WITH (dims=128);
psql:test/sql/hnsw_insert_array.sql:7: INFO: done init usearch index
psql:test/sql/hnsw_insert_array.sql:7: ERROR: Wrong number of dimensions: 128 instead of 3 expected
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated: is there a default for dimensions?
As a quick fix, there should not be!
In the long run we can:

  • have the index be automatically sized when using pgvector's vector type
  • Decide vector dimension on first insert into the table and enforce that following arrays have the same dimensionality
  • Actually, in the same vain, is there a test case trying to insert differently sized vectors into the same tables to ensure that a proper error message is displayed?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes there's a default dimension of 3, and also there are tests to check the size checking errors https://github.com/lanterndata/lanterndb/blob/main/test/sql/hnsw_type_checks.sql

-> Sort
Sort Key: ((v <-> '{1,0,0,0,0,0,21,35,1,0,0,0,0,77,51,42,66,2,0,0,0,86,140,71,52,1,0,0,0,0,23,70,2,0,0,0,0,64,73,50,11,0,0,0,0,140,97,18,140,64,0,0,0,99,51,65,78,11,0,0,0,0,41,76,0,0,0,0,0,124,82,2,48,1,0,0,0,118,31,5,140,21,0,0,0,4,12,78,12,0,0,0,0,0,58,117,1,0,0,0,2,25,7,2,46,2,0,0,1,12,4,8,140,9,0,0,0,1,8,16,3,0,0,0,0,0,21,34}'::real[]))
-> Seq Scan on sift_base1k
(4 rows)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you understand what these changes are?

SELECT v FROM sift_base1k WHERE id <= 444 AND v IS NOT NULL;
INSERT 0 444
SELECT count(*) from sift_base1k;
psql:test/sql/hnsw_insert_array.sql:82: INFO: cost estimate
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why and how is this using an index? We explicitly mark our index as one that only supports ordering. how does this work?

@@ -176,7 +179,7 @@ void StoreExternalIndex(Relation index,
wal_retriever_block_numbers = palloc0(sizeof(BlockNumber) * num_added_vectors);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be pfreeed at the end of the function?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is being pfree ed at the end of the function here

@var77 var77 marked this pull request as ready for review August 6, 2023 14:19
@var77 var77 requested review from Ngalstyan4 and davkhech August 6, 2023 14:27
@Ngalstyan4 Ngalstyan4 merged commit 4ac6689 into lanterndata:main Aug 6, 2023
var77 added a commit that referenced this pull request Oct 8, 2024
* Check memory usage before running model. references #26

* Fix checks for GPU #26

* Add info message #26

* Print more informative error messages

* Bump version

* Refactor naming

* Fix return type

* Bump versions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants