[FEATURE]: support real[] array type for hnsw index#18

Merged
Ngalstyan4 merged 9 commits into lanterndata:main from
var77:feature/support-real-array-type
Aug 4, 2023
var77 (Collaborator) commented on Aug 3, 2023

Description

This PR makes the pgvector dependency optional by adding support for the real[] type.

  • Added Dockerfile.dev, which installs the extension as a debug build and also installs gdb to help with debugging.
  • Updated the SQL files to enable the l2_distance function for the real[] type, create the <-> operator for the real[] type, and enable the operators for the vector type only if that type exists.
  • Added a dims option on the hnsw index, which is required only when the index is built on the real[] type. It specifies the dimension of the vectors.
  • Added an l2_distance function in hnsw.c, which calculates the L2 distance between two arrays.
  • Added a GetIndexDataType function, which determines from the index's oid the data type on which the index was built and returns a value of the HnswDataType enum (REAL_ARRAY, VECTOR, UNKNOWN). This helps in cases where behavior must change based on the index data type.
  • Added a GetHnswIndexDimensions function: if the indexed data type is REAL_ARRAY, it returns the dimension from the index options; if it is vector, it returns the dimension from the metadata exposed by pgvector.
  • Added a CheckHnswIndexDimensions function, which applies only to REAL_ARRAY types and checks that the length of the provided array equals the dimension specified in the index. It throws an error on mismatch.
  • Added the hnsw_insert_array.sql test case, a copy of the hnsw_insert.sql test case (with some exclusions). It builds the index on type real[] instead of vector and tests against it.
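
For readers who want a concrete picture of the distance and dimension-check pieces listed above, here is a minimal standalone sketch in plain C. It is not the code from this PR: the `Sketch`-prefixed names, the struct layout, and the return-code convention are all hypothetical, and only the enum values and the `dims` option come from the description. The sketch returns the squared distance so that it does not depend on the math library. Taking the square root gives the L2 distance and does not change nearest-neighbor ordering. Whether the real l2_distance takes the square root is not stated in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Enum values as named in the PR description. */
typedef enum { REAL_ARRAY, VECTOR, UNKNOWN } HnswDataType;

/* Hypothetical stand-in for per-index information.
 * Only the `dims` option itself is mentioned in the PR text. */
typedef struct {
    HnswDataType data_type;
    int dims; /* dimension given via the index's dims option */
} SketchIndexInfo;

/* Squared L2 (Euclidean) distance between two float arrays of length n. */
static float sketch_l2_distance_squared(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

/* Returns 1 when the array length is acceptable for the index, 0 otherwise.
 * As in the description, the check applies only to REAL_ARRAY indexes;
 * other data types pass through unchecked. The extension is described as
 * throwing an error on mismatch; this sketch only reports it. */
static int sketch_check_dimensions(const SketchIndexInfo *info, size_t array_len)
{
    assert(info != NULL);
    if (info->data_type != REAL_ARRAY)
        return 1;
    return info->dims > 0 && (size_t)info->dims == array_len;
}
```

In the extension itself a mismatch would presumably be reported through PostgreSQL's error mechanism instead of a return code; the sketch keeps it as a plain return value so it can be compiled without the PostgreSQL headers.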

TODO

  • Add more test cases to cover type checking

var77 requested a review from Ngalstyan4 on August 3, 2023 21:07
Ngalstyan4 (Contributor) left a comment

Great work!
some nits, otherwise looks good.

var77 marked this pull request as ready for review on August 4, 2023 16:45
Ngalstyan4 changed the title from "[FEATURE]: support real array type" to "[FEATURE]: support real[] array type for hnsw index" on Aug 4, 2023
Ngalstyan4 merged commit ffcfe79 into lanterndata:main on Aug 4, 2023
var77 added a commit that referenced this pull request Oct 8, 2024
* Added lantern-cli binary and cli option for embedding generation

* Update CI/CD to build CLI package

* Fix CI env var name

* Improve error handling and logging, update README

* Make image downloading parallel, update README

* Add data using clone

* Add more status logs

* Fix error messages for image downloads

* Update README

* Fix output for bge models

* Get CLS embeddings from bert models

* Refactor and cleanup build/package script

* Add schema support, make pk field generic

* Update README, bump version

* Get approximate count of rows

* Add README.md and LICENSE into release package

* Update README

* Change lantern-cli name to contain architecture and platform

* Add schema in table size estimation, make input column value optional