[FEATURE]: support real[] array type for hnsw index #18
Merged
Ngalstyan4 merged 9 commits into lanterndata:main on Aug 4, 2023
Ngalstyan4 (Contributor) reviewed on Aug 4, 2023:
Great work! Some nits, otherwise looks good.
var77
commented
Aug 4, 2023
var77 added a commit that referenced this pull request on Oct 8, 2024:
* Added lantern-cli binary and cli option for embedding generation
* Update CI/CD to build CLI package
* Fix CI env var name
* Improve error handling and logging, update README
* Make image downloading parallel, update README
* Add data using clone
* Add more status logs
* Fix error messages for image downloads
* Update README
* Fix output for bge models
* Get CLS embeddings from bert models
* Refactor and cleanup build/package script
* Add schema support, make pk field generic
* Update README, bump version
* Get approximate count of rows
* Add README.md and LICENSE into release package
* Update README
* Change lantern-cli name to contain architecture and platform
* Add schema in table size estimation, make input column value optional
Description
With this PR, we are making the `pgvector` dependency optional by adding support for the `real[]` type.
* `Dockerfile.dev`, which installs the extension in a debug build and also installs `gdb` to help with debugging.
* `sql` files changed to enable the `l2_distance` function for the `real[]` type, create the `<->` operator for the `real[]` type, and conditionally enable operators for the `vector` type only if that type exists.
* `dims` option on the `hnsw` index, which is required only if the index is being built on the `real[]` type. It indicates the dimension of the vectors.
* `l2_distance` function in `hnsw.c`, which calculates the L2 distance between two arrays.
* `GetIndexDataType` function, which determines from its OID the data type on which the index was built and returns a value of the `HnswDataType` enum (`REAL_ARRAY`, `VECTOR`, `UNKNOWN`). This helps in cases where behavior must change based on the index data type.
* `GetHnswIndexDimensions` function, which checks the indexed data type: if it is `REAL_ARRAY`, it returns the dimension from the index options; if it is `vector`, it returns the dimension from the metadata exposed by `pgvector`.
* `CheckHnswIndexDimensions` function, which works only for `REAL_ARRAY` types and checks that the length of the provided array equals the dimension specified in the index. It throws an error on mismatch.
* `hnsw_insert_array.sql` test case, which is a copy of the `hnsw_insert.sql` test case (with some exclusions). It builds the index on type `real[]` instead of `vector` and tests against it.

TODO
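To illustrate the `l2_distance` and `CheckHnswIndexDimensions` items in the description above, here is a minimal standalone sketch in plain C. It is not the code from this PR: it does not use PostgreSQL headers or array types, every name ending in `_sketch` (and every `SKETCH_` enum value) is hypothetical, and it assumes the squared L2 distance is enough for comparing neighbors. Whether the real `l2_distance` in `hnsw.c` takes a square root is not stated in the description.

```c
#include <stddef.h>

/* Hypothetical stand-in for the HnswDataType enum named in the description. */
typedef enum {
    SKETCH_REAL_ARRAY,
    SKETCH_VECTOR,
    SKETCH_UNKNOWN
} HnswDataTypeSketch;

/*
 * Squared L2 distance between two float arrays of length dim.
 * Assumption: both arrays really hold dim elements.
 */
static float l2_squared_sketch(const float *a, const float *b, size_t dim)
{
    float sum = 0.0f;

    for (size_t i = 0; i < dim; i++) {
        float diff = a[i] - b[i];
        sum += diff * diff;
    }
    return sum;
}

/*
 * Dimension check in the spirit of CheckHnswIndexDimensions:
 * only real[]-style indexes are checked. Returns 1 when the input is
 * acceptable and 0 on a mismatch (the PR raises an error instead).
 */
static int dims_match_sketch(HnswDataTypeSketch type, size_t array_len, size_t index_dims)
{
    if (type != SKETCH_REAL_ARRAY)
        return 1; /* other types are not checked in this sketch */
    return array_len == index_dims;
}
```

A caller would run the dimension check before computing any distance, so that an array of the wrong length is rejected before it reaches the index.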