Description
Currently, ArcadeDB has a vector index implementation based on hnswlib that lacks some features and is not well integrated with ArcadeDB (e.g., no SQL support).
JVector is a leading library for implementing an embedded vector search engine.
The plan is to replace the existing implementation with a new one based on JVector and to provide full integration with the ArcadeDB engine and transactions.
- implement JVectorIndex
- provide SQL support:
  CREATE INDEX IF NOT EXISTS ON VectorDocument (id, embedding) JVECTOR
- enhance SQL with json options to fine tune the index:
  CREATE INDEX IF NOT EXISTS ON VectorDocument (id, embedding) JVECTOR options {option1: value}
- from SQL support for HNSW index #1490: implement functions
For the sake of completeness, the text of #1490 is copied here:
We need new functions to expose the following methods from the index:
- findNeighborsFromVector(TVector vector, int k): find max K neighbors from a vector of embeddings
- findNeighborsFromId(TID id, int k): find max K neighbors starting from an id (indexed with the underlying LSMTree)
- findNeighborsFromVertex(Vertex start, int k): find max K neighbors starting from a vertex
The easiest way is to create three new SQL functions. Example:
select findNeighborsFromVector( "Word[name,vector]", [1,2,3,4,5,6], 10 )
The Java API returns a List<Pair<Identifiable, ? extends Number>>, with the vertex RID as the first element and a number (float, double, or whatever type you pick at index creation) as the proximity. The list is ordered by proximity, closest first.
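To make the expected result shape concrete, here is a minimal sketch of a neighbor search that returns (RID, proximity) pairs ordered closest first. It is not ArcadeDB or JVector code: the class name `NeighborSearchSketch`, the use of `Map.Entry<String, Float>` in place of `Pair<Identifiable, ? extends Number>`, the Euclidean distance, and the exhaustive scan (instead of a real vector index) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the proposed index API; not ArcadeDB or JVector code.
class NeighborSearchSketch {
  // Maps a record id (RID rendered as a string) to its embedding.
  private final Map<String, float[]> embeddings = new LinkedHashMap<>();

  void put(String rid, float[] vector) {
    embeddings.put(rid, vector);
  }

  // Euclidean distance, assumed here as the proximity measure.
  static float distance(float[] a, float[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return (float) Math.sqrt(sum);
  }

  // Exhaustive scan: returns at most k (rid, proximity) pairs, closest first.
  List<Map.Entry<String, Float>> findNeighborsFromVector(float[] query, int k) {
    List<Map.Entry<String, Float>> result = new ArrayList<>();
    for (Map.Entry<String, float[]> e : embeddings.entrySet()) {
      result.add(Map.entry(e.getKey(), distance(query, e.getValue())));
    }
    result.sort(Map.Entry.comparingByValue());
    return new ArrayList<>(result.subList(0, Math.min(k, result.size())));
  }

  public static void main(String[] args) {
    NeighborSearchSketch index = new NeighborSearchSketch();
    index.put("#13:4", new float[] {1f, 2f, 3f});   // made-up sample data
    index.put("#19:10", new float[] {2f, 2f, 2f});
    index.put("#20:1", new float[] {9f, 9f, 9f});
    for (Map.Entry<String, Float> n : index.findNeighborsFromVector(new float[] {1f, 2f, 3f}, 2)) {
      System.out.println(n.getKey() + " " + n.getValue());
    }
  }
}
```

A real implementation would delegate the search to the vector index rather than scanning every record; only the ordering and the pair-shaped result are meant to carry over.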
With SQL it must be wrapped in a Result with "vertex" and "proximity" properties:
+------------------+---------------------+
| VERTEX | PROXIMITY |
+------------------+---------------------+
| #13:4 | 0.12 |
| #19:10 | 0.19 |
+------------------+---------------------+
This way you can also traverse the graph starting from embeddings:
select expand( vertex ) from (
select findNeighborsFromVector( "Word[name,vector]", [1,2,3,4,5,6], 10 )
) where proximity < 0.5
This returns all the neighbors with proximity less than 0.5 from the vector.
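The semantics of the nested query can be sketched in plain Java as follows. This only illustrates the intended behavior and is not the SQL engine's implementation: rows are modelled as maps with "vertex" and "proximity" properties rather than ArcadeDB's Result, the vertex is represented by its RID string, and the class name, the helper names, and the third sample row are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration of the query semantics; not ArcadeDB code.
class ProximityFilterSketch {
  // Builds one result row with the "vertex" and "proximity" properties.
  static Map<String, Object> row(String vertexRid, float proximity) {
    Map<String, Object> r = new LinkedHashMap<>();
    r.put("vertex", vertexRid);
    r.put("proximity", proximity);
    return r;
  }

  // Mirrors: select expand( vertex ) from ( ... ) where proximity < maxProximity
  static List<String> expandVerticesBelow(List<Map<String, Object>> rows, float maxProximity) {
    return rows.stream()
        .filter(r -> ((Number) r.get("proximity")).floatValue() < maxProximity)
        .map(r -> (String) r.get("vertex"))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Map<String, Object>> rows = new ArrayList<>();
    rows.add(row("#13:4", 0.12f));
    rows.add(row("#19:10", 0.19f));
    rows.add(row("#21:7", 0.83f)); // made-up row that the filter drops
    System.out.println(expandVerticesBelow(rows, 0.5f)); // prints [#13:4, #19:10]
  }
}
```

Because the inner result is already ordered by proximity, the filtered output keeps the closest-first order.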