Implement new vector index using JVector library #2529

@robfrank

Currently, ArcadeDB has a vector index implementation based on hnswlib that lacks some features and is poorly integrated with the rest of ArcadeDB (e.g., no SQL support).

JVector is a leading library for implementing an embedded vector search engine.

The plan is to replace the existing implementation with a new one based on JVector and to integrate it fully with the ArcadeDB engine and transactions.

  • implement JVectorIndex
  • provide SQL support: CREATE INDEX IF NOT EXISTS ON VectorDocument (id, embedding) JVECTOR
  • extend the SQL with JSON options to fine-tune the index: CREATE INDEX IF NOT EXISTS ON VectorDocument (id, embedding) JVECTOR options {option1: value}
  • implement the SQL functions requested in "SQL support for HNSW index" (#1490)

For completeness, the content of #1490 is copied here:

We need new functions to expose the following index methods:

  • findNeighborsFromVector(TVector vector, int k): find at most k neighbors of a vector of embeddings
  • findNeighborsFromId(TID id, int k): find at most k neighbors starting from an id (indexed with the underlying LSMTree)
  • findNeighborsFromVertex(Vertex start, int k): find at most k neighbors starting from a vertex

The easiest way is to create 3 new SQL functions to be used from SQL. Example:

select findNeighborsFromVector( "Word[name,vector]", [1,2,3,4,5,6], 10 )

The Java API returns a List<Pair<Identifiable, ? extends Number>>, where each pair holds the vertex RID as its first element and the proximity as a number (float, double, or whatever type was chosen at index creation). The list is ordered by proximity, closest first.
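To make the expected ordering concrete, here is a minimal sketch of the findNeighborsFromVector contract using an exhaustive scan rather than JVector or hnswlib. The class NeighborSketch, the Neighbor type (a stand-in for Pair<Identifiable, ? extends Number>), the string RIDs, and the Euclidean distance are all assumptions made for illustration; the real index would use an approximate search and whatever similarity function is configured.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class NeighborSketch {
  /** Hypothetical stand-in for Pair<Identifiable, ? extends Number>. */
  public static final class Neighbor {
    public final String rid;      // record id, e.g. "#13:4"
    public final float proximity; // distance to the query; smaller means closer

    public Neighbor(String rid, float proximity) {
      this.rid = rid;
      this.proximity = proximity;
    }
  }

  /** Euclidean distance between two vectors of equal length (assumed metric). */
  static float distance(float[] a, float[] b) {
    if (a.length != b.length)
      throw new IllegalArgumentException("dimension mismatch");
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return (float) Math.sqrt(sum);
  }

  /** Exhaustive scan: returns at most k neighbors, closest first. */
  static List<Neighbor> findNeighborsFromVector(Map<String, float[]> vectors, float[] query, int k) {
    List<Neighbor> all = new ArrayList<>();
    for (Map.Entry<String, float[]> e : vectors.entrySet())
      all.add(new Neighbor(e.getKey(), distance(e.getValue(), query)));
    all.sort(Comparator.comparingDouble(n -> n.proximity));
    int limit = Math.max(0, Math.min(k, all.size()));
    return new ArrayList<>(all.subList(0, limit));
  }
}
```

An exhaustive scan costs time linear in the number of indexed vectors, which is exactly what an HNSW-style index avoids; the sketch only pins down the shape and ordering of the result.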

With SQL it must be wrapped in a Result with "vertex" and "proximity" properties:

+------------------+---------------------+
| VERTEX           |           PROXIMITY |
+------------------+---------------------+
| #13:4            |                0.12 |
| #19:10           |                0.19 |
+------------------+---------------------+
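The wrapping step could look roughly like the following sketch, which turns (RID, proximity) pairs into rows carrying "vertex" and "proximity" properties. It uses plain maps instead of ArcadeDB's Result type, and the class name ResultWrapSketch, the method toRows, and the string RIDs are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ResultWrapSketch {
  /**
   * Converts (rid, proximity) pairs into rows with "vertex" and "proximity"
   * properties. The input order (closest first) is preserved.
   */
  static List<Map<String, Object>> toRows(List<? extends Map.Entry<String, ? extends Number>> pairs) {
    List<Map<String, Object>> rows = new ArrayList<>();
    for (Map.Entry<String, ? extends Number> pair : pairs) {
      Map<String, Object> row = new LinkedHashMap<>(); // keeps property order stable
      row.put("vertex", pair.getKey());
      row.put("proximity", pair.getValue());
      rows.add(row);
    }
    return rows;
  }
}
```

Keeping the proximity as a Number leaves the numeric type chosen at index creation untouched.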

This way you can also traverse the graph starting from embeddings:

select expand( vertex ) from (
  select findNeighborsFromVector( "Word[name,vector]", [1,2,3,4,5,6], 10 )
) where proximity < 0.5

This returns, among the 10 nearest neighbors of the vector, those with proximity less than 0.5.
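For readers more familiar with the Java side, the effect of the outer query (the where clause plus expand( vertex )) can be sketched as a filter over result rows. The rows are modeled as plain maps with "vertex" and "proximity" keys; the class ProximityFilterSketch and the method expandVerticesBelow are hypothetical names, not part of ArcadeDB.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class ProximityFilterSketch {
  /**
   * Keeps rows whose "proximity" is strictly below the threshold and returns
   * their "vertex" values in the original (closest-first) order.
   */
  static List<Object> expandVerticesBelow(List<Map<String, Object>> rows, double threshold) {
    List<Object> vertices = new ArrayList<>();
    for (Map<String, Object> row : rows) {
      Number proximity = (Number) row.get("proximity");
      if (proximity != null && proximity.doubleValue() < threshold)
        vertices.add(row.get("vertex"));
    }
    return vertices;
  }
}
```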

Labels: enhancement (New feature or request)