SQL support for HNSW index #1490

@lvca

ArcadeDB's HNSW index is pretty powerful, but the lack of SQL support makes it hard to use via API.

We need new SQL functions (or methods) that expose the following index methods:

  • findNeighborsFromVector(TVector vector, int k): find at most k neighbors of a vector of embeddings
  • findNeighborsFromId(TID id, int k): find at most k neighbors starting from an id (indexed by the underlying LSMTree)
  • findNeighborsFromVertex(Vertex start, int k): find at most k neighbors starting from a vertex
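
To make the expected contract of findNeighborsFromVector concrete, here is a minimal sketch in plain Java. It is not the HNSW index and not ArcadeDB code: it swaps in an exact brute-force scan with Euclidean distance, and the class name, the String ids, and the in-memory map are all hypothetical stand-ins. It only illustrates "at most k results, closest first".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical reference implementation: exact linear scan, NOT the HNSW index.
final class BruteForceNeighbors {

  // Euclidean distance between two vectors of the same dimension.
  static double distance(float[] a, float[] b) {
    if (a.length != b.length)
      throw new IllegalArgumentException("dimension mismatch");
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  // Returns at most k (id, distance) pairs, ordered by distance, closest first.
  // The String ids stand in for record ids; this is an assumption for illustration.
  static List<Map.Entry<String, Double>> findNeighborsFromVector(
      Map<String, float[]> items, float[] query, int k) {
    List<Map.Entry<String, Double>> scored = new ArrayList<>();
    for (Map.Entry<String, float[]> e : items.entrySet())
      scored.add(Map.entry(e.getKey(), distance(e.getValue(), query)));
    scored.sort(Map.Entry.comparingByValue());
    int limit = Math.min(Math.max(k, 0), scored.size());
    return new ArrayList<>(scored.subList(0, limit));
  }
}
```

A brute-force scan like this could also serve as a test oracle for the SQL functions on small datasets, since HNSW results are approximate.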

The easiest way is to create three new SQL functions. Example:

select findNeighborsFromVector( "Word[name,vector]", [1,2,3,4,5,6], 10 )

The Java API returns a List<Pair<Identifiable, ? extends Number>>, where each pair holds the vertex RID as its first element and the proximity as its second. The proximity is a number (float, double, or whatever type you picked at index creation). The list is ordered by proximity, closest first.

In SQL, each pair must be wrapped in a Result with "vertex" and "proximity" properties:

+------------------+---------------------+
| VERTEX           |           PROXIMITY |
+------------------+---------------------+
| #13:4            |                0.12 |
| #19:10           |                0.19 |
+------------------+---------------------+
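
The wrapping step could look roughly like the sketch below. It uses plain Java collections as stand-ins: Map.Entry plays the role of Pair, a String plays the role of the Identifiable, and a LinkedHashMap plays the role of the Result. The class and method names are hypothetical, and the real implementation would use ArcadeDB's own types.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns (rid, proximity) pairs into rows with named properties.
final class NeighborRows {

  // Wraps each pair into a row with "vertex" and "proximity" keys.
  // Input order is preserved, so a list sorted closest-first stays sorted.
  static List<Map<String, Object>> toRows(
      List<? extends Map.Entry<String, ? extends Number>> neighbors) {
    List<Map<String, Object>> rows = new ArrayList<>();
    for (Map.Entry<String, ? extends Number> n : neighbors) {
      Map<String, Object> row = new LinkedHashMap<>();
      row.put("vertex", n.getKey());
      row.put("proximity", n.getValue());
      rows.add(row);
    }
    return rows;
  }
}
```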

This also lets you traverse the graph starting from embeddings:

select expand( vertex ) from (
  select findNeighborsFromVector( "Word[name,vector]", [1,2,3,4,5,6], 10 )
) where proximity < 0.5

This returns the vertices, among the 10 nearest neighbors of the vector, whose proximity is less than 0.5.
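
For comparison, the same filter expressed over the Java-side rows might look like this. It is a sketch only: it assumes each row is a map with "vertex" and "proximity" keys as proposed above, and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper mirroring: select expand(vertex) from (...) where proximity < 0.5
final class ProximityFilter {

  // Keeps rows whose proximity is strictly below maxProximity and
  // returns their "vertex" values in the original order.
  static List<Object> verticesCloserThan(List<Map<String, Object>> rows, double maxProximity) {
    return rows.stream()
        .filter(r -> ((Number) r.get("proximity")).doubleValue() < maxProximity)
        .map(r -> r.get("vertex"))
        .collect(Collectors.toList());
  }
}
```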
