Skip to content

Make HNSW merges faster #12440

@benwtrent

Description

@benwtrent

Description

It is well known that when merging segments, HNSW is not merged in linear time. We lose a fair bit of useful information as all the NSWs are effectively thrown away (except for a single graph). This means we have useful distributed work from each segment that is ignored.

Do we think its even possible to improve HNSW?

There is existing research around merging hierarchical graphs, but not specifically HNSW:

One other option that comes to my mind is keeping NSW scores and use them on merge. So, taking advantage of a FINGER type of idea. I am not 100% sold on using this during search as it increases memory usage. But, if we store these in a separate file (not loaded during search) but loaded during merge, we could reduce the number of calculations required.

Of course, all these types of changes require due diligence and do NOT obviate the need of a new algorithm in Lucene that is more page-fault friendly (IVFPQ, SPANN, DiskNN or something).

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions