HNSW Vector Index entryPoint Not Persisted on Initial Creation #2801

@tae898

Summary

When creating a new HNSW vector index and adding vertices in a single transaction, the entryPoint field is not persisted to the .hnswidx metadata file. The index works correctly during the creation session, but the empty entryPoint may cause issues when reopening the database.

Root Cause

The bug is in the onAfterCommit() hook in HnswVectorIndex.java (line 229):

@Override
public void onAfterCommit() {
  if (entryPoint != null && !entryPoint.getIdentity().equals(entryPointRIDToLoad)) {
    // ENTRY POINT IS CHANGED: SAVE THE NEW CONFIGURATION TO DISK
    save();
    entryPointRIDToLoad = entryPoint.getIdentity();
  }
}

Problem Flow

  1. Index Creation (lines 187-190):

    • When loading from a new .hnswidx file with "entryPoint": "", entryPointRIDToLoad is set to null:
    if (!json.getString("entryPoint").isEmpty()) {
      this.entryPointRIDToLoad = new RID(database, json.getString("entryPoint"));
    } else
      this.entryPointRIDToLoad = null;  // ← Initial state for new index
  2. First Vertex Added (line 417):

    • The first vertex becomes the entry point:
    if (entryPoint == null || vertexMaxLevel > entryPointCopyMaxLevel)
      this.entryPoint = vertex;  // ← entryPoint is now set
  3. Transaction Commit:

    • onAfterCommit() is called
    • The condition evaluates: entryPoint != null && !entryPoint.getIdentity().equals(entryPointRIDToLoad)
    • Since entryPointRIDToLoad is null, entryPoint.getIdentity().equals(null) returns false
    • The full condition becomes true && !false = true, so on this path save() should be reached
    • The condition is nevertheless written to detect a changed entry point, not initial creation
    • The null → RID transition is handled only implicitly, through equals(null) returning false
  4. Result:

    • The .hnswidx file retains "entryPoint": ""
    • The entry point exists in memory but not on disk
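
The flow above can be replayed in isolation. The following is a minimal sketch, not the ArcadeDB code: the class `EntryPointCommitSketch` is hypothetical, RIDs are modelled as plain strings, the level comparison in add() is omitted, and a counter stands in for save(). It only shows how the quoted guard behaves when entryPointRIDToLoad starts as null.

```java
// Stand-in for illustration only; this is NOT HnswVectorIndex.
public class EntryPointCommitSketch {
  String entryPoint;          // in-memory entry point, null until the first vertex is added
  String entryPointRIDToLoad; // null for a new index whose file has "entryPoint": ""
  int saveCalls = 0;          // counts how often the stand-in for save() runs

  // Simplified add(): the first vertex becomes the entry point (level check omitted).
  void add(String vertexRid) {
    if (entryPoint == null)
      entryPoint = vertexRid;
  }

  // Same shape as the guard quoted under Root Cause.
  void onAfterCommit() {
    if (entryPoint != null && !entryPoint.equals(entryPointRIDToLoad)) {
      saveCalls++; // stands in for save()
      entryPointRIDToLoad = entryPoint;
    }
  }

  public static void main(String[] args) {
    EntryPointCommitSketch index = new EntryPointCommitSketch();
    index.add("#12:0");    // first vertex of a new index
    index.onAfterCommit(); // first commit: null -> "#12:0"
    index.onAfterCommit(); // second commit: entry point unchanged
    System.out.println("save() calls: " + index.saveCalls);
  }
}
```

If the stand-in for save() runs once here but the real .hnswidx file still shows an empty entryPoint, it may be worth checking whether onAfterCommit() is invoked at all for the index and what save() writes.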

Impact

Observed Behavior

  • Works: Vector searches function correctly in the same session where the index is created
  • Fails: On database reopen, the empty entryPoint may cause:
    1. Index to be ignored/unusable (requires rebuild)
    2. Index to be dropped automatically (see lines 217-219 in onAfterSchemaLoad())

Code Reference (lines 213-221)

@Override
public void onAfterSchemaLoad() {
  // ...
  if (this.entryPointRIDToLoad != null) {
    try {
      this.entryPoint = this.entryPointRIDToLoad.asVertex();
    } catch (RecordNotFoundException e) {
      // ENTRYPOINT DELETED, DROP THE INDEX
      LogManager.instance()
          .log(this, Level.WARNING, "HNSW index '" + indexName + "' has an invalid entrypoint. The index will be removed");
      this.entryPointRIDToLoad = null;
      database.getSchema().dropIndex(indexName);
    }
  }
  // ...
}

Proposed Fix

Change the condition in onAfterCommit() to handle the initial null case explicitly:

@Override
public void onAfterCommit() {
  // OLD (buggy):
  // if (entryPoint != null && !entryPoint.getIdentity().equals(entryPointRIDToLoad)) {
  
  // NEW (fixed):
  if (entryPoint != null && 
      (entryPointRIDToLoad == null || !entryPoint.getIdentity().equals(entryPointRIDToLoad))) {
    // ENTRY POINT IS CHANGED: SAVE THE NEW CONFIGURATION TO DISK
    save();
    entryPointRIDToLoad = entryPoint.getIdentity();
  }
}

Alternative Fix

Restructure the check so that the first-time case is spelled out as a nested condition (the logic matches the proposed fix):

@Override
public void onAfterCommit() {
  if (entryPoint != null) {
    if (entryPointRIDToLoad == null || !entryPoint.getIdentity().equals(entryPointRIDToLoad)) {
      // ENTRY POINT IS CHANGED OR FIRST TIME SET: SAVE THE NEW CONFIGURATION TO DISK
      save();
      entryPointRIDToLoad = entryPoint.getIdentity();
    }
  }
}
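
Before applying either change, it may help to check whether the original and the rewritten guard ever disagree. The sketch below compares them as pure predicates over every null/non-null combination. It assumes RID.equals() follows the standard Java contract (equals(null) returns false); RIDs are plain strings here, and the class and method names are hypothetical.

```java
// Stand-in predicates for illustration only; rids are strings, not ArcadeDB RID objects.
public class GuardComparison {
  // Guard as quoted under Root Cause.
  static boolean originalGuard(String entryPoint, String loaded) {
    return entryPoint != null && !entryPoint.equals(loaded);
  }

  // Guard from the proposed fix, with the explicit null check.
  static boolean proposedGuard(String entryPoint, String loaded) {
    return entryPoint != null && (loaded == null || !entryPoint.equals(loaded));
  }

  public static void main(String[] args) {
    String[] values = { null, "#12:0", "#12:7" };
    for (String ep : values)
      for (String loaded : values)
        System.out.printf("entryPoint=%s loaded=%s original=%b proposed=%b%n",
            ep, loaded, originalGuard(ep, loaded), proposedGuard(ep, loaded));
  }
}
```

Under that assumption the two guards return the same value in every case, so the explicit null check mainly documents intent.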

Verification Steps

After applying the fix:

  1. Create a new HNSW index
  2. Add vertices in a transaction
  3. Commit and close the database
  4. Check the .hnswidx file: "entryPoint" should contain a RID
  5. Reopen the database
  6. Verify vector searches still work
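
Step 4 can be scripted. This is a rough sketch that assumes the .hnswidx file is readable as JSON text with a string field named "entryPoint", as in the snippet quoted earlier; the class `EntryPointFileCheck` and its regex-based parsing are illustrative and not part of ArcadeDB.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: inspects the "entryPoint" field of an index metadata file.
public class EntryPointFileCheck {
  // Matches "entryPoint": "<value>" and captures the value (possibly empty).
  private static final Pattern ENTRY_POINT =
      Pattern.compile("\"entryPoint\"\\s*:\\s*\"([^\"]*)\"");

  // Returns the entryPoint value found in the text, or null if the field is absent.
  static String extractEntryPoint(String json) {
    Matcher m = ENTRY_POINT.matcher(json);
    return m.find() ? m.group(1) : null;
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 1) {
      System.err.println("usage: java EntryPointFileCheck <path-to-.hnswidx>");
      return;
    }
    String value = extractEntryPoint(Files.readString(Path.of(args[0])));
    if (value == null)
      System.out.println("no entryPoint field found");
    else if (value.isEmpty())
      System.out.println("entryPoint is empty: not persisted");
    else
      System.out.println("entryPoint persisted as " + value);
  }
}
```

Run it against the file after closing the database, so that any pending write has had a chance to reach disk.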

Additional Context

  • This bug was discovered while building a Stack Overflow dataset importer
  • The index contained 4,825 Questions with 384-dimensional embeddings
  • Vector searches worked perfectly during the creation session
  • The problem was only discovered when checking persistence by inspecting the .hnswidx file

Files Involved

  • engine/src/main/java/com/arcadedb/index/vector/HnswVectorIndex.java
    • Line 229: onAfterCommit() - where the bug exists
    • Line 417: add() - where entry point is first set
    • Lines 187-190: Constructor loading from JSON
    • Line 820: toJSON() - where entry point is serialized

Metadata

Labels

invalid, wontfix
