Closed as not planned
Labels: invalid (This doesn't seem right), wontfix (This will not be worked on)
Description
Summary
When creating a new HNSW vector index and adding vertices in a single transaction, the entryPoint field is not persisted to the .hnswidx metadata file. The index works correctly during the creation session, but the empty entryPoint may cause issues when reopening the database.
Root Cause
The bug is in the onAfterCommit() hook in HnswVectorIndex.java (line 229):
@Override
public void onAfterCommit() {
  if (entryPoint != null && !entryPoint.getIdentity().equals(entryPointRIDToLoad)) {
    // ENTRY POINT IS CHANGED: SAVE THE NEW CONFIGURATION TO DISK
    save();
    entryPointRIDToLoad = entryPoint.getIdentity();
  }
}

Problem Flow
- Index Creation (lines 187-190):
  - When loading from a new .hnswidx file with "entryPoint": "", entryPointRIDToLoad is set to null:
      if (!json.getString("entryPoint").isEmpty()) {
        this.entryPointRIDToLoad = new RID(database, json.getString("entryPoint"));
      } else
        this.entryPointRIDToLoad = null; // ← Initial state for new index
- First Vertex Added (line 417):
  - The first vertex becomes the entry point:
      if (entryPoint == null || vertexMaxLevel > entryPointCopyMaxLevel)
        this.entryPoint = vertex; // ← entryPoint is now set
- Transaction Commit:
  - onAfterCommit() is called
  - The condition evaluates: entryPoint != null && !entryPoint.getIdentity().equals(entryPointRIDToLoad)
  - Since entryPointRIDToLoad is null, .equals(null) returns false
  - The full condition becomes true && !false = true, so by this evaluation the guard passes and save() should be reached
  - However, the condition is designed to detect changes, not initial creation
  - The null → RID transition still does not appear to be persisted (see Result), so the logic around this transition needs a closer look
- Result:
  - The .hnswidx file retains "entryPoint": ""
  - The entry point exists in memory but not on disk
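The flow above can be modelled in isolation as a rough sketch. Everything here is a stand-in and not ArcadeDB code: the class name is hypothetical, plain strings replace vertices and RIDs, and save() is reduced to a counter. The sketch assumes that RID.equals(null) returns false, as the general Object.equals contract requires.

```java
// Stand-in model of the flow above. All names are hypothetical; plain Strings
// stand in for ArcadeDB vertices and RIDs, and save() only counts calls.
final class EntryPointFlowSketch {
  private String entryPoint;          // in-memory entry point (null for a new index)
  private String entryPointRidToLoad; // value loaded from metadata (null when "entryPoint" is "")
  private int saveCalls;              // how many times the guard would have called save()

  // Step 2: the first vertex added becomes the entry point.
  void add(String vertexRid) {
    if (entryPoint == null)
      entryPoint = vertexRid;
  }

  // Step 3: same guard shape as the quoted onAfterCommit().
  void onAfterCommit() {
    if (entryPoint != null && !entryPoint.equals(entryPointRidToLoad)) {
      saveCalls++;
      entryPointRidToLoad = entryPoint;
    }
  }

  int saveCalls() {
    return saveCalls;
  }

  public static void main(String[] args) {
    EntryPointFlowSketch index = new EntryPointFlowSketch();
    index.add("#12:0");    // first vertex in the transaction
    index.onAfterCommit(); // null -> RID transition
    index.onAfterCommit(); // nothing changed since the last commit
    System.out.println("save() calls: " + index.saveCalls()); // prints "save() calls: 1"
  }
}
```

In this model the guard fires once, on the first commit. If the .hnswidx file still shows an empty entryPoint in practice, it may be worth checking whether onAfterCommit() is invoked for the index at all and what save() actually writes, since the model exercises neither.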
Impact
Observed Behavior
- ✅ Works: Vector searches function correctly in the same session where the index is created
- ❌ Fails: On database reopen, the empty entryPoint may cause:
  - Index to be ignored/unusable (requires rebuild)
  - Index to be dropped automatically (see lines 217-219 in onAfterSchemaLoad(); as quoted below, that branch runs only when a non-empty entry point RID fails to resolve)
Code Reference (lines 213-221)
@Override
public void onAfterSchemaLoad() {
  // ...
  if (this.entryPointRIDToLoad != null) {
    try {
      this.entryPoint = this.entryPointRIDToLoad.asVertex();
    } catch (RecordNotFoundException e) {
      // ENTRYPOINT DELETED, DROP THE INDEX
      LogManager.instance()
          .log(this, Level.WARNING, "HNSW index '" + indexName + "' has an invalid entrypoint. The index will be removed");
      this.entryPointRIDToLoad = null;
      database.getSchema().dropIndex(indexName);
    }
  }
  // ...
}

Proposed Fix
Change the condition in onAfterCommit() to handle the initial null case:
@Override
public void onAfterCommit() {
  // OLD (buggy):
  // if (entryPoint != null && !entryPoint.getIdentity().equals(entryPointRIDToLoad)) {
  // NEW (fixed):
  if (entryPoint != null &&
      (entryPointRIDToLoad == null || !entryPoint.getIdentity().equals(entryPointRIDToLoad))) {
    // ENTRY POINT IS CHANGED: SAVE THE NEW CONFIGURATION TO DISK
    save();
    entryPointRIDToLoad = entryPoint.getIdentity();
  }
}

Alternative Fix
Nest the checks so that the first-time case is handled explicitly (the save itself is unchanged):
@Override
public void onAfterCommit() {
  if (entryPoint != null) {
    if (entryPointRIDToLoad == null || !entryPoint.getIdentity().equals(entryPointRIDToLoad)) {
      // ENTRY POINT IS CHANGED OR FIRST TIME SET: SAVE THE NEW CONFIGURATION TO DISK
      save();
      entryPointRIDToLoad = entryPoint.getIdentity();
    }
  }
}

Verification Steps
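Before the manual steps, the current guard and the proposed guard can be compared without touching the database. This is a minimal sketch with hypothetical names, using strings in place of RIDs and assuming RID.equals(null) returns false (the Object.equals contract); it does not run any ArcadeDB code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: compares the two guard expressions over a few inputs.
final class GuardComparisonSketch {
  // Guard as quoted from the current onAfterCommit().
  static boolean originalGuard(String entryPoint, String ridToLoad) {
    return entryPoint != null && !entryPoint.equals(ridToLoad);
  }

  // Guard from the proposed fix, with the explicit null check.
  static boolean proposedGuard(String entryPoint, String ridToLoad) {
    return entryPoint != null && (ridToLoad == null || !entryPoint.equals(ridToLoad));
  }

  // True if both guards give the same answer for every combination of
  // "unset" (null) and two distinct stand-in RIDs.
  static boolean guardsAgree() {
    List<String> values = Arrays.asList(null, "#12:0", "#12:1");
    for (String entryPoint : values)
      for (String ridToLoad : values)
        if (originalGuard(entryPoint, ridToLoad) != proposedGuard(entryPoint, ridToLoad))
          return false;
    return true;
  }

  public static void main(String[] args) {
    System.out.println("guards agree: " + guardsAgree()); // prints "guards agree: true"
  }
}
```

Under these assumptions the two guards agree on every input, including the null → RID case, so the explicit null check mainly documents intent. A behavioural difference would only appear if RID.equals() deviated from the usual contract, which is worth confirming against the actual class.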
After applying the fix:
- Create a new HNSW index
- Add vertices in a transaction
- Commit and close the database
- Check the .hnswidx file: "entryPoint" should contain a RID
- Reopen the database
- Verify vector searches still work
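The file check in the steps above could be scripted along these lines. This is a sketch under assumptions: it treats the .hnswidx file as UTF-8 JSON text with a string-valued "entryPoint" field (as the quoted loading code suggests), uses a simple regular expression instead of a JSON parser, and the class name and command-line argument are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical checker for the "entryPoint" field of an .hnswidx metadata file.
final class HnswMetadataCheck {
  // Matches "entryPoint": "<value>" and captures <value> (possibly empty).
  private static final Pattern ENTRY_POINT =
      Pattern.compile("\"entryPoint\"\\s*:\\s*\"([^\"]*)\"");

  // Returns the raw "entryPoint" value, or null if the key is not found.
  static String entryPointOf(String json) {
    Matcher m = ENTRY_POINT.matcher(json);
    return m.find() ? m.group(1) : null;
  }

  // True only when the key is present and its value is non-empty.
  static boolean hasEntryPoint(String json) {
    String value = entryPointOf(json);
    return value != null && !value.isEmpty();
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 1) {
      System.err.println("usage: HnswMetadataCheck <path-to-.hnswidx-file>");
      return;
    }
    String json = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
    System.out.println(hasEntryPoint(json)
        ? "entryPoint persisted: " + entryPointOf(json)
        : "entryPoint missing or empty");
  }
}
```

Running it once after the commit and close, and again after reopening the database, would show whether the entry point reaches the file at all.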
Additional Context
- This bug was discovered while building a Stack Overflow dataset importer
- The index contained 4,825 Questions with 384-dimensional embeddings
- Vector searches worked perfectly during the creation session
- The problem was only discovered when checking persistence by inspecting the .hnswidx file
Files Involved
engine/src/main/java/com/arcadedb/index/vector/HnswVectorIndex.java
- Line 229: onAfterCommit() - where the bug exists
- Line 417: add() - where entry point is first set
- Line 187-190: Constructor loading from JSON
- Line 820: toJSON() - where entry point is serialized