Return similarity scores from LSMVectorIndex to avoid redundant distance recalculation #2820

Merged: lvca merged 3 commits into jvector-integration from copilot/sub-pr-2816-another-one on Nov 22, 2025

Contributor

Copilot AI commented Nov 21, 2025

The vectorNeighbors() SQL function re-fetched each vertex's vector property and recalculated its distance to the query vector, even though JVector had already computed similarity scores during the search.

Changes

  • Added findNeighborsFromVector() to LSMVectorIndex that extracts scores from JVector's SearchResult.NodeScore and converts them to distances based on the similarity function
  • Updated SQLFunctionVectorNeighbors to use the new method, eliminating ~60 lines of redundant distance calculation code
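
The score-to-distance conversion mentioned above could look roughly like the sketch below. It is not the code from this PR. It assumes JVector maps raw similarities to scores as `1 / (1 + d²)` for Euclidean and `(1 + s) / 2` for cosine and dot product, which should be checked against the JVector version in use. The `Similarity` enum, the `toDistance` helper, and the choice to report dot product as a negated value are hypothetical and used only for illustration.

```java
// Minimal sketch: invert an assumed JVector score back into a distance.
// All names here (ScoreToDistance, Similarity, toDistance) are hypothetical.
final class ScoreToDistance {

  // Stand-in for the similarity function configured on the index.
  enum Similarity { EUCLIDEAN, COSINE, DOT_PRODUCT }

  static float toDistance(final float score, final Similarity similarity) {
    switch (similarity) {
      case EUCLIDEAN:
        // Assumed score = 1 / (1 + d^2), hence d = sqrt(1 / score - 1).
        return (float) Math.sqrt(Math.max(0.0, 1.0 / score - 1.0));
      case COSINE:
        // Assumed score = (1 + cos) / 2, hence cosine distance 1 - cos = 2 - 2 * score.
        return 2.0f - 2.0f * score;
      case DOT_PRODUCT:
        // Assumed score = (1 + dot) / 2, hence dot = 2 * score - 1.
        // Returned negated so that a smaller value means a closer vector.
        return 1.0f - 2.0f * score;
      default:
        throw new IllegalArgumentException("Unsupported similarity: " + similarity);
    }
  }

  public static void main(final String[] args) {
    for (final Similarity s : Similarity.values())
      System.out.println(s + " score=0.5 -> distance=" + toDistance(0.5f, s));
  }
}
```

Because each branch is a closed-form inversion of the score, the caller never needs to load the stored vector again, which is the saving this PR describes.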

Before

// Performed search, then re-extracted vectors and recalculated distances
final IndexCursor cursor = lsmIndex.get(new Object[] { queryVector }, limit);
while (cursor.hasNext()) {
  final Vertex vertex = cursor.next().asVertex();
  final float[] vertexVector = vertex.get(vectorProperty);
  final float distance = calculateDistance(queryVector, vertexVector, similarityFunction);
  // ...
}

After

// Directly uses scores from JVector search results
final List<Pair<RID, Float>> neighbors = lsmIndex.findNeighborsFromVector(queryVector, limit);
for (final Pair<RID, Float> neighbor : neighbors) {
  final float distance = neighbor.getSecond();  // Distance derived from the JVector score, no recalculation
  // ...
}

This aligns LSMVectorIndex with HnswVectorIndex, which already exposes scores via findNeighborsFromVector().



Copilot AI and others added 2 commits November 21, 2025 22:24
… directly

Co-authored-by: lvca <312606+lvca@users.noreply.github.com>
Co-authored-by: lvca <312606+lvca@users.noreply.github.com>
Contributor Author

Copilot AI commented Nov 21, 2025

Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • repository.apache.org
    • Triggering command: /usr/lib/jvm/temurin-17-jdk-amd64/bin/java /usr/lib/jvm/temurin-17-jdk-amd64/bin/java -classpath /home/REDACTED/.m2/wrapper/dists/apache-maven-3.9.9/3477a4f1/boot/plexus-classworlds-2.8.0.jar -Dclassworlds.conf=/home/REDACTED/.m2/wrapper/dists/apache-maven-3.9.9/3477a4f1/bin/m2.conf -Dmaven.home=/home/REDACTED/.m2/wrapper/dists/apache-maven-3.9.9/3477a4f1 -Dlibrary.jansi.path=/home/REDACTED/.m2/wrapper/dists/apache-maven-3.9.9/3477a4f1/lib/jansi-native -Dmaven.multiModuleProjectDirectory=/home/REDACTED/work/arcadedb/arcadedb org.codehaus.plexus.classworlds.launcher.Launcher -f pom.xml -B -V -e -Dfindbugs.skip -Dcheckstyle.skip -Dpmd.skip=true -Dspotbugs.skip -Denforcer.skip -Dmaven.javadoc.skip -DskipTests (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Copilot AI changed the title from "[WIP] Address feedback on JVector integration PR" to "Return similarity scores from LSMVectorIndex to avoid redundant distance recalculation" Nov 21, 2025
Copilot AI requested a review from lvca November 21, 2025 22:30
lvca marked this pull request as ready for review November 22, 2025 14:34
lvca merged commit 2c8cd00 into jvector-integration Nov 22, 2025
2 of 3 checks passed
Contributor

mergify bot commented Nov 22, 2025

🧪 CI Insights

Here's what we observed from your CI run for 6ac688c.

🟢 All jobs passed!

But CI Insights is watching 👀

lvca added this to the 25.11.1 milestone Nov 22, 2025
lvca added the enhancement (New feature or request) label Nov 22, 2025
robfrank pushed a commit that referenced this pull request Nov 22, 2025
…nce recalculation (#2820)

* Initial plan

* Add findNeighborsFromVector method to LSMVectorIndex to return scores directly

Co-authored-by: lvca <312606+lvca@users.noreply.github.com>

* Remove test artifacts and update .gitignore

Co-authored-by: lvca <312606+lvca@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: lvca <312606+lvca@users.noreply.github.com>
lvca added a commit that referenced this pull request Nov 22, 2025
* First version with jvector

* Implemented compaction of vector indexes

* Added test cases

* Fixed compilation problems

* Fixed test cases, now all pass

* Refactor vector index using the transaction index changes instead of internal map (with threadId)

* feat: integrated new vector index with the `database import` command

* Supported lsmvector in `vectorNeighbors()` sql function

* Upgraded to jvector 4.0.0-rc.6

* Update LSMVectorIndexCompacted.java

fix: error after compaction

* Fix ComparableVector Comparable contract violation (#2817)

* Initial plan

* Fix ComparableVector to maintain Comparable contract

Co-authored-by: lvca <312606+lvca@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: lvca <312606+lvca@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Return similarity scores from LSMVectorIndex to avoid redundant distance recalculation (#2820)

* Initial plan

* Add findNeighborsFromVector method to LSMVectorIndex to return scores directly

Co-authored-by: lvca <312606+lvca@users.noreply.github.com>

* Remove test artifacts and update .gitignore

Co-authored-by: lvca <312606+lvca@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: lvca <312606+lvca@users.noreply.github.com>

* Add mutable flag to vector index pages for safe compaction (#2819)

* Initial plan

* Add mutable byte indicator to vector index pages

- Added mutable flag byte at offset 8 in page header (after offsetFreeContent and numberOfEntries)
- New pages are created with mutable=1 (actively being written to)
- Pages are marked as immutable (mutable=0) when they become full and a new page is created
- Updated findLastImmutablePage() to scan from end backwards and stop at first immutable page
- Updated all page reading/writing code to account for the mutable byte in header
- All vector index tests passing
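
The header layout and backward scan described in this commit message can be sketched as follows. The constant names come from the commit message, but their numeric values are assumptions. They are inferred from the mutable byte sitting at offset 8 after two fields presumed to be 4-byte integers. The `VectorPageHeader` class, the use of `ByteBuffer` as a page, and the initial free-content offset are hypothetical and do not reflect ArcadeDB's actual page classes.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of a page header with a mutable flag; not ArcadeDB's implementation.
final class VectorPageHeader {
  // Names from the commit message; values are assumptions (two ints, then one byte).
  static final int OFFSET_FREE_CONTENT = 0;
  static final int OFFSET_NUM_ENTRIES  = 4;
  static final int OFFSET_MUTABLE      = 8;
  static final int HEADER_BASE_SIZE    = 9;

  // New pages start as mutable (1): they are still being written to.
  static ByteBuffer newPage(final int pageSize) {
    final ByteBuffer page = ByteBuffer.allocate(pageSize);
    page.putInt(OFFSET_FREE_CONTENT, HEADER_BASE_SIZE); // assumed: content begins right after the header
    page.putInt(OFFSET_NUM_ENTRIES, 0);
    page.put(OFFSET_MUTABLE, (byte) 1);
    return page;
  }

  // Called when a page is full and a new page takes over the writes.
  static void markImmutable(final ByteBuffer page) {
    page.put(OFFSET_MUTABLE, (byte) 0);
  }

  static boolean isMutable(final ByteBuffer page) {
    return page.get(OFFSET_MUTABLE) == 1;
  }

  // Scans from the last page backwards and stops at the first immutable page.
  // Returns its index, or -1 when every page is still mutable.
  static int findLastImmutablePage(final List<ByteBuffer> pages) {
    for (int i = pages.size() - 1; i >= 0; i--)
      if (!isMutable(pages.get(i)))
        return i;
    return -1;
  }

  public static void main(final String[] args) {
    final List<ByteBuffer> pages = new ArrayList<>();
    for (int i = 0; i < 3; i++)
      pages.add(newPage(64));
    markImmutable(pages.get(0));
    markImmutable(pages.get(1));
    System.out.println("Last immutable page: " + findLastImmutablePage(pages));
  }
}
```

Scanning from the end keeps the lookup cheap when only the trailing page is still being written, and it gives compaction a clear boundary: pages up to the returned index are safe to compact.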

Co-authored-by: lvca <312606+lvca@users.noreply.github.com>

* Remove test database files and update .gitignore

Co-authored-by: lvca <312606+lvca@users.noreply.github.com>

* Add constants for page header offsets to improve maintainability

- Added OFFSET_FREE_CONTENT, OFFSET_NUM_ENTRIES, OFFSET_MUTABLE, and HEADER_BASE_SIZE constants
- Replaced magic numbers throughout the code with named constants
- Makes the code more maintainable and self-documenting

Co-authored-by: lvca <312606+lvca@users.noreply.github.com>

* Update comments to reference constants instead of hardcoded offsets

Co-authored-by: lvca <312606+lvca@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: lvca <312606+lvca@users.noreply.github.com>
Co-authored-by: Luca Garulli <lvca@users.noreply.github.com>

* Make LSMVectorIndex ID property configurable (#2818)

* Initial plan

* Make ID property configurable in LSMVectorIndex

Co-authored-by: lvca <312606+lvca@users.noreply.github.com>

* Remove test database files and add to .gitignore

Co-authored-by: lvca <312606+lvca@users.noreply.github.com>

* Improve documentation for metadata JSON configuration

Co-authored-by: lvca <312606+lvca@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: lvca <312606+lvca@users.noreply.github.com>
Co-authored-by: Luca Garulli <lvca@users.noreply.github.com>

* fix pre-commit

* Update engine/src/main/java/com/arcadedb/database/TransactionIndexContext.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Fixed compaction

* test: fixed test

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: lvca <312606+lvca@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Roberto Franchini <ro.franchini@gmail.com>
robfrank pushed a commit that referenced this pull request Feb 11, 2026
(cherry picked from commit c470e6d)