Add mutable flag to vector index pages for safe compaction #2819

Merged
lvca merged 6 commits into jvector-integration from copilot/sub-pr-2816-again on Nov 22, 2025

Copilot AI commented Nov 21, 2025

What does this PR do?

Adds a one-byte flag at offset 8 of each vector index page header to distinguish mutable (actively written) pages from immutable (sealed) ones. Compaction now scans backwards from the last page and stops at the first immutable page, so a page that concurrent transactions are still writing to is never picked up for compaction.

Motivation

Addresses review feedback on #2816 where the original findLastImmutablePage() implementation would select all existing pages for compaction, creating a race condition if concurrent transactions were writing to the last page.

Related issues

Additional Notes

Page Header Format:

[offsetFreeContent(4)][numberOfEntries(4)][mutable(1)][pointers...]...[entries]
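
As an illustration only, the layout above could be sketched as follows. This is not the code from the PR: `ByteBuffer` stands in for the engine's page classes, the class and method names are hypothetical, and the values 0, 4 and 9 are inferred from the field widths in the layout (only offset 8 is stated explicitly). Initializing `offsetFreeContent` to the page capacity is also an assumption, based on entries sitting at the end of the page.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of the page header layout; not the ArcadeDB implementation. */
final class VectorPageHeaderSketch {
  // Two 4-byte ints followed by the 1-byte mutable flag.
  static final int OFFSET_FREE_CONTENT = 0; // inferred from the layout
  static final int OFFSET_NUM_ENTRIES  = 4; // inferred from the layout
  static final int OFFSET_MUTABLE      = 8; // stated in the PR description
  static final int HEADER_BASE_SIZE    = 9; // assumed: 4 + 4 + 1, pointers start here

  /** Writes a fresh header: no entries yet, flag set to mutable=1. */
  static void initNewPage(final ByteBuffer page) {
    // Assumption: entries are written from the end of the page towards the header.
    page.putInt(OFFSET_FREE_CONTENT, page.capacity());
    page.putInt(OFFSET_NUM_ENTRIES, 0);
    page.put(OFFSET_MUTABLE, (byte) 1);
  }

  /** A page counts as mutable only when the flag byte is exactly 1. */
  static boolean isMutable(final ByteBuffer page) {
    return page.get(OFFSET_MUTABLE) == 1;
  }

  /** Seals the page by setting mutable=0. */
  static void seal(final ByteBuffer page) {
    page.put(OFFSET_MUTABLE, (byte) 0);
  }
}
```

Using absolute `get`/`put` calls keeps the buffer position untouched, so header access does not interfere with whatever code is appending pointers or entries.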

Implementation Details:

  • New pages are created with mutable=1
  • A page is marked mutable=0 when it becomes full and a new page is allocated
  • Added the constants OFFSET_MUTABLE and HEADER_BASE_SIZE to eliminate magic numbers
  • Compaction now scans pages backwards and stops at the first mutable=0 page
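
A minimal sketch of the page lifecycle and backward scan described in the list above, under stated assumptions: the `Page` class, the `rollOver` helper and the `-1` return value for "nothing to compact" are hypothetical and chosen for illustration. Only the name `findLastImmutablePage()` comes from the PR; its real signature may differ.

```java
import java.util.List;

/** Hypothetical sketch of the backward scan; not the ArcadeDB implementation. */
final class CompactionScanSketch {
  /** Minimal stand-in for a page: only the mutable flag is modelled. */
  static final class Page {
    boolean mutable = true; // new pages start with mutable=1
  }

  /** Seals the current last page (if any) and appends a fresh mutable page. */
  static Page rollOver(final List<Page> pages) {
    if (!pages.isEmpty())
      pages.get(pages.size() - 1).mutable = false; // page is full: mutable=0
    final Page fresh = new Page();
    pages.add(fresh);
    return fresh;
  }

  /**
   * Scans from the end backwards and stops at the first immutable page.
   * Returns its index, or -1 (an assumed convention) when no page is sealed yet.
   */
  static int findLastImmutablePage(final List<Page> pages) {
    for (int i = pages.size() - 1; i >= 0; i--)
      if (!pages.get(i).mutable)
        return i;
    return -1;
  }
}
```

With three `rollOver` calls the list holds two sealed pages and one mutable page, so the scan returns index 1 and the page still being written is left out of compaction.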

Backward Compatibility: Existing pages without the mutable flag will read as 0 (immutable) from uninitialized memory, which is safe since they're already sealed.

Checklist

  • I have run the build using mvn clean package command
  • My unit tests cover both failure and success scenarios

Copilot AI and others added 4 commits November 21, 2025 22:25

Add mutable byte indicator to vector index pages
- Added mutable flag byte at offset 8 in page header (after offsetFreeContent and numberOfEntries)
- New pages are created with mutable=1 (actively being written to)
- Pages are marked as immutable (mutable=0) when they become full and a new page is created
- Updated findLastImmutablePage() to scan from end backwards and stop at first immutable page
- Updated all page reading/writing code to account for the mutable byte in header
- All vector index tests passing

Co-authored-by: lvca <312606+lvca@users.noreply.github.com>

Remove test database files and update .gitignore

Co-authored-by: lvca <312606+lvca@users.noreply.github.com>

Add constants for page header offsets to improve maintainability
- Added OFFSET_FREE_CONTENT, OFFSET_NUM_ENTRIES, OFFSET_MUTABLE, and HEADER_BASE_SIZE constants
- Replaced magic numbers throughout the code with named constants
- Makes the code more maintainable and self-documenting

Co-authored-by: lvca <312606+lvca@users.noreply.github.com>

Update comments to reference constants instead of hardcoded offsets

Co-authored-by: lvca <312606+lvca@users.noreply.github.com>

Copilot AI commented Nov 21, 2025

Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • repository.apache.org
    • Triggering command: /usr/lib/jvm/temurin-21-jdk-amd64/bin/java /usr/lib/jvm/temurin-21-jdk-amd64/bin/java --enable-native-access=ALL-UNNAMED -classpath /usr/share/apache-maven-3.9.11/boot/plexus-classworlds-2.9.0.jar -Dclassworlds.conf=/usr/share/apache-maven-3.9.11/bin/m2.conf -Dmaven.home=/usr/share/apache-maven-3.9.11 -Dlibrary.jansi.path=/usr/share/apache-maven-3.9.11/lib/jansi-native -Dmaven.multiModuleProjectDirectory=/home/REDACTED/work/arcadedb/arcadedb org.codehaus.plexus.classworlds.launcher.Launcher clean compile -DskipTests (dns block)
    • Triggering command: /usr/lib/jvm/temurin-17-jdk-amd64/bin/java /usr/lib/jvm/temurin-17-jdk-amd64/bin/java -classpath /home/REDACTED/.m2/wrapper/dists/apache-maven-3.9.9/3477a4f1/boot/plexus-classworlds-2.8.0.jar -Dclassworlds.conf=/home/REDACTED/.m2/wrapper/dists/apache-maven-3.9.9/3477a4f1/bin/m2.conf -Dmaven.home=/home/REDACTED/.m2/wrapper/dists/apache-maven-3.9.9/3477a4f1 -Dlibrary.jansi.path=/home/REDACTED/.m2/wrapper/dists/apache-maven-3.9.9/3477a4f1/lib/jansi-native -Dmaven.multiModuleProjectDirectory=/home/REDACTED/work/arcadedb/arcadedb org.codehaus.plexus.classworlds.launcher.Launcher -f pom.xml -B -V -e -Dfindbugs.skip -Dcheckstyle.skip -Dpmd.skip=true -Dspotbugs.skip -Denforcer.skip -Dmaven.javadoc.skip -DskipTests (dns block)
  • scarf.sh
    • Triggering command: /home/REDACTED/work/arcadedb/arcadedb/studio/target/node/node node ./report.js (dns block)

Copilot AI changed the title from "[WIP] Address feedback on byte indicator for mutable pages" to "Add mutable flag to vector index pages for safe compaction" Nov 21, 2025
Copilot AI requested a review from lvca November 21, 2025 22:35
lvca marked this pull request as ready for review November 22, 2025 15:03
lvca merged commit 47ae8fd into jvector-integration Nov 22, 2025
0 of 3 checks passed
lvca added this to the 25.11.1 milestone Nov 22, 2025
lvca added the enhancement (New feature or request) label Nov 22, 2025

mergify bot commented Nov 22, 2025

🧪 CI Insights

Here's what we observed from your CI run for 1dbb9ec.

🟢 All jobs passed!

But CI Insights is watching 👀

robfrank pushed a commit that referenced this pull request Nov 22, 2025
* Initial plan

* Add mutable byte indicator to vector index pages

- Added mutable flag byte at offset 8 in page header (after offsetFreeContent and numberOfEntries)
- New pages are created with mutable=1 (actively being written to)
- Pages are marked as immutable (mutable=0) when they become full and a new page is created
- Updated findLastImmutablePage() to scan from end backwards and stop at first immutable page
- Updated all page reading/writing code to account for the mutable byte in header
- All vector index tests passing

Co-authored-by: lvca <312606+lvca@users.noreply.github.com>

* Remove test database files and update .gitignore

Co-authored-by: lvca <312606+lvca@users.noreply.github.com>

* Add constants for page header offsets to improve maintainability

- Added OFFSET_FREE_CONTENT, OFFSET_NUM_ENTRIES, OFFSET_MUTABLE, and HEADER_BASE_SIZE constants
- Replaced magic numbers throughout the code with named constants
- Makes the code more maintainable and self-documenting

Co-authored-by: lvca <312606+lvca@users.noreply.github.com>

* Update comments to reference constants instead of hardcoded offsets

Co-authored-by: lvca <312606+lvca@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: lvca <312606+lvca@users.noreply.github.com>
Co-authored-by: Luca Garulli <lvca@users.noreply.github.com>
lvca added a commit that referenced this pull request Nov 22, 2025
* First version with jvector

* Implemented compaction of vector indexes

* Added test cases

* Fixed compilation problems

* Fixed test cases, now all pass

* Refactor vector index using the transaction index changes instead of internal map (with threadId)

* feat: integrated new vector index with the `database import` command

* Supported lsmvector in `vectorNeighbors()` sql function

* Upgraded to jvector 4.0.0-rc.6

* Update LSMVectorIndexCompacted.java

fix: error after compaction

* Fix ComparableVector Comparable contract violation (#2817)

* Initial plan

* Fix ComparableVector to maintain Comparable contract

Co-authored-by: lvca <312606+lvca@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: lvca <312606+lvca@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Return similarity scores from LSMVectorIndex to avoid redundant distance recalculation (#2820)

* Initial plan

* Add findNeighborsFromVector method to LSMVectorIndex to return scores directly

Co-authored-by: lvca <312606+lvca@users.noreply.github.com>

* Remove test artifacts and update .gitignore

Co-authored-by: lvca <312606+lvca@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: lvca <312606+lvca@users.noreply.github.com>

* Add mutable flag to vector index pages for safe compaction (#2819)

* Initial plan

* Add mutable byte indicator to vector index pages

- Added mutable flag byte at offset 8 in page header (after offsetFreeContent and numberOfEntries)
- New pages are created with mutable=1 (actively being written to)
- Pages are marked as immutable (mutable=0) when they become full and a new page is created
- Updated findLastImmutablePage() to scan from end backwards and stop at first immutable page
- Updated all page reading/writing code to account for the mutable byte in header
- All vector index tests passing

Co-authored-by: lvca <312606+lvca@users.noreply.github.com>

* Remove test database files and update .gitignore

Co-authored-by: lvca <312606+lvca@users.noreply.github.com>

* Add constants for page header offsets to improve maintainability

- Added OFFSET_FREE_CONTENT, OFFSET_NUM_ENTRIES, OFFSET_MUTABLE, and HEADER_BASE_SIZE constants
- Replaced magic numbers throughout the code with named constants
- Makes the code more maintainable and self-documenting

Co-authored-by: lvca <312606+lvca@users.noreply.github.com>

* Update comments to reference constants instead of hardcoded offsets

Co-authored-by: lvca <312606+lvca@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: lvca <312606+lvca@users.noreply.github.com>
Co-authored-by: Luca Garulli <lvca@users.noreply.github.com>

* Make LSMVectorIndex ID property configurable (#2818)

* Initial plan

* Make ID property configurable in LSMVectorIndex

Co-authored-by: lvca <312606+lvca@users.noreply.github.com>

* Remove test database files and add to .gitignore

Co-authored-by: lvca <312606+lvca@users.noreply.github.com>

* Improve documentation for metadata JSON configuration

Co-authored-by: lvca <312606+lvca@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: lvca <312606+lvca@users.noreply.github.com>
Co-authored-by: Luca Garulli <lvca@users.noreply.github.com>

* fix pre-commit

* Update engine/src/main/java/com/arcadedb/database/TransactionIndexContext.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Fixed compaction

* test: fixed test

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: lvca <312606+lvca@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Roberto Franchini <ro.franchini@gmail.com>
robfrank pushed a commit that referenced this pull request Feb 11, 2026
(cherry picked from commit c470e6d)