Lsm vector fix by lvca · Pull Request #2907 · ArcadeData/arcadedb

lvca · 2025-12-11T07:40:56Z

What does this PR do?

A brief description of the change being made with this pull request.

Motivation

What inspired you to submit this pull request?

Related issues

A list of issues either fixed, containing architectural discussions, otherwise relevant
for this Pull Request.

Additional Notes

Anything else we should know when reviewing?

Checklist

I have run the build using mvn clean package command
My unit tests cover both failure and success scenarios

- Added updatePageCount() call in createNewPage() to properly track page count
- Follows same pattern as LSMTreeIndexCompacted
- Prevents all pages from overwriting page 0 during compaction
- Pages are now correctly numbered and getTotalPages() returns accurate count

Note: Test still fails due to vector count mismatch after compaction. Investigation shows only 19,596 of 30,000 expected vectors are being loaded after compaction. Root cause appears to be related to how document updates trigger vector index changes and page flushing timing.
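The page-tracking bug described in this commit message can be illustrated with a toy model. This is only a sketch, not ArcadeDB's code: `PagedFileSketch` is a hypothetical class, and only the method names `createNewPage()`, `updatePageCount()` and `getTotalPages()` are taken from the commit message.

```java
// Toy model of a paged file; hypothetical, not ArcadeDB's actual classes.
final class PagedFileSketch {
  private int pageCount = 0; // the value getTotalPages() reports

  /** Allocates the next page number and advances the page count. */
  int createNewPage() {
    final int pageNumber = pageCount;  // a new page goes at the current end
    updatePageCount(pageNumber + 1);   // without this call every new page would get number 0
    return pageNumber;
  }

  private void updatePageCount(final int newCount) {
    pageCount = newCount;
  }

  int getTotalPages() {
    return pageCount;
  }

  public static void main(final String[] args) {
    final PagedFileSketch file = new PagedFileSketch();
    for (int i = 0; i < 3; i++)
      System.out.println("created page " + file.createNewPage());
    System.out.println("total pages: " + file.getTotalPages());
  }
}
```

If the `updatePageCount` call is dropped, `pageCount` stays at 0, so each call to `createNewPage()` hands out page 0 again, which matches the "all pages overwriting page 0" symptom.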

- Changed mergePages to start from page 0 instead of page 1
- Page 0 contains vector data, not just metadata
- Updated findLastImmutablePage to scan down to page 0
- Fixed loadVectorsFromFile to use correct header size for compacted page 0
- Page 0 of compacted index has 25-byte header (base 9 + metadata 16)
- Other pages have 9-byte header
- This was causing ~10,400 vectors to be missing after compaction

All LSMVectorIndexTest tests now pass (14/14).
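The header-size rule for the compacted index can be written down as a small helper. This is a minimal sketch: the class and constant names are hypothetical, and only the byte counts (9, 16, 25) come from the commit message. It covers the compacted index only, since the message does not state the layout of the main index.

```java
// Hypothetical helper; only the byte counts are taken from the commit message.
final class CompactedHeaderSketch {
  static final int BASE_HEADER_SIZE = 9;     // header present on every page
  static final int PAGE0_METADATA_SIZE = 16; // extra metadata stored on page 0 only

  /** Offset of the first entry on the given page of a compacted index. */
  static int headerSize(final int pageNumber) {
    if (pageNumber < 0)
      throw new IllegalArgumentException("pageNumber must be >= 0");
    return pageNumber == 0 ? BASE_HEADER_SIZE + PAGE0_METADATA_SIZE : BASE_HEADER_SIZE;
  }

  public static void main(final String[] args) {
    System.out.println("page 0 header: " + headerSize(0));
    System.out.println("page 1 header: " + headerSize(1));
  }
}
```

Reading page 0 with the 9-byte offset would start decoding entries 16 bytes too early, which is consistent with the missing-vector symptom the commit describes.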

gemini-code-assist · 2025-12-11T07:41:31Z

Summary of Changes

Hello @lvca, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a fundamental change to the LSM Vector Index by decoupling the storage of vector data from the index pages. Instead of duplicating vector arrays within the index, the index now primarily stores metadata (vector ID, record ID, and deletion status), with the actual vector data residing solely in the associated documents. This architectural shift aims to optimize index size, improve compaction efficiency by reducing the data processed, and ensure data consistency. The changes are supported by extensive debugging logs and a new comprehensive test case that validates the compaction process, including the correct handling and removal of duplicate entries.

Highlights

Vector Data Storage Refactoring: The core change involves decoupling vector data storage from index pages. Vectors are now exclusively stored within the associated documents, while index pages only retain metadata (vector ID, Record ID, and deletion status).
Compaction Process Optimization: The compaction mechanism has been optimized to merge only vector metadata, significantly reducing the amount of data processed during compaction. This leads to smaller index pages and potentially faster compaction cycles.
Enhanced Debugging and Observability: Extensive logging and print statements have been added throughout the vector index and compactor classes. These provide detailed insights into the internal workings, especially during page loading, vector persistence, and the compaction lifecycle, aiding in debugging and monitoring.
New Compaction Test Case: A comprehensive new test, manualCompactionRemovesDuplicates, has been introduced. This test rigorously validates the compaction logic, ensuring correct merging of pages, removal of duplicate entries, and maintenance of data integrity after compaction.
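The merge step summarized in these highlights (collapse duplicates per vector ID, then skip deleted entries) can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `Entry` and `merge` are hypothetical names, and "the entry read last wins" is an assumed resolution rule.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a metadata-only merge; not the actual compactor.
final class CompactionMergeSketch {
  static final class Entry {
    final int id;          // vector ID
    final String rid;      // record ID of the document that holds the vector
    final boolean deleted; // tombstone flag

    Entry(final int id, final String rid, final boolean deleted) {
      this.id = id;
      this.rid = rid;
      this.deleted = deleted;
    }
  }

  /** Merges pages in order; for each ID the entry read last wins (assumption). */
  static List<Entry> merge(final List<List<Entry>> pages) {
    final Map<Integer, Entry> latest = new LinkedHashMap<>();
    for (final List<Entry> page : pages)
      for (final Entry e : page)
        latest.put(e.id, e); // a later entry replaces an earlier duplicate

    final List<Entry> result = new ArrayList<>();
    for (final Entry e : latest.values())
      if (!e.deleted)        // deleted entries are not written to the compacted index
        result.add(e);
    return result;
  }
}
```

Because only metadata is merged, the amount of data held in memory depends on the number of entries rather than on the vector dimensionality.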

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

mergify · 2025-12-11T07:42:29Z

🧪 CI Insights

Here's what we observed from your CI run for 52b847f.

🟢 All jobs passed!

But CI Insights is watching 👀

gemini-code-assist

Code Review

This pull request introduces a significant and important fix to the LSM vector index by changing how vectors are stored. Instead of being embedded in the index pages, they are now correctly fetched from the documents, which should improve performance and reduce memory usage. The changes also include fixes for the compaction logic and a new comprehensive test for it. My main concerns are the introduction of System.out.println statements for debugging, which should be replaced with the standard logger, and a hardcoded file path in one of the tests.

engine/src/main/java/com/arcadedb/index/vector/LSMVectorIndex.java

engine/src/main/java/com/arcadedb/index/vector/LSMVectorIndexCompactor.java

integration/src/test/java/com/arcadedb/integration/importer/vector/GloVeTest.java

Copilot

Pull request overview

This PR implements a fix for LSM (Log-Structured Merge) vector index handling, fundamentally changing how vector data is stored and compacted. The key architectural change is that vectors are now stored in documents rather than in index pages, with index pages only containing metadata (id, position, bucketId, and deleted flag).

Key Changes:

- Modified vector storage architecture: vectors now stored in documents, index pages store only metadata (17 bytes per entry instead of metadata + vector data)
- Updated compaction logic to start from page 0 instead of page 1, treating it as containing vector data rather than just metadata
- Added comprehensive test for manual compaction with duplicate removal
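The 17-byte figure is consistent with an entry made of an int ID, a long position, an int bucket ID and a one-byte deleted flag (4 + 8 + 4 + 1). The sketch below encodes such an entry with `java.nio.ByteBuffer`. The field widths and their order are assumptions inferred from the overview, and `VectorEntrySketch` is a hypothetical class, not part of ArcadeDB.

```java
import java.nio.ByteBuffer;

// Hypothetical metadata-only index entry; field widths and order are assumptions.
final class VectorEntrySketch {
  // 4 (id) + 8 (position) + 4 (bucketId) + 1 (deleted) = 17 bytes
  static final int ENTRY_SIZE = Integer.BYTES + Long.BYTES + Integer.BYTES + Byte.BYTES;

  final int id;
  final long position;
  final int bucketId;
  final boolean deleted;

  VectorEntrySketch(final int id, final long position, final int bucketId, final boolean deleted) {
    this.id = id;
    this.position = position;
    this.bucketId = bucketId;
    this.deleted = deleted;
  }

  /** Writes the entry at the buffer's current position. */
  void writeTo(final ByteBuffer buf) {
    buf.putInt(id).putLong(position).putInt(bucketId).put((byte) (deleted ? 1 : 0));
  }

  /** Reads one entry from the buffer's current position. */
  static VectorEntrySketch readFrom(final ByteBuffer buf) {
    final int id = buf.getInt();
    final long position = buf.getLong();
    final int bucketId = buf.getInt();
    final boolean deleted = buf.get() != 0;
    return new VectorEntrySketch(id, position, bucketId, deleted);
  }
}
```

The vector itself is absent from the entry; a reader would use the bucket ID and position to load the document and take the vector from there.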

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 19 comments.

Show a summary per file

File	Description
GloVeTest.java	Changed hard-coded file path from one user's directory to another
LSMVectorIndexTest.java	Added new test `manualCompactionRemovesDuplicates()` to verify compaction correctly merges pages and removes duplicates
LSMVectorIndexCompactor.java	Updated compaction logic to handle new storage format: reads only metadata from pages, removed vector reading/writing, changed page scanning to include page 0, added extensive debug logging
LSMVectorIndexCompacted.java	Modified entry size calculations and page operations to handle metadata-only storage, added vector loading from documents in `getAllVectors()`
LSMVectorIndex.java	Updated persistence methods to write only metadata, changed logging from LogManager to System.out.println, added compacted sub-index assignment, modified page header size handling for compacted pages

Critical Issues Found:

- Multiple System.out.println debug statements left in production code (should use LogManager)
- Hard-coded user-specific file path in test
- Commented-out validation code that should be removed
- Potential bug with page 0 handling in compaction (needs verification if page 0 contains data or metadata)

Copilot · 2025-12-11T07:49:47Z

engine/src/main/java/com/arcadedb/index/vector/LSMVectorIndexCompactor.java

+        pageBuffer.position(LSMVectorIndex.OFFSET_MUTABLE);
+        final byte mutable = pageBuffer.get();
+
+        System.out.println("  Page " + pageNum + ": numberOfEntries=" + numberOfEntries + ", mutable=" + mutable);


Debug System.out.println statements should not be committed to production code. Consider using the LogManager.instance().log() method that's already used elsewhere in this file, or remove these debug statements before merging.

Copilot · 2025-12-11T07:49:47Z

engine/src/main/java/com/arcadedb/index/vector/LSMVectorIndexCompactor.java


-    LogManager.instance()
-        .log(mainIndex, Level.FINE, "Wrote %d non-deleted entries to compacted index", null, entriesWritten);
+    System.out.println("Wrote " + entriesWritten + " non-deleted entries to compacted index (skipped " + deletedSkipped + " deleted entries)");


Debug System.out.println statements should not be committed to production code. Consider using the LogManager.instance().log() method that's already used elsewhere in this file, or remove these debug statements before merging.

Suggested change

-    System.out.println("Wrote " + entriesWritten + " non-deleted entries to compacted index (skipped " + deletedSkipped + " deleted entries)");
+    LogManager.instance().log(mainIndex, Level.INFO, "Wrote %d non-deleted entries to compacted index (skipped %d deleted entries)", null, entriesWritten, deletedSkipped);
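The general pattern behind this suggestion, a leveled and lazily formatted log call instead of `System.out.println`, can be shown with the JDK's own `java.util.logging`. This sketch deliberately does not use ArcadeDB's `LogManager`; `CompactionLogSketch` and its methods are hypothetical names used for illustration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch using java.util.logging as a stand-in for a project-specific log manager.
final class CompactionLogSketch {
  private static final Logger LOG = Logger.getLogger(CompactionLogSketch.class.getName());

  /** Builds the summary line; kept separate so it can be checked in isolation. */
  static String summary(final int entriesWritten, final int deletedSkipped) {
    return String.format("Wrote %d non-deleted entries to compacted index (skipped %d deleted entries)",
        entriesWritten, deletedSkipped);
  }

  static void logSummary(final int entriesWritten, final int deletedSkipped) {
    // Supplier overload: the message is only formatted when FINE is enabled,
    // so the call costs almost nothing in production configurations.
    LOG.log(Level.FINE, () -> summary(entriesWritten, deletedSkipped));
  }

  public static void main(final String[] args) {
    logSummary(120, 5); // silent under the default INFO level
    System.out.println(summary(120, 5));
  }
}
```

Unlike a print statement, the output can be switched on per logger and per level without touching the code.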

Copilot · 2025-12-11T07:49:48Z

engine/src/main/java/com/arcadedb/index/vector/LSMVectorIndex.java

+      System.out.println("loadVectorsFromPages DONE: Loaded " + vectorIndex.size() + " vector locations (" + entriesRead + " total entries) for index: " + indexName
              + ", nextId=" + nextId.get() + ", fileId=" + getFileId() + ", totalPages=" + getTotalPages() +
-              (compactedSubIndex != null ? ", compactedFileId=" + compactedSubIndex.getFileId() + ", compactedPages=" + compactedSubIndex.getTotalPages() : ""));
+              (compactedSubIndex != null ?
+                  ", compactedFileId=" + compactedSubIndex.getFileId() + ", compactedPages=" + compactedSubIndex.getTotalPages() :
+                  ""));


Debug System.out.println statements should not be committed to production code. The original code used LogManager.instance().log() which is the appropriate logging mechanism. Please revert to using the logging framework instead of System.out.println.

Copilot · 2025-12-11T07:49:48Z

engine/src/main/java/com/arcadedb/index/vector/LSMVectorIndex.java

-    LogManager.instance().log(this, Level.INFO,
-        "loadVectorsFromFile: fileId=%d, totalPages=%d, isCompacted=%s",
-        fileId, totalPages, isCompacted);
+    System.out.println("loadVectorsFromFile: fileId=" + fileId + ", totalPages=" + totalPages + ", isCompacted=" + isCompacted);


Debug System.out.println statements should not be committed to production code. The original code used LogManager.instance().log() which is the appropriate logging mechanism. Please revert to using the logging framework instead of System.out.println.

Suggested change

-    System.out.println("loadVectorsFromFile: fileId=" + fileId + ", totalPages=" + totalPages + ", isCompacted=" + isCompacted);
+    LogManager.instance().log(this, Level.FINE, "loadVectorsFromFile: fileId=%d, totalPages=%d, isCompacted=%b", null, fileId, totalPages, isCompacted);

Copilot · 2025-12-11T07:49:48Z

integration/src/test/java/com/arcadedb/integration/importer/vector/GloVeTest.java

 public class GloVeTest {
  private final static int     PARALLEL_LEVEL = 8;
-  private static final String  FILE_NAME      = "/Users/frank/Downloads/glove.twitter.27B/glove.twitter.27B.100d.txt";
+  private static final String  FILE_NAME      = "/Users/luca/Downloads/glove.twitter.27B.100d.txt";


This hard-coded file path should not be committed to version control. Test files should use relative paths, test resources, or configurable paths (e.g., from system properties or environment variables) to avoid making tests dependent on a specific user's local file system.

Suggested change

-  private static final String  FILE_NAME      = "/Users/luca/Downloads/glove.twitter.27B.100d.txt";
+  private static final String  FILE_NAME      =
+      System.getProperty("glove.file.path",
+          System.getenv("GLOVE_FILE_PATH") != null
+              ? System.getenv("GLOVE_FILE_PATH")
+              : "src/test/resources/glove.twitter.27B.100d.txt");
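The lookup order in this suggestion (system property, then environment variable, then a relative default) can be factored into a small, testable helper. This is a sketch: `TestDataPathSketch` and `resolve` are hypothetical names, and unlike the suggestion above it also treats empty strings as unset.

```java
// Hypothetical helper that resolves a test data path from configurable sources.
final class TestDataPathSketch {
  /** Returns the first non-empty value among property, environment and fallback. */
  static String resolve(final String propertyValue, final String envValue, final String fallback) {
    if (propertyValue != null && !propertyValue.isEmpty())
      return propertyValue;
    if (envValue != null && !envValue.isEmpty())
      return envValue;
    return fallback;
  }

  /** Uses the property and variable names proposed in the review comment. */
  static String gloveFile() {
    return resolve(System.getProperty("glove.file.path"), System.getenv("GLOVE_FILE_PATH"),
        "src/test/resources/glove.twitter.27B.100d.txt");
  }

  public static void main(final String[] args) {
    System.out.println("GloVe file: " + gloveFile());
  }
}
```

With Maven, the property could then be supplied on the command line, for example `-Dglove.file.path=/data/glove.txt`, provided the test runner forwards system properties to the test JVM.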

Copilot · 2025-12-11T07:49:51Z

engine/src/main/java/com/arcadedb/index/vector/LSMVectorIndexCompactor.java

+    System.out.println("Compaction RAM budget: " + (ramBudget / 1024 / 1024) + " MB, " +
+        "totalPagesToCompact: " + totalPagesToCompact + ", " +
+        "totalRAMNeeded: " + (totalRAMNeeded / 1024 / 1024) + " MB, " +
+        "pageSize: " + pageSize);
+
    if (totalRAMNeeded > ramBudget) {
      pagesToCompact = (int) (ramBudget / pageSize);
+      System.out.println("WARNING: RAM budget insufficient - compacting only " + pagesToCompact + " of " + totalPagesToCompact + " pages");
      if (pagesToCompact < 1)
        return 0; // Not enough RAM even for 1 page
    } else {
-      pagesToCompact = endPage - startPage + 1;
+      pagesToCompact = totalPagesToCompact;
+      System.out.println("RAM budget sufficient - compacting all " + pagesToCompact + " pages");


Debug System.out.println statements should not be committed to production code. Consider using the LogManager.instance().log() method that's already used elsewhere in this file, or remove these debug statements before merging.
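The budget logic in this hunk can be isolated as a pure function. This is a sketch under assumptions: `totalRAMNeeded` is taken to be `totalPagesToCompact * pageSize`, which the diff does not show, and `CompactionBudgetSketch` is a hypothetical class.

```java
// Hypothetical extraction of the RAM budget decision; not the actual compactor code.
final class CompactionBudgetSketch {
  /**
   * Returns how many pages fit in the RAM budget, capped at the number of pages
   * that need compaction. A result of 0 means not even one page fits.
   */
  static int pagesToCompact(final long ramBudget, final int totalPagesToCompact, final int pageSize) {
    if (pageSize <= 0)
      throw new IllegalArgumentException("pageSize must be positive");

    // Assumption: each page to compact is held fully in memory.
    final long totalRAMNeeded = (long) totalPagesToCompact * pageSize;
    if (totalRAMNeeded <= ramBudget)
      return totalPagesToCompact; // budget sufficient: compact everything

    final long affordable = ramBudget / pageSize;
    return (int) Math.max(affordable, 0L); // budget insufficient: compact a prefix
  }

  public static void main(final String[] args) {
    System.out.println(pagesToCompact(64L * 1024 * 1024, 100, 262_144));
  }
}
```

Keeping the decision in a function like this also makes it easy to log the inputs once at `FINE` level instead of printing each intermediate value.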

Copilot · 2025-12-11T07:49:51Z

engine/src/main/java/com/arcadedb/index/vector/LSMVectorIndexCompactor.java

+                System.out.println("    Entry " + i + " (page " + pageNum + "): id=" + id + ", rid=" + rid + ", deleted=" + deleted);
+              }
+            } else {
+              System.out.println("    Entry " + i + " (page " + pageNum + "): id=" + id + ", rid=" + rid + ", deleted=" + deleted);


Debug System.out.println statements should not be committed to production code. Consider using the LogManager.instance().log() method that's already used elsewhere in this file, or remove these debug statements before merging.

Suggested change

-                System.out.println("    Entry " + i + " (page " + pageNum + "): id=" + id + ", rid=" + rid + ", deleted=" + deleted);
-              }
-            } else {
-              System.out.println("    Entry " + i + " (page " + pageNum + "): id=" + id + ", rid=" + rid + ", deleted=" + deleted);
+                LogManager.instance().log(mainIndex, Level.FINE,
+                    "    Entry %d (page %d): id=%d, rid=%s, deleted=%s", i, pageNum, id, rid, deleted);
+              }
+            } else {
+              LogManager.instance().log(mainIndex, Level.FINE,
+                  "    Entry %d (page %d): id=%d, rid=%s, deleted=%s", i, pageNum, id, rid, deleted);

Copilot · 2025-12-11T07:49:51Z

engine/src/main/java/com/arcadedb/index/vector/LSMVectorIndexCompactor.java

-    LogManager.instance()
-        .log(mainIndex, Level.FINE, "Read %d entries from %d pages, unique vectors: %d", null, totalEntriesRead,
-            pagesToCompact, vectorMap.size());
+    System.out.println("Read " + totalEntriesRead + " entries from " + pagesToCompact + " pages, unique vectors: " + vectorMap.size());


Debug System.out.println statements should not be committed to production code. Consider using the LogManager.instance().log() method that's already used elsewhere in this file, or remove these debug statements before merging.

Copilot · 2025-12-11T07:49:51Z

engine/src/main/java/com/arcadedb/index/vector/LSMVectorIndexCompactor.java


      // Find last immutable page (skip mutable pages still being written)
      int lastImmutablePage = findLastImmutablePage(mainIndex, totalPages);
+      LogManager.instance().log(mainIndex, Level.INFO, "findLastImmutablePage returned: %d (totalPages=%d)", null, lastImmutablePage, totalPages);


Debug System.out.println statements should not be committed to production code. Consider using the LogManager.instance().log() method that's already used elsewhere in this file, or remove these debug statements before merging.
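The scan that `findLastImmutablePage` performs, as described in the commit message ("scan down to page 0"), might look like the sketch below. It is not the PR's implementation: the page headers are modelled as a plain byte array, and the convention that a non-zero "mutable" byte marks a page still being written is an assumption.

```java
// Hypothetical model: mutableFlags[i] is the "mutable" header byte of page i.
final class LastImmutablePageSketch {
  /**
   * Returns the highest page number whose mutable flag is 0, or -1 if every
   * page is still mutable. The loop includes page 0, since page 0 holds data.
   */
  static int findLastImmutablePage(final byte[] mutableFlags) {
    for (int page = mutableFlags.length - 1; page >= 0; page--) // ">= 0", not "> 0"
      if (mutableFlags[page] == 0)
        return page;
    return -1;
  }

  public static void main(final String[] args) {
    System.out.println(findLastImmutablePage(new byte[] { 0, 0, 1 }));
  }
}
```

Stopping the loop at page 1 would report "nothing to compact" for an index whose only immutable page is page 0, and would leave its entries out of the compacted file.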

Copilot · 2025-12-11T07:49:52Z

engine/src/test/java/com/arcadedb/index/vector/LSMVectorIndexTest.java

+        assertThat(vec.length).as("Vector should have 4 dimensions").isEqualTo(4);
+
+        // First numUpdates docs were updated 5 times, rest kept original values
+        if (i < numUpdates) {


Test is always true, because of this condition.

codacy-production · 2025-12-11T08:04:50Z

Coverage summary from Codacy

Coverage variation	Diff coverage
Report missing for `a54b250`¹	✅ 73.58%

Coverage variation details

	Coverable lines	Covered lines	Coverage
Common ancestor commit (`a54b250`)	Report Missing	Report Missing	Report Missing
Head commit (`80eb23f`)	75790	48474	63.96%

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details

	Coverable lines	Covered lines	Diff coverage
Pull request (#2907)	106	78	73.58%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%
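As a quick check of the figures in the tables above, the percentage formula can be applied to the reported line counts. This is a small sketch; `CoverageSketch` is a hypothetical helper and not part of Codacy.

```java
import java.util.Locale;

// Hypothetical helper reproducing the coverage percentages from the report.
final class CoverageSketch {
  /** covered / coverable * 100, as defined in the report. */
  static double percent(final long covered, final long coverable) {
    if (coverable <= 0)
      throw new IllegalArgumentException("coverable must be positive");
    return 100.0 * covered / coverable;
  }

  /** Formats with two decimals and a locale-independent decimal point. */
  static String format(final double percent) {
    return String.format(Locale.ROOT, "%.2f%%", percent);
  }

  public static void main(final String[] args) {
    System.out.println("diff coverage: " + format(percent(78, 106)));       // 73.58%
    System.out.println("head coverage: " + format(percent(48474, 75790)));  // 63.96%
  }
}
```

Both results match the report: 78 of 106 changed lines gives 73.58%, and 48474 of 75790 lines at the head commit gives 63.96%.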

Codacy didn't receive coverage data for the commit, or there was an error processing the received data. Check your integration for errors and validate that your coverage setup is correct. ↩

….1 [skip ci] Bumps [com.arcadedb:arcadedb-network](https://github.com/ArcadeData/arcadedb) from 25.11.1 to 25.12.1. The quoted 25.12.1 release notes list this pull request under "Performance Improvements": Lsm vector fix by [`@lvca`](https://github.com/lvca) in [ArcadeData/arcadedb#2907](https://redirect.github.com/ArcadeData/arcadedb/pull/2907).

* fix: lsm vector compactor - draft

* fix: LSMVectorIndexCompacted page tracking

  - Added updatePageCount() call in createNewPage() to properly track page count
  - Follows same pattern as LSMTreeIndexCompacted
  - Prevents all pages from overwriting page 0 during compaction
  - Pages are now correctly numbered and getTotalPages() returns accurate count

  Note: Test still fails due to vector count mismatch after compaction. Investigation shows only 19,596 of 30,000 expected vectors are being loaded after compaction. Root cause appears to be related to how document updates trigger vector index changes and page flushing timing.

* fix: Include page 0 in compaction and fix header size calculation

  - Changed mergePages to start from page 0 instead of page 1
  - Page 0 contains vector data, not just metadata
  - Updated findLastImmutablePage to scan down to page 0
  - Fixed loadVectorsFromFile to use correct header size for compacted page 0
  - Page 0 of compacted index has 25-byte header (base 9 + metadata 16)
  - Other pages have 9-byte header
  - This was causing ~10,400 vectors to be missing after compaction

  All LSMVectorIndexTest tests now pass (14/14).

* Update engine/src/main/java/com/arcadedb/index/vector/LSMVectorIndex.java

  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update engine/src/main/java/com/arcadedb/index/vector/LSMVectorIndexCompactor.java

  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

(cherry picked from commit 17adcfd)

lvca added 3 commits December 11, 2025 02:07

fix: lsm vector compactor - draft

4813469

lvca added this to the 25.12.1 milestone Dec 11, 2025

lvca requested a review from Copilot December 11, 2025 07:40

lvca self-assigned this Dec 11, 2025

lvca added the bug label Dec 11, 2025

Copilot started reviewing on behalf of lvca December 11, 2025 07:41

gemini-code-assist bot reviewed Dec 11, 2025

lvca and others added 2 commits December 11, 2025 02:45

Update engine/src/main/java/com/arcadedb/index/vector/LSMVectorIndex.…

80eb23f

…java Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update engine/src/main/java/com/arcadedb/index/vector/LSMVectorIndexC…

52b847f

…ompactor.java Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

lvca merged commit 17adcfd into main Dec 11, 2025
5 of 7 checks passed

lvca deleted the lsm-vector-fix branch December 11, 2025 07:46

Copilot AI reviewed Dec 11, 2025

ExtReMLapin mentioned this pull request Dec 11, 2025

Cannot create a node with cypher when there is an LSM vector index existing for it #2908

Closed
