Fix compact large notunique index by lvca · Pull Request #2663 · ArcadeData/arcadedb

lvca · 2025-10-19T02:44:09Z

I believe I found and fixed the issue. I wrote a new test case that now passes. The problem was with not unique indexes: with many records having the same key, the index entry could be bigger than the index page, and there wasn't a way to split the index entry on multiple pages. Now there is :-)

FIxed issue #2434

gemini-code-assist · 2025-10-19T02:44:32Z

Summary of Changes

Hello @lvca, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a critical limitation in non-unique LSM-tree indexes where a large number of records sharing the same key could lead to index entries exceeding the capacity of a single page. The core solution involves enabling these large entries to be split and stored across multiple index pages, both during standard data insertion and index compaction. This significantly enhances the robustness and scalability of non-unique indexes, preventing potential data integrity issues. The changes are validated by a new, comprehensive test case designed to simulate and verify this specific scenario.

Highlights

Non-Unique Index Entry Splitting: Implemented a mechanism to split large non-unique index entries across multiple pages. Previously, if many records shared the same key, the index entry could exceed a single page's capacity, leading to issues. This change ensures such entries can span multiple pages.
Refactored Index Entry Writing: Introduced writeEntrySingleValue and writeEntryMultipleValues methods in LSMTreeIndexAbstract to explicitly handle writing single or multiple values, with the latter managing page overflows for non-unique keys.
Enhanced Key Lookup for Non-Unique Indexes: Added findFirstEntryOfSameKey and findLastEntryOfSameKey methods to efficiently retrieve all associated RIDs for a given key, especially when they are spread across multiple entries or pages.
Compaction Process Updates: The compaction logic in LSMTreeIndexCompacted and LSMTreeIndexCompactor has been updated to correctly handle and persist non-unique index entries that are split across multiple pages.
New Test Case for Large Non-Unique Entries: A new test case (checkNotUniqueEntries) was added to LSMTreeIndexCompactionTest to specifically validate the fix for non-unique indexes with a high number of records sharing the same key, ensuring the splitting mechanism works as expected during various compaction stages.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

lvca · 2025-10-19T02:44:42Z

Issue #2434

gemini-code-assist

Code Review

This pull request introduces a significant change to handle large non-unique indexes by allowing index entries to be split across multiple pages. The changes are extensive, touching mutable and compacted indexes, as well as the compaction process. While the core logic seems to address the described problem, I've found a few critical issues related to potential data loss and a NullPointerException in an error handling path. Additionally, there are several opportunities to improve maintainability by refactoring duplicated code blocks. The new test case is a great addition to validate the fix.

gemini-code-assist · 2025-10-19T02:46:14Z

engine/src/main/java/com/arcadedb/index/lsm/LSMTreeIndex.java

      final Integer lockedFileId = lockedNewFileId.get();
-      if (lockedFileId != null)
-        database.getTransactionManager().unlockFile(lockedFileId, Thread.currentThread());
+      database.getTransactionManager().unlockFile(lockedFileId, Thread.currentThread());


Removing the null check for lockedFileId could lead to a NullPointerException if database.getTransactionManager().unlockFile() is called with a null argument. The lockedNewFileId atomic reference is initialized to null and only set within the try block. If an exception occurs before it's set, lockedFileId will be null, and this finally block will throw an NPE, masking the original exception. The null check should be restored to ensure robust error handling.

Suggested change

database.getTransactionManager().unlockFile(lockedFileId, Thread.currentThread());

if (lockedFileId != null)

database.getTransactionManager().unlockFile(lockedFileId, Thread.currentThread());

engine/src/main/java/com/arcadedb/index/lsm/LSMTreeIndexCompactor.java

engine/src/main/java/com/arcadedb/index/lsm/LSMTreeIndexAbstract.java

engine/src/main/java/com/arcadedb/index/lsm/LSMTreeIndexCompacted.java

engine/src/main/java/com/arcadedb/index/lsm/LSMTreeIndexMutable.java

codacy-production · 2025-10-19T03:10:11Z

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation	Diff coverage
✅ +0.00%	✅ 93.91%

Coverage variation details

	Coverable lines	Covered lines	Coverage
Common ancestor commit (`bd2a3db`)	73033	46299	63.39%
Head commit (`2c20108`)	73105 (+72)	46346 (+47)	63.40% (+0.00%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details

	Coverable lines	Covered lines	Diff coverage
Pull request (#2663)	197	185	93.91%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings Change summary preferences

* fix: managed not unique indexes when key+values exceed page size * fix: compaction of not-unique indexes FIxed issue #2434 (cherry picked from commit 87c3227)

lvca added 3 commits October 17, 2025 00:31

fix: managed not unique indexes when key+values exceed page size

466d00c

fix: compaction of not-unique indexes

4a76e19

FIxed issue #2434

Merge branch 'main' into fix-compact-large-notunique-index

2c20108

lvca self-assigned this Oct 19, 2025

lvca added bug labels Oct 19, 2025

lvca added this to the 25.10.1 milestone Oct 19, 2025

lvca merged commit 87c3227 into main Oct 19, 2025
7 of 9 checks passed

gemini-code-assist bot reviewed Oct 19, 2025

View reviewed changes

lvca deleted the fix-compact-large-notunique-index branch October 19, 2025 03:26

robfrank pushed a commit that referenced this pull request Nov 10, 2025

Fix compact large notunique index (#2663)

be45c3e

* fix: managed not unique indexes when key+values exceed page size * fix: compaction of not-unique indexes FIxed issue #2434 (cherry picked from commit 87c3227)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix compact large notunique index#2663

Fix compact large notunique index#2663
lvca merged 3 commits intomainfrom
fix-compact-large-notunique-index

lvca commented Oct 19, 2025

Uh oh!

gemini-code-assist bot commented Oct 19, 2025

Uh oh!

lvca commented Oct 19, 2025

Uh oh!

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Oct 19, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codacy-production bot commented Oct 19, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	database.getTransactionManager().unlockFile(lockedFileId, Thread.currentThread());
	if (lockedFileId != null)
	database.getTransactionManager().unlockFile(lockedFileId, Thread.currentThread());

Uh oh!

Conversation

lvca commented Oct 19, 2025

Uh oh!

gemini-code-assist bot commented Oct 19, 2025

Summary of Changes

Highlights

Footnotes

Uh oh!

lvca commented Oct 19, 2025

Uh oh!

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Oct 19, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codacy-production bot commented Oct 19, 2025

Coverage summary from Codacy

See diff coverage on Codacy

See your quality gate settings Change summary preferences

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant