Skip to content

Fix compact large notunique index#2663

Merged
lvca merged 3 commits intomainfrom
fix-compact-large-notunique-index
Oct 19, 2025
Merged

Fix compact large notunique index#2663
lvca merged 3 commits intomainfrom
fix-compact-large-notunique-index

Conversation

@lvca
Copy link
Contributor

@lvca lvca commented Oct 19, 2025

I believe I found and fixed the issue. I wrote a new test case that now passes. The problem was with not unique indexes: with many records having the same key, the index entry could be bigger than the index page, and there wasn't a way to split the index entry on multiple pages. Now there is :-)

@lvca lvca self-assigned this Oct 19, 2025
@lvca lvca added bug labels Oct 19, 2025
@lvca lvca added this to the 25.10.1 milestone Oct 19, 2025
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @lvca, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a critical limitation in non-unique LSM-tree indexes where a large number of records sharing the same key could lead to index entries exceeding the capacity of a single page. The core solution involves enabling these large entries to be split and stored across multiple index pages, both during standard data insertion and index compaction. This significantly enhances the robustness and scalability of non-unique indexes, preventing potential data integrity issues. The changes are validated by a new, comprehensive test case designed to simulate and verify this specific scenario.

Highlights

  • Non-Unique Index Entry Splitting: Implemented a mechanism to split large non-unique index entries across multiple pages. Previously, if many records shared the same key, the index entry could exceed a single page's capacity, leading to issues. This change ensures such entries can span multiple pages.
  • Refactored Index Entry Writing: Introduced writeEntrySingleValue and writeEntryMultipleValues methods in LSMTreeIndexAbstract to explicitly handle writing single or multiple values, with the latter managing page overflows for non-unique keys.
  • Enhanced Key Lookup for Non-Unique Indexes: Added findFirstEntryOfSameKey and findLastEntryOfSameKey methods to efficiently retrieve all associated RIDs for a given key, especially when they are spread across multiple entries or pages.
  • Compaction Process Updates: The compaction logic in LSMTreeIndexCompacted and LSMTreeIndexCompactor has been updated to correctly handle and persist non-unique index entries that are split across multiple pages.
  • New Test Case for Large Non-Unique Entries: A new test case (checkNotUniqueEntries) was added to LSMTreeIndexCompactionTest to specifically validate the fix for non-unique indexes with a high number of records sharing the same key, ensuring the splitting mechanism works as expected during various compaction stages.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@lvca
Copy link
Contributor Author

lvca commented Oct 19, 2025

Issue #2434

@lvca lvca merged commit 87c3227 into main Oct 19, 2025
7 of 9 checks passed
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a significant change to handle large non-unique indexes by allowing index entries to be split across multiple pages. The changes are extensive, touching mutable and compacted indexes, as well as the compaction process. While the core logic seems to address the described problem, I've found a few critical issues related to potential data loss and a NullPointerException in an error handling path. Additionally, there are several opportunities to improve maintainability by refactoring duplicated code blocks. The new test case is a great addition to validate the fix.

final Integer lockedFileId = lockedNewFileId.get();
if (lockedFileId != null)
database.getTransactionManager().unlockFile(lockedFileId, Thread.currentThread());
database.getTransactionManager().unlockFile(lockedFileId, Thread.currentThread());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Removing the null check for lockedFileId could lead to a NullPointerException if database.getTransactionManager().unlockFile() is called with a null argument. The lockedNewFileId atomic reference is initialized to null and only set within the try block. If an exception occurs before it's set, lockedFileId will be null, and this finally block will throw an NPE, masking the original exception. The null check should be restored to ensure robust error handling.

Suggested change
database.getTransactionManager().unlockFile(lockedFileId, Thread.currentThread());
if (lockedFileId != null)
database.getTransactionManager().unlockFile(lockedFileId, Thread.currentThread());

@codacy-production
Copy link

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation Diff coverage
+0.00% 93.91%
Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (bd2a3db) 73033 46299 63.39%
Head commit (2c20108) 73105 (+72) 46346 (+47) 63.40% (+0.00%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#2663) 197 185 93.91%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

@lvca lvca deleted the fix-compact-large-notunique-index branch October 19, 2025 03:26
robfrank pushed a commit that referenced this pull request Nov 10, 2025
* fix: managed not unique indexes when key+values exceed page size

* fix: compaction of not-unique indexes

FIxed issue #2434

(cherry picked from commit 87c3227)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant