#2702 CREATE INDEX waits for any running async tasks by robfrank · Pull Request #3408 · ArcadeData/arcadedb

robfrank · 2026-02-11T18:49:16Z

Summary

Fix sequential index creation failing with NeedRetryException when background LSMTree
compaction is running (NeedRetryException when creating indexes sequentially on large datasets #2702)
Instead of throwing immediately, TypeIndexBuilder and ManualIndexBuilder now wait for async
tasks to complete before creating a new index
Clean up searchMoreLikeThis in LSMTreeFullTextIndex: simplify score accumulation (Map<RID,
Integer> instead of AtomicInteger), remove redundant null checks, use
Comparator.comparingInt().reversed() for sorting

Test plan

New Issue2702SequentialIndexCreationTest covers sequential index creation under compaction
and verifies the waiting behavior
Run existing full-text index tests to verify searchMoreLikeThis refactoring: mvn test
-Dtest="FullText,MoreLikeThis"
Run index builder regression tests: mvn test -Dtest="IndexBuilder,IndexCompaction"

gemini-code-assist · 2026-02-11T18:49:36Z

Summary of Changes

Hello @robfrank, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves an issue where sequential index creation operations would fail with a NeedRetryException if background compaction was running. The fix implements a synchronous blocking mechanism, causing index creation to wait for ongoing asynchronous tasks to complete, thereby improving the reliability and predictability of index management without requiring manual retry logic from users.

Highlights

Sequential Index Creation Reliability: Addressed NeedRetryException that occurred when creating multiple indexes sequentially on large datasets, especially when background LSMTree compaction was active.
Synchronous Blocking for Index Creation: Modified TypeIndexBuilder.java and ManualIndexBuilder.java to wait for asynchronous tasks (like compaction) to complete before creating a new index, instead of throwing an exception.
Enhanced Test Coverage: Introduced Issue2702SequentialIndexCreationTest.java to comprehensively test sequential index creation and the new waiting behavior under compaction scenarios.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog

2702-sequential-index-creation-fix.md
- Added detailed documentation for issue NeedRetryException when creating indexes sequentially on large datasets #2702, its solution, benefits, and testing.
engine/src/main/java/com/arcadedb/index/fulltext/LSMTreeFullTextIndex.java
- Added Comparator import.
- Added Javadoc comments to several methods (LSMTreeFullTextIndexFactoryHandler, get(Object[] keys, int limit), parseQueryTerms, put, remove, remove(Object[] keys, Identifiable rid), analyzeText).
- Refactored searchMoreLikeThis method for cleaner code, removing redundant null/empty checks and simplifying score accumulation and sorting.
engine/src/main/java/com/arcadedb/schema/ManualIndexBuilder.java
- Changed behavior to wait for async completion instead of throwing NeedRetryException.
- Added explanatory comment.
engine/src/main/java/com/arcadedb/schema/TypeIndexBuilder.java
- Changed behavior to wait for async completion instead of throwing NeedRetryException.
- Added explanatory comment.
engine/src/test/java/com/arcadedb/index/Issue2702SequentialIndexCreationTest.java
- Added new comprehensive test for sequential index creation and async compaction waiting.

Activity

No specific review comments or discussions have been recorded yet.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

claude · 2026-02-11T18:51:03Z

Pull Request Review

I've reviewed PR #3408 and have the following feedback:

Overview

This PR addresses Issue #2702 by changing the behavior when creating indexes while async processing (LSMTree compaction) is running. Instead of immediately throwing NeedRetryException, the code now waits for async tasks to complete.

🟢 Positives

1. Code Quality

✅ Clean, focused changes with clear intent
✅ Excellent inline comments explaining the rationale
✅ Consistent implementation across both TypeIndexBuilder and ManualIndexBuilder
✅ Follows existing code style (single-statement if blocks without braces)

2. Test Coverage

✅ Comprehensive test class with two distinct test scenarios
✅ Tests cover the actual issue (sequential index creation on large datasets)
✅ Tests verify the waiting behavior explicitly
✅ Good use of assertThat() syntax as per project guidelines
✅ Proper cleanup of GlobalConfiguration changes in finally blocks

3. Documentation

✅ Extensive markdown documentation (2702-sequential-index-creation-fix.md) explaining the problem, solution, and trade-offs
✅ Clear comments in code explaining the behavior change

🟡 Concerns & Suggestions

1. Thread Safety & Deadlock Risk ⚠️

Issue: Calling database.async().waitCompletion() while holding transaction locks could potentially lead to deadlock scenarios.

Location:

TypeIndexBuilder.java:86-89
ManualIndexBuilder.java:47-50

Concern: If the async compaction task needs to acquire any locks that the current thread holds, this could deadlock. The code path should be analyzed to ensure:

The async tasks don't need schema locks
The current transaction context won't block async workers

Recommendation: Add a timeout to waitCompletion() calls to prevent indefinite blocking:

if (database.isAsyncProcessing()) {
  // Wait with timeout to prevent indefinite blocking
  database.async().waitCompletion(timeout);
}

2. LSMTreeFullTextIndex.java Changes

Issue: This file contains significant refactoring unrelated to the main fix:

Removed AtomicInteger usage in searchMoreLikeThis (line 695: Map<RID, Integer> instead of Map<RID, AtomicInteger>)
Simplified control flow (removed many braces)
Added JavaDoc comments
Changed sorting implementation (line 715: using Comparator.comparingInt().reversed())

Concern: These changes, while improvements, are not part of the sequential index creation fix and make the PR harder to review. They could also introduce subtle bugs.

Recommendation:

❌ Per CLAUDE.md: "don't add Claude as author" - The extensive JavaDoc additions suggest AI-generated documentation
🔄 Consider splitting LSMTreeFullTextIndex changes into a separate PR for proper review
🧪 Ensure the searchMoreLikeThis refactoring has adequate test coverage

3. Unrelated Documentation File

Issue: The file 2702-sequential-index-creation-fix.md in the root directory appears to be design documentation.

Question: Should this be:

Moved to a docs/ directory?
Converted to a GitHub issue comment instead?
Removed before merging (documentation belongs in code comments and commit messages)?

Per CLAUDE.md: "NEVER proactively create documentation files (*.md)" - This markdown file should likely not be in the repository.

4. Missing Validation

Question: What happens if database.async().waitCompletion() throws an exception? Should there be error handling?

if (database.isAsyncProcessing()) {
  try {
    database.async().waitCompletion();
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt();
    throw new NeedRetryException("Interrupted while waiting for async tasks", e);
  }
}

5. Performance Considerations

Question: The documentation states this can take "30-60+ seconds" for compaction. This means:

Index creation DDL operations can block for minutes
Applications may experience timeouts if not configured properly
Users might perceive the system as hung

Recommendation: Consider adding:

Logging when waiting starts: LogManager.instance().log(..., FINE, "Waiting for async compaction...")
Progress indicators if possible
Documentation in user-facing docs about this behavior

🔍 Testing Recommendations

Run These Tests Before Merging:

# Test the new functionality
mvn test -Dtest=Issue2702SequentialIndexCreationTest

# Regression tests for index creation
mvn test -Dtest="*IndexBuilder*,*IndexCompaction*,LSMTreeIndexTest,TypeLSMTreeIndexTest"

# Full-text index tests (due to LSMTreeFullTextIndex changes)
mvn test -Dtest="*FullText*,*MoreLikeThis*"

🎯 Recommendations Summary

Must Address:

❌ Remove or relocate the `2702-sequential-index-creation-fix.md" file (violates CLAUDE.md guidelines)
🔄 Split LSMTreeFullTextIndex refactoring into a separate PR or justify its inclusion
🧪 Run full test suite, especially full-text search tests

Should Consider:

Add timeout to waitCompletion() calls
Add logging when blocking on async tasks
Add exception handling for interrupted waits
Verify no deadlock scenarios exist

Nice to Have:

Document the blocking behavior in user-facing documentation
Consider adding metrics/monitoring for wait times

✅ Conclusion

The core fix for Issue #2702 is sound and well-implemented. The approach is simple, predictable, and aligns with standard database behavior. However, the PR contains unrelated changes that should be addressed before merging.

Verdict: Approve with changes requested for:

Removing the markdown doc file
Clarifying/separating LSMTreeFullTextIndex changes
Adding timeout/logging for production robustness

gemini-code-assist

Code Review

This pull request addresses an issue where creating multiple indexes sequentially could fail with a NeedRetryException. The fix, implemented in TypeIndexBuilder and ManualIndexBuilder, is to wait for any ongoing asynchronous tasks to complete instead of throwing an exception. This change improves the developer experience by removing the need for manual retry logic.

A new test class, Issue2702SequentialIndexCreationTest, has been added to verify the fix under conditions that would previously trigger the error, which is great.

Additionally, the LSMTreeFullTextIndex class has been refactored for better readability and maintainability, including the addition of Javadocs and the use of more modern Java constructs.

My review includes one suggestion regarding coding style in LSMTreeFullTextIndex.java to ensure long-term maintainability.

engine/src/main/java/com/arcadedb/index/fulltext/LSMTreeFullTextIndex.java

mergify · 2026-02-11T19:10:11Z

🧪 CI Insights

Here's what we observed from your CI run for 810fc8c.

🟢 All jobs passed!

But CI Insights is watching 👀

codacy-production · 2026-02-11T19:44:22Z

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation	Diff coverage
✅ -11.02%	✅ 82.14%

Coverage variation details

	Coverable lines	Covered lines	Coverage
Common ancestor commit (`b2dcfda`)	83715	59702	71.32%
Head commit (`810fc8c`)	113266 (+29551)	68297 (+8595)	60.30% (-11.02%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details

	Coverable lines	Covered lines	Diff coverage
Pull request (#3408)	28	23	82.14%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings Change summary preferences

codecov · 2026-02-11T19:45:12Z

Codecov Report

❌ Patch coverage is 60.71429% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 62.96%. Comparing base (b2dcfda) to head (810fc8c).
⚠️ Report is 17 commits behind head on main.

Files with missing lines	Patch %	Lines
.../arcadedb/index/fulltext/LSMTreeFullTextIndex.java	70.83%	2 Missing and 5 partials ⚠️
...n/java/com/arcadedb/schema/ManualIndexBuilder.java	0.00%	2 Missing ⚠️
...ain/java/com/arcadedb/schema/TypeIndexBuilder.java	0.00%	1 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3408      +/-   ##
==========================================
+ Coverage   62.52%   62.96%   +0.43%     
==========================================
  Files        1251     1251              
  Lines       83715    83708       -7     
  Branches    17142    17140       -2     
==========================================
+ Hits        52346    52704     +358     
+ Misses      24039    23603     -436     
- Partials     7330     7401      +71

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

lvca · 2026-02-11T19:52:37Z

engine/src/main/java/com/arcadedb/schema/TypeIndexBuilder.java


+    // Wait for any running async tasks (e.g., compaction) to complete before creating new index
+    // This prevents NeedRetryException when creating multiple indexes sequentially on large datasets
    if (database.isAsyncProcessing())


I think it should be

while (database.isAsyncProcessing()) database.async().waitCompletion();

In case of concurrency?

lvca · 2026-02-11T19:53:56Z

2702-sequential-index-creation-fix.md

@@ -0,0 +1,163 @@
+# Issue #2702: NeedRetryException when creating indexes sequentially on large datasets


We don't need this file in the codebase

lvca · 2026-02-11T19:54:17Z

engine/src/main/java/com/arcadedb/schema/ManualIndexBuilder.java


+    // Wait for any running async tasks (e.g., compaction) to complete before creating new index
+    // This prevents NeedRetryException when creating multiple indexes sequentially on large datasets
    if (database.isAsyncProcessing())


I think it should be

while (database.isAsyncProcessing()) database.async().waitCompletion();

In case of concurrency?

- Use while loop instead of if for async wait in TypeIndexBuilder and ManualIndexBuilder to handle concurrency edge cases - Remove unnecessary 2702-sequential-index-creation-fix.md doc file Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

claude · 2026-02-12T16:12:38Z

Pull Request Review - PR #3408

Overview

This PR addresses Issue #2702 by fixing sequential index creation failures that occurred when background LSMTree compaction was running. The fix changes the behavior from throwing NeedRetryException to waiting for async tasks to complete.

Code Quality & Best Practices

Positive Changes

Core Fix is Sound (ManualIndexBuilder.java, TypeIndexBuilder.java)
- Changed from if statement to while loop as suggested in review feedback
- This properly handles edge cases where isAsyncProcessing() might become true again
- Good explanatory comments added
Improved Code Readability (LSMTreeFullTextIndex.java)
- Removed unnecessary braces following project style guidelines
- Simplified score accumulation from AtomicInteger to Integer (line 750)
- Cleaner comparator using method reference (line 769)
Better Documentation
- Added comprehensive JavaDoc comments to key methods
- Good inline comments explaining the scoring logic
Comprehensive Test Coverage (Issue2702SequentialIndexCreationTest.java)
- Two well-designed test cases covering the regression
- Tests actually verify the fix by forcing compaction and creating sequential indexes
- Good use of configuration to trigger edge cases

Potential Issues & Concerns

1. Critical: Infinite Loop Risk

Files: ManualIndexBuilder.java:49-50, TypeIndexBuilder.java:88-89

Problem: This pattern could theoretically cause an infinite loop in rare scenarios if waitCompletion() returns but new async tasks start immediately, or if background tasks continuously queue new work.

Recommendation: Add a timeout or retry limit to prevent indefinite blocking.

2. Performance: Blocking Behavior

Index creation now blocks waiting for ALL async operations (not just compaction). This could impact systems with continuous background tasks, and there is no logging to indicate why index creation is delayed.

3. Test Quality Issues

Test does not assert that wasAsyncActive was true, so it might pass without exercising the fix
Timer usage instead of ExecutorService is not ideal for tests

Security Concerns

No security issues identified.

Recommendations Summary

High Priority

Add timeout/retry limit to the while loop to prevent infinite loops
Add logging to indicate when index creation is waiting
Verify test actually exercises the fix by asserting wasAsyncActive

Medium Priority

Consider more granular async task filtering (wait only for compaction)
Replace Timer with ExecutorService in tests
Add test coverage for timeout and failure scenarios

Conclusion

The core fix is functionally correct and addresses Issue #2702. The implementation follows review feedback well. However, the lack of timeout protection is a significant concern that should be addressed before merging to production.

Recommendation: Request changes to add timeout protection and improve test verification.

Overall code quality: Good with concerns

(cherry picked from commit 1059e4c)

gemini-code-assist bot reviewed Feb 11, 2026

View reviewed changes

engine/src/main/java/com/arcadedb/index/fulltext/LSMTreeFullTextIndex.java Show resolved Hide resolved

lvca requested changes Feb 11, 2026

View reviewed changes

robfrank and others added 2 commits February 12, 2026 16:47

fix tests

3d0c7c9

Address PR #3408 review feedback

810fc8c

- Use while loop instead of if for async wait in TypeIndexBuilder and ManualIndexBuilder to handle concurrency edge cases - Remove unnecessary 2702-sequential-index-creation-fix.md doc file Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

robfrank force-pushed the call-devs branch from 2422c46 to 810fc8c Compare February 12, 2026 16:10

robfrank linked an issue Feb 12, 2026 that may be closed by this pull request

NeedRetryException when creating indexes sequentially on large datasets #2702

Closed

robfrank changed the title ~~fix tests~~ #2702 CREATE INDEX waits for any running async tasks Feb 12, 2026

robfrank requested a review from lvca February 12, 2026 16:57

lvca approved these changes Feb 12, 2026

View reviewed changes

lvca assigned robfrank Feb 12, 2026

lvca added this to the 26.2.1 milestone Feb 12, 2026

robfrank merged commit 1059e4c into main Feb 13, 2026
23 of 28 checks passed

robfrank deleted the call-devs branch February 13, 2026 08:55

robfrank added a commit that referenced this pull request Feb 17, 2026

#2702 CREATE INDEX waits for any running async tasks (#3408)

e8962cc

(cherry picked from commit 1059e4c)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

#2702 CREATE INDEX waits for any running async tasks#3408

#2702 CREATE INDEX waits for any running async tasks#3408
robfrank merged 2 commits intomainfrom
call-devs

robfrank commented Feb 11, 2026 •

edited

Loading

Uh oh!

gemini-code-assist bot commented Feb 11, 2026

Uh oh!

claude bot commented Feb 11, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

Uh oh!

mergify bot commented Feb 11, 2026 •

edited

Loading

Uh oh!

codacy-production bot commented Feb 11, 2026 •

edited

Loading

Uh oh!

codecov bot commented Feb 11, 2026 •

edited

Loading

Uh oh!

lvca Feb 11, 2026

Uh oh!

lvca Feb 11, 2026

Uh oh!

lvca Feb 11, 2026

Uh oh!

claude bot commented Feb 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		@@ -0,0 +1,163 @@
		# Issue #2702: NeedRetryException when creating indexes sequentially on large datasets

Uh oh!

Conversation

robfrank commented Feb 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gemini-code-assist bot commented Feb 11, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

claude bot commented Feb 11, 2026

Pull Request Review

Overview

🟢 Positives

1. Code Quality

2. Test Coverage

3. Documentation

🟡 Concerns & Suggestions

1. Thread Safety & Deadlock Risk ⚠️

2. LSMTreeFullTextIndex.java Changes

3. Unrelated Documentation File

4. Missing Validation

5. Performance Considerations

🔍 Testing Recommendations

Run These Tests Before Merging:

🎯 Recommendations Summary

✅ Conclusion

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

mergify bot commented Feb 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🧪 CI Insights

🟢 All jobs passed!

Uh oh!

codacy-production bot commented Feb 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Coverage summary from Codacy

See diff coverage on Codacy

See your quality gate settings Change summary preferences

Uh oh!

codecov bot commented Feb 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

lvca Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

lvca Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

lvca Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

claude bot commented Feb 12, 2026

Pull Request Review - PR #3408

Overview

Code Quality & Best Practices

Positive Changes

Potential Issues & Concerns

1. Critical: Infinite Loop Risk

2. Performance: Blocking Behavior

3. Test Quality Issues

Security Concerns

Recommendations Summary

High Priority

Medium Priority

Conclusion

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

robfrank commented Feb 11, 2026 •

edited

Loading

mergify bot commented Feb 11, 2026 •

edited

Loading

codacy-production bot commented Feb 11, 2026 •

edited

Loading

codecov bot commented Feb 11, 2026 •

edited

Loading