Skip to content

#2702 CREATE INDEX waits for any running async tasks#3408

Merged
robfrank merged 2 commits intomainfrom
call-devs
Feb 13, 2026
Merged

#2702 CREATE INDEX waits for any running async tasks#3408
robfrank merged 2 commits intomainfrom
call-devs

Conversation

@robfrank
Copy link
Collaborator

@robfrank robfrank commented Feb 11, 2026

Summary

  • Fix sequential index creation failing with NeedRetryException when background LSMTree
    compaction is running (NeedRetryException when creating indexes sequentially on large datasets #2702)
  • Instead of throwing immediately, TypeIndexBuilder and ManualIndexBuilder now wait for async
    tasks to complete before creating a new index
  • Clean up searchMoreLikeThis in LSMTreeFullTextIndex: simplify score accumulation (Map<RID,
    Integer> instead of AtomicInteger), remove redundant null checks, use
    Comparator.comparingInt().reversed() for sorting

Test plan

  • New Issue2702SequentialIndexCreationTest covers sequential index creation under compaction
    and verifies the waiting behavior
  • Run existing full-text index tests to verify searchMoreLikeThis refactoring: mvn test
    -Dtest="FullText,MoreLikeThis"
  • Run index builder regression tests: mvn test -Dtest="IndexBuilder,IndexCompaction"

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @robfrank, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves an issue where sequential index creation operations would fail with a NeedRetryException if background compaction was running. The fix implements a synchronous blocking mechanism, causing index creation to wait for ongoing asynchronous tasks to complete, thereby improving the reliability and predictability of index management without requiring manual retry logic from users.

Highlights

  • Sequential Index Creation Reliability: Addressed NeedRetryException that occurred when creating multiple indexes sequentially on large datasets, especially when background LSMTree compaction was active.
  • Synchronous Blocking for Index Creation: Modified TypeIndexBuilder.java and ManualIndexBuilder.java to wait for asynchronous tasks (like compaction) to complete before creating a new index, instead of throwing an exception.
  • Enhanced Test Coverage: Introduced Issue2702SequentialIndexCreationTest.java to comprehensively test sequential index creation and the new waiting behavior under compaction scenarios.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • 2702-sequential-index-creation-fix.md
  • engine/src/main/java/com/arcadedb/index/fulltext/LSMTreeFullTextIndex.java
    • Added Comparator import.
    • Added Javadoc comments to several methods (LSMTreeFullTextIndexFactoryHandler, get(Object[] keys, int limit), parseQueryTerms, put, remove, remove(Object[] keys, Identifiable rid), analyzeText).
    • Refactored searchMoreLikeThis method for cleaner code, removing redundant null/empty checks and simplifying score accumulation and sorting.
  • engine/src/main/java/com/arcadedb/schema/ManualIndexBuilder.java
    • Changed behavior to wait for async completion instead of throwing NeedRetryException.
    • Added explanatory comment.
  • engine/src/main/java/com/arcadedb/schema/TypeIndexBuilder.java
    • Changed behavior to wait for async completion instead of throwing NeedRetryException.
    • Added explanatory comment.
  • engine/src/test/java/com/arcadedb/index/Issue2702SequentialIndexCreationTest.java
    • Added new comprehensive test for sequential index creation and async compaction waiting.
Activity
  • No specific review comments or discussions have been recorded yet.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@claude
Copy link

claude bot commented Feb 11, 2026

Pull Request Review

I've reviewed PR #3408 and have the following feedback:

Overview

This PR addresses Issue #2702 by changing the behavior when creating indexes while async processing (LSMTree compaction) is running. Instead of immediately throwing NeedRetryException, the code now waits for async tasks to complete.


🟢 Positives

1. Code Quality

  • ✅ Clean, focused changes with clear intent
  • ✅ Excellent inline comments explaining the rationale
  • ✅ Consistent implementation across both TypeIndexBuilder and ManualIndexBuilder
  • ✅ Follows existing code style (single-statement if blocks without braces)

2. Test Coverage

  • ✅ Comprehensive test class with two distinct test scenarios
  • ✅ Tests cover the actual issue (sequential index creation on large datasets)
  • ✅ Tests verify the waiting behavior explicitly
  • ✅ Good use of assertThat() syntax as per project guidelines
  • ✅ Proper cleanup of GlobalConfiguration changes in finally blocks

3. Documentation

  • ✅ Extensive markdown documentation (2702-sequential-index-creation-fix.md) explaining the problem, solution, and trade-offs
  • ✅ Clear comments in code explaining the behavior change

🟡 Concerns & Suggestions

1. Thread Safety & Deadlock Risk ⚠️

Issue: Calling database.async().waitCompletion() while holding transaction locks could potentially lead to deadlock scenarios.

Location:

  • TypeIndexBuilder.java:86-89
  • ManualIndexBuilder.java:47-50

Concern: If the async compaction task needs to acquire any locks that the current thread holds, this could deadlock. The code path should be analyzed to ensure:

  • The async tasks don't need schema locks
  • The current transaction context won't block async workers

Recommendation: Add a timeout to waitCompletion() calls to prevent indefinite blocking:

if (database.isAsyncProcessing()) {
  // Wait with timeout to prevent indefinite blocking
  database.async().waitCompletion(timeout);
}

2. LSMTreeFullTextIndex.java Changes

Issue: This file contains significant refactoring unrelated to the main fix:

  • Removed AtomicInteger usage in searchMoreLikeThis (line 695: Map<RID, Integer> instead of Map<RID, AtomicInteger>)
  • Simplified control flow (removed many braces)
  • Added JavaDoc comments
  • Changed sorting implementation (line 715: using Comparator.comparingInt().reversed())

Concern: These changes, while improvements, are not part of the sequential index creation fix and make the PR harder to review. They could also introduce subtle bugs.

Recommendation:

  • Per CLAUDE.md: "don't add Claude as author" - The extensive JavaDoc additions suggest AI-generated documentation
  • 🔄 Consider splitting LSMTreeFullTextIndex changes into a separate PR for proper review
  • 🧪 Ensure the searchMoreLikeThis refactoring has adequate test coverage

3. Unrelated Documentation File

Issue: The file 2702-sequential-index-creation-fix.md in the root directory appears to be design documentation.

Question: Should this be:

  • Moved to a docs/ directory?
  • Converted to a GitHub issue comment instead?
  • Removed before merging (documentation belongs in code comments and commit messages)?

Per CLAUDE.md: "NEVER proactively create documentation files (*.md)" - This markdown file should likely not be in the repository.

4. Missing Validation

Question: What happens if database.async().waitCompletion() throws an exception? Should there be error handling?

if (database.isAsyncProcessing()) {
  try {
    database.async().waitCompletion();
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt();
    throw new NeedRetryException("Interrupted while waiting for async tasks", e);
  }
}

5. Performance Considerations

Question: The documentation states this can take "30-60+ seconds" for compaction. This means:

  • Index creation DDL operations can block for minutes
  • Applications may experience timeouts if not configured properly
  • Users might perceive the system as hung

Recommendation: Consider adding:

  • Logging when waiting starts: LogManager.instance().log(..., FINE, "Waiting for async compaction...")
  • Progress indicators if possible
  • Documentation in user-facing docs about this behavior

🔍 Testing Recommendations

Run These Tests Before Merging:

# Test the new functionality
mvn test -Dtest=Issue2702SequentialIndexCreationTest

# Regression tests for index creation
mvn test -Dtest="*IndexBuilder*,*IndexCompaction*,LSMTreeIndexTest,TypeLSMTreeIndexTest"

# Full-text index tests (due to LSMTreeFullTextIndex changes)
mvn test -Dtest="*FullText*,*MoreLikeThis*"

🎯 Recommendations Summary

Must Address:

  1. ❌ Remove or relocate the `2702-sequential-index-creation-fix.md" file (violates CLAUDE.md guidelines)
  2. 🔄 Split LSMTreeFullTextIndex refactoring into a separate PR or justify its inclusion
  3. 🧪 Run full test suite, especially full-text search tests

Should Consider:

  1. Add timeout to waitCompletion() calls
  2. Add logging when blocking on async tasks
  3. Add exception handling for interrupted waits
  4. Verify no deadlock scenarios exist

Nice to Have:

  1. Document the blocking behavior in user-facing documentation
  2. Consider adding metrics/monitoring for wait times

✅ Conclusion

The core fix for Issue #2702 is sound and well-implemented. The approach is simple, predictable, and aligns with standard database behavior. However, the PR contains unrelated changes that should be addressed before merging.

Verdict: Approve with changes requested for:

  • Removing the markdown doc file
  • Clarifying/separating LSMTreeFullTextIndex changes
  • Adding timeout/logging for production robustness

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses an issue where creating multiple indexes sequentially could fail with a NeedRetryException. The fix, implemented in TypeIndexBuilder and ManualIndexBuilder, is to wait for any ongoing asynchronous tasks to complete instead of throwing an exception. This change improves the developer experience by removing the need for manual retry logic.

A new test class, Issue2702SequentialIndexCreationTest, has been added to verify the fix under conditions that would previously trigger the error, which is great.

Additionally, the LSMTreeFullTextIndex class has been refactored for better readability and maintainability, including the addition of Javadocs and the use of more modern Java constructs.

My review includes one suggestion regarding coding style in LSMTreeFullTextIndex.java to ensure long-term maintainability.

@mergify
Copy link
Contributor

mergify bot commented Feb 11, 2026

🧪 CI Insights

Here's what we observed from your CI run for 810fc8c.

🟢 All jobs passed!

But CI Insights is watching 👀

@codacy-production
Copy link

codacy-production bot commented Feb 11, 2026

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation Diff coverage
-11.02% 82.14%
Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (b2dcfda) 83715 59702 71.32%
Head commit (810fc8c) 113266 (+29551) 68297 (+8595) 60.30% (-11.02%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#3408) 28 23 82.14%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

@codecov
Copy link

codecov bot commented Feb 11, 2026

Codecov Report

❌ Patch coverage is 60.71429% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 62.96%. Comparing base (b2dcfda) to head (810fc8c).
⚠️ Report is 17 commits behind head on main.

Files with missing lines Patch % Lines
.../arcadedb/index/fulltext/LSMTreeFullTextIndex.java 70.83% 2 Missing and 5 partials ⚠️
...n/java/com/arcadedb/schema/ManualIndexBuilder.java 0.00% 2 Missing ⚠️
...ain/java/com/arcadedb/schema/TypeIndexBuilder.java 0.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3408      +/-   ##
==========================================
+ Coverage   62.52%   62.96%   +0.43%     
==========================================
  Files        1251     1251              
  Lines       83715    83708       -7     
  Branches    17142    17140       -2     
==========================================
+ Hits        52346    52704     +358     
+ Misses      24039    23603     -436     
- Partials     7330     7401      +71     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.


// Wait for any running async tasks (e.g., compaction) to complete before creating new index
// This prevents NeedRetryException when creating multiple indexes sequentially on large datasets
if (database.isAsyncProcessing())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be

while (database.isAsyncProcessing())          
  database.async().waitCompletion();

In case of concurrency?

@@ -0,0 +1,163 @@
# Issue #2702: NeedRetryException when creating indexes sequentially on large datasets
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this file in the codebase


// Wait for any running async tasks (e.g., compaction) to complete before creating new index
// This prevents NeedRetryException when creating multiple indexes sequentially on large datasets
if (database.isAsyncProcessing())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be

while (database.isAsyncProcessing())          
  database.async().waitCompletion();

In case of concurrency?

robfrank and others added 2 commits February 12, 2026 16:47
- Use while loop instead of if for async wait in TypeIndexBuilder and
  ManualIndexBuilder to handle concurrency edge cases
- Remove unnecessary 2702-sequential-index-creation-fix.md doc file

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude
Copy link

claude bot commented Feb 12, 2026

Pull Request Review - PR #3408

Overview

This PR addresses Issue #2702 by fixing sequential index creation failures that occurred when background LSMTree compaction was running. The fix changes the behavior from throwing NeedRetryException to waiting for async tasks to complete.


Code Quality & Best Practices

Positive Changes

  1. Core Fix is Sound (ManualIndexBuilder.java, TypeIndexBuilder.java)

    • Changed from if statement to while loop as suggested in review feedback
    • This properly handles edge cases where isAsyncProcessing() might become true again
    • Good explanatory comments added
  2. Improved Code Readability (LSMTreeFullTextIndex.java)

    • Removed unnecessary braces following project style guidelines
    • Simplified score accumulation from AtomicInteger to Integer (line 750)
    • Cleaner comparator using method reference (line 769)
  3. Better Documentation

    • Added comprehensive JavaDoc comments to key methods
    • Good inline comments explaining the scoring logic
  4. Comprehensive Test Coverage (Issue2702SequentialIndexCreationTest.java)

    • Two well-designed test cases covering the regression
    • Tests actually verify the fix by forcing compaction and creating sequential indexes
    • Good use of configuration to trigger edge cases

Potential Issues & Concerns

1. Critical: Infinite Loop Risk

Files: ManualIndexBuilder.java:49-50, TypeIndexBuilder.java:88-89

Problem: This pattern could theoretically cause an infinite loop in rare scenarios if waitCompletion() returns but new async tasks start immediately, or if background tasks continuously queue new work.

Recommendation: Add a timeout or retry limit to prevent indefinite blocking.

2. Performance: Blocking Behavior

Index creation now blocks waiting for ALL async operations (not just compaction). This could impact systems with continuous background tasks, and there is no logging to indicate why index creation is delayed.

3. Test Quality Issues

  • Test does not assert that wasAsyncActive was true, so it might pass without exercising the fix
  • Timer usage instead of ExecutorService is not ideal for tests

Security Concerns

No security issues identified.


Recommendations Summary

High Priority

  1. Add timeout/retry limit to the while loop to prevent infinite loops
  2. Add logging to indicate when index creation is waiting
  3. Verify test actually exercises the fix by asserting wasAsyncActive

Medium Priority

  1. Consider more granular async task filtering (wait only for compaction)
  2. Replace Timer with ExecutorService in tests
  3. Add test coverage for timeout and failure scenarios

Conclusion

The core fix is functionally correct and addresses Issue #2702. The implementation follows review feedback well. However, the lack of timeout protection is a significant concern that should be addressed before merging to production.

Recommendation: Request changes to add timeout protection and improve test verification.

Overall code quality: Good with concerns

@robfrank robfrank changed the title fix tests #2702 CREATE INDEX waits for any running async tasks Feb 12, 2026
@robfrank robfrank requested a review from lvca February 12, 2026 16:57
@lvca lvca added this to the 26.2.1 milestone Feb 12, 2026
@robfrank robfrank merged commit 1059e4c into main Feb 13, 2026
23 of 28 checks passed
@robfrank robfrank deleted the call-devs branch February 13, 2026 08:55
robfrank added a commit that referenced this pull request Feb 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

NeedRetryException when creating indexes sequentially on large datasets

2 participants